summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* bpf, maps: flush own entries on perf map releaseDaniel Borkmann2016-06-153-38/+91
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The behavior of perf event arrays are quite different from all others as they are tightly coupled to perf event fds, f.e. shown recently by commit e03e7ee34fdd ("perf/bpf: Convert perf_event_array to use struct file") to make refcounting on perf event more robust. A remaining issue that the current code still has is that since additions to the perf event array take a reference on the struct file via perf_event_get() and are only released via fput() (that cleans up the perf event eventually via perf_event_release_kernel()) when the element is either manually removed from the map from user space or automatically when the last reference on the perf event map is dropped. However, this leads us to dangling struct file's when the map gets pinned after the application owning the perf event descriptor exits, and since the struct file reference will in such case only be manually dropped or via pinned file removal, it leads to the perf event living longer than necessary, consuming needlessly resources for that time. Relations between perf event fds and bpf perf event map fds can be rather complex. F.e. maps can act as demuxers among different perf event fds that can possibly be owned by different threads and based on the index selection from the program, events get dispatched to one of the per-cpu fd endpoints. One perf event fd (or, rather a per-cpu set of them) can also live in multiple perf event maps at the same time, listening for events. Also, another requirement is that perf event fds can get closed from application side after they have been attached to the perf event map, so that on exit perf event map will take care of dropping their references eventually. Likewise, when such maps are pinned, the intended behavior is that a user application does bpf_obj_get(), puts its fds in there and on exit when fd is released, they are dropped from the map again, so the map acts rather as connector endpoint. This also makes perf event maps inherently different from program arrays as described in more detail in commit c9da161c6517 ("bpf: fix clearing on persistent program array maps"). To tackle this, map entries are marked by the map struct file that added the element to the map. And when the last reference to that map struct file is released from user space, then the tracked entries are purged from the map. This is okay, because new map struct files instances resp. frontends to the anon inode are provided via bpf_map_new_fd() that is called when we invoke bpf_obj_get_user() for retrieving a pinned map, but also when an initial instance is created via map_create(). The rest is resolved by the vfs layer automatically for us by keeping reference count on the map's struct file. Any concurrent updates on the map slot are fine as well, it just means that perf_event_fd_array_release() needs to delete less of its own entires. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* bpf, maps: extend map_fd_get_ptr argumentsDaniel Borkmann2016-06-153-10/+24
| | | | | | | | | | | | | This patch extends map_fd_get_ptr() callback that is used by fd array maps, so that struct file pointer from the related map can be passed in. It's safe to remove map_update_elem() callback for the two maps since this is only allowed from syscall side, but not from eBPF programs for these two map types. Like in per-cpu map case, bpf_fd_array_map_update_elem() needs to be called directly here due to the extra argument. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* bpf, maps: add release callbackDaniel Borkmann2016-06-152-2/+8
| | | | | | | | | | Add a release callback for maps that is invoked when the last reference to its struct file is gone and the struct file about to be released by vfs. The handler will be used by fd array maps. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'sfc-rx-vlan-filtering'David S. Miller2016-06-158-184/+2033
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | Edward Cree says: ==================== sfc: RX VLAN filtering Adds support for VLAN-qualified receive filters on EF10 hardware. This is needed when running as a guest if the hypervisor has enabled vfs-vlan-restrict, in which case the firmware rejects filters not qualified with VLAN 0. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * sfc: Fix VLAN filtering feature if vPort has VLAN_RESTRICT flagAndrew Rybchenko2016-06-153-5/+74
| | | | | | | | | | | | | | | | | | If vPort has VLAN_RESTRICT flag, VLAN tagged traffic will not be delivered without corresponding Rx filters which may be proxied to and moderated by hypervisor. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sfc: Update MCDI protocol definitionsEdward Cree2016-06-151-48/+1279
| | | | | | | | | | Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sfc: Disable VLAN filtering by default if not strictly requiredAndrew Rybchenko2016-06-151-1/+9
| | | | | | | | | | | | | | | | | | | | | | If should be done after net_dev->hw_features initialization, to keep the feature there to be able to enable it later using ethtool. VLAN filtering is enforced and fixed if vPort requires usage of VLAN filters to receive tagged traffic. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sfc: VLAN filters must only be created if the firmware supports this.Martin Habets2016-06-152-0/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If it is not supported we simply disable the feature. For the feature to work we need firmware filter support for OUTER_VID + LOC_MAC and for OUTER_VID + LOC_MAC_IG. The low-latency firmware can match on OUTER_VID + LOC_MAC but not on OUTER_VID + LOC_MAC_IG. For the capture packet firmware it is the other way around. Only the full-feature variant can match on both combinations. Incorporates a fix by Andrew Rybchenko <Andrew.Rybchenko@oktetlabs.ru> in the net_dev->[hw_]features handling. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sfc: Fix dup unknown multicast/unicast filters after datapath resetAndrew Rybchenko2016-06-151-11/+69
| | | | | | | | | | | | | | | | | | | | Filter match flags are not unique criteria to be mapped to priority because of both unknown unicast and unknown multicast are mapped to LOC_MAC_IG. So, local MAC is required to map filter to priority. MCDI filter flags is unique criteria to find filter priority. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sfc: Refactor checks for invalid filter IDEdward Cree2016-06-151-26/+13
| | | | | | | | | | | | | | | | | | | | | | Nearly every time we call efx_ef10_filter_remove_unsafe, we first check for EFX_EF10_FILTER_ID_INVALID, in which case we do nothing. So move that check into the function, simplifying all the call sites. Also, change the return type to void, since none of the callers check it. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sfc: Take mac_lock before calling efx_ef10_filter_table_probeMartin Habets2016-06-153-2/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When trying to enslave an SFC interface to a bond the following BUG_ON was hit: kernel BUG [in ef10.c]! CPU: 0 PID: 4383 Comm: ifenslave Tainted: G ... Call Trace: efx_ef10_filter_add_vlan+0x121/0x180 [sfc] efx_ef10_filter_table_probe+0x2a2/0x4f0 [sfc] efx_ef10_set_mac_address+0x370/0x6d0 [sfc] efx_set_mac_address+0x7d/0x120 [sfc] dev_set_mac_address+0x43/0xa0 bond_enslave+0x337/0xea0 [bonding] This comes from function efx_ef10_filter_vlan_sync_rx_mode. To solve the bug we ensure the mac_lock is taken before calling efx_ef10_filter_add_vlan. But to avoid a priority inversion mac_lock must be taken before filter_sem. To satisfy these requirements we end up taking mac_lock in efx_ef10_vport_set_mac_address, efx_ef10_set_mac_address, efx_ef10_sriov_set_vf_vlan and efx_probe_filters. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sfc: Implement ndo_vlan_rx_{add, kill}_vid() callbacksAndrew Rybchenko2016-06-153-2/+124
| | | | | | | | | | | | | | Supports HW VLAN filtering, en/disabled using ethtool. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sfc: Implement list of VLANs added over interfaceAndrew Rybchenko2016-06-152-35/+295
| | | | | | | | | | | | | | | | Right now it contains dummy VLAN entry with unspecified VID only. The entry is used for the case when HW VLAN filtering is not used. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sfc: Make EF10 filter management helper functions VLAN-awareAndrew Rybchenko2016-06-151-29/+41
| | | | | | | | | | | | | | It is a step to support VLAN filtering in HW. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sfc: Store unicast and multicast promisc flag with address cacheAndrew Rybchenko2016-06-151-15/+15
| | | | | | | | | | | | | | | | | | These flags are built when address cache is updated. The information will be required when VLAN filtering is added and address cache is used without re-sync. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sfc: Move filter IDs to per-VLAN data structureAndrew Rybchenko2016-06-151-37/+51
| | | | | | | | | | | | | | | | | | It is a step to support VLAN filtering in HW. Until then, there is only one struct efx_ef10_filter_vlan per struct efx_ef10_filter_table, with no VLAN information yet. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sfc: Forget filter ID when the filter is marked oldAndrew Rybchenko2016-06-151-20/+33
| | | | | | | | | | | | | | | | | | | | It is required to remove setting of filter IDs to invalid from multicast and unicast addresses caching functions. Add initialization to invalid when filter table is created. Add paranoid checks to track consistency. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sfc: Assert filter_sem write locked when requiredEdward Cree2016-06-152-1/+24
| | | | | | | | | | | | | | Based on a patch by Andrew Rybchenko <Andrew.Rybchenko@oktetlabs.ru> Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sfc: Add efx_nic member with fixed netdev featuresAndrew Rybchenko2016-06-152-4/+17
| | | | | | | | | | | | | | It allows to change set of fixed features on datapath reset. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sfc: Move last mc_promisc flag to EF10 filter table stateAndrew Rybchenko2016-06-152-4/+5
| | | | | | | | | | | | | | | | | | It is used for EF10 only and logically belongs to EF10 filter table state. It is OK that it is reset to false on filter table recreation since all filters are removed on destruction. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sfc: Define macro with EF10 offload featureAndrew Rybchenko2016-06-151-4/+8
|/ | | | | | | It is useful to simplify features addition. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge tag 'rxrpc-rewrite-20160615' of ↵David S. Miller2016-06-1515-647/+842
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Rework endpoint record handling Here's the next part of the AF_RXRPC rewrite. In this set I rework endpoint record handling. There are two types of endpoint record, local and peer. The local endpoint record is used as an anchor for the transport socket that AF_RXRPC uses (at the moment a UDP socket). Local endpoints can be shared between AF_RXRPC sockets under certain restricted circumstances. The peer endpoint is a record of the remote end. It is (or will be) used to keep track MTU and RTT values and, with these changes, is used to find the call(s) to abort when a network error occurs. The following significant changes are made: (1) The local endpoint event handling code is split out into its own file. (2) The local endpoint list bottom half-excluding spinlock is removed as things are arranged such that sk_user_data will not change whilst the transport socket callbacks are in progress. (3) Local endpoints can now only be shared if they have the same transport address (as before) and have a local service ID of 0 (ie. they're not listening for incoming calls). This prevents callbacks from a server to one process being picked up by another process. (4) Local endpoint destruction is now accomplished by the same work item as processes events, meaning that the destructor doesn't need to wait for the event processor. (5) Peer endpoints are now held in a hash table rather than a flat list. (6) Peer endpoints are now destroyed by RCU rather than by work item. (7) Peer endpoints are now differentiated by local endpoint and remote transport port in addition to remote transport address and transport type and family. This means that a firewall that excludes access between a particular local port and remote port won't cause calls to be aborted that use a different port pair. (8) Error report handling now no longer assumes that the source is always an IPv4 ICMP message from a UDP port and has assumptions that an ICMP message comes from an IPv4 socket removed. At some point IPv6 support will be added. (9) Peer endpoints rather than local endpoints are now the anchor point for distributing network error reports. (10) Both types of endpoint records are now disposed of as soon as all references to them are gone. There is less hanging around and once their usage counts hit zero, records can no longer be resurrected. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * rxrpc: Rework local endpoint managementDavid Howells2016-06-157-230/+276
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rework the local RxRPC endpoint management. Local endpoint objects are maintained in a flat list as before. This should be okay as there shouldn't be more than one per open AF_RXRPC socket (there can be fewer as local endpoints can be shared if their local service ID is 0 and they share the same local transport parameters). Changes: (1) Local endpoints may now only be shared if they have local service ID 0 (ie. they're not being used for listening). This prevents a scenario where process A is listening of the Cache Manager port and process B contacts a fileserver - which may then attempt to send CM requests back to B. But if A and B are sharing a local endpoint, A will get the CM requests meant for B. (2) We use a mutex to handle lookups and don't provide RCU-only lookups since we only expect to access the list when opening a socket or destroying an endpoint. The local endpoint object is pointed to by the transport socket's sk_user_data for the life of the transport socket - allowing us to refer to it directly from the sk_data_ready and sk_error_report callbacks. (3) atomic_inc_not_zero() now exists and can be used to only share a local endpoint if the last reference hasn't yet gone. (4) We can remove rxrpc_local_lock - a spinlock that had to be taken with BH processing disabled given that we assume sk_user_data won't change under us. (5) The transport socket is shut down before we clear the sk_user_data pointer so that we can be sure that the transport socket's callbacks won't be invoked once the RCU destruction is scheduled. (6) Local endpoints have a work item that handles both destruction and event processing. The means that destruction doesn't then need to wait for event processing. The event queues can then be cleared after the transport socket is shut down. (7) Local endpoints are no longer available for resurrection beyond the life of the sockets that had them open. As soon as their last ref goes, they are scheduled for destruction and may not have their usage count moved from 0. Signed-off-by: David Howells <dhowells@redhat.com>
| * rxrpc: Separate local endpoint event handling out into its own fileDavid Howells2016-06-154-102/+129
| | | | | | | | | | | | | | | | Separate local endpoint event handling out into its own file preparatory to overhauling the object management aspect (which remains in the original file). Signed-off-by: David Howells <dhowells@redhat.com>
| * rxrpc: Use the peer record to distribute network errorsDavid Howells2016-06-157-94/+79
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the peer record to distribute network errors rather than the transport object (which I want to get rid of). An error from a particular peer terminates all calls on that peer. For future consideration: (1) For ICMP-induced errors it might be worth trying to extract the RxRPC header from the offending packet, if one is returned attached to the ICMP packet, to better direct the error. This may be overkill, though, since an ICMP packet would be expected to be relating to the destination port, machine or network. RxRPC ABORT and BUSY packets give notice at RxRPC level. (2) To also abort connection-level communications (such as CHALLENGE packets) where indicted by an error - but that requires some revamping of the connection event handling first. Signed-off-by: David Howells <dhowells@redhat.com>
| * rxrpc: Do a little bit of tidying in the ICMP processingDavid Howells2016-06-151-4/+2
| | | | | | | | | | | | | | Do a little bit of tidying in the ICMP processing code. Signed-off-by: David Howells <dhowells@redhat.com>
| * rxrpc: Don't assume anything about the address in an ICMP packetDavid Howells2016-06-151-8/+0
| | | | | | | | | | | | | | | | Don't assume anything about the address in an ICMP packet in rxrpc_error_report() as the address may not be IPv4 in future, especially since we're just printing these details. Signed-off-by: David Howells <dhowells@redhat.com>
| * rxrpc: Break MTU determination from ICMP into its own functionDavid Howells2016-06-151-39/+54
| | | | | | | | | | | | | | Break MTU determination from ICMP out into its own function to reduce the complexity of the error report handler. Signed-off-by: David Howells <dhowells@redhat.com>
| * rxrpc: Rename rxrpc_UDP_error_report() to rxrpc_error_report()David Howells2016-06-153-4/+4
| | | | | | | | | | | | | | Rename rxrpc_UDP_error_report() to rxrpc_error_report() as it might get called for something other than UDP. Signed-off-by: David Howells <dhowells@redhat.com>
| * rxrpc: Rework peer object handling to use hash table and RCUDavid Howells2016-06-159-203/+335
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rework peer object handling to use a hash table instead of a flat list and to use RCU. Peer objects are no longer destroyed by passing them to a workqueue to process, but rather are just passed to the RCU garbage collector as kfree'able objects. The hash function uses the local endpoint plus all the components of the remote address, except for the RxRPC service ID. Peers thus represent a UDP port on the remote machine as contacted by a UDP port on this machine. The RCU read lock is used to handle non-creating lookups so that they can be called from bottom half context in the sk_error_report handler without having to lock the hash table against modification. rxrpc_lookup_peer_rcu() *does* take a reference on the peer object as in the future, this will be passed to a work item for error distribution in the error_report path and this function will cease being used in the data_ready path. Creating lookups are done under spinlock rather than mutex as they might be set up due to an external stimulus if the local endpoint is a server. Captured network error messages (ICMP) are handled with respect to this struct and MTU size and RTT are cached here. Signed-off-by: David Howells <dhowells@redhat.com>
* | Merge branch 'liquidio-next'David S. Miller2016-06-1514-664/+1218
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Raghu Vatsavayi says: ==================== liquidio: Updates and Bug fixes Following are updates for liquidio bug fixes and driver support for new firmware interface. These updates are divided into smaller logical patches as mentioned by you. These set of nine patches should be applied in the following order as some of them depend on earlier patches in the list. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | liquidio: Introduce new octeon2/3 headerRaghu Vatsavayi2016-06-157-97/+334
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Added support for new instruction header for octeon2/octeon3(ih) and corresponding changes. Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com> Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com> Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com> Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | liquidio: Replace ifidx for FW commandsRaghu Vatsavayi2016-06-1512-174/+187
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch decoupled the firmware side ifidx and host side interface number. It also has some minor name change for linkinfo sturct field. Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com> Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com> Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com> Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | liquidio: New driver FW command structureRaghu Vatsavayi2016-06-153-130/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is for new driver/firmware control command structure (octnic_packet_params and octnic_cmd_setup ) and resultant code changes. Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com> Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com> Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com> Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | liquidio: Consider PTP for packet size calculationsRaghu Vatsavayi2016-06-152-15/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is to refactor packet size calculations to support PTP enabled for 66xx and 68xx cards and also other cards that do not support PTP. Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com> Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com> Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com> Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | liquidio: RX desc alloc changesRaghu Vatsavayi2016-06-154-83/+316
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is to add page based buffers for receive side descriptors of the driver and separate free routines for rx and tx buffers. Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com> Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com> Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com> Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | liquidio:RX queue alloc changesRaghu Vatsavayi2016-06-153-33/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is to allocate rx queue's memory based on numa node and also use page based buffers for rx traffic improvements. Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com> Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com> Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com> Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | liquidio:Scatter gather list per IQRaghu Vatsavayi2016-06-152-73/+149
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is to allocate and manage scatter gather lists per input queue(iq's) and remove queue's interdependence. Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com> Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com> Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com> Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | liquidio: Host queue mapping changesRaghu Vatsavayi2016-06-156-65/+142
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is to allocate the input queues based on Numa node in tx path and queue mapping changes based on the mapping info provided by firmware. Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com> Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com> Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com> Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | liquidio: Avoid double free during soft commandRaghu Vatsavayi2016-06-153-5/+5
|/ / | | | | | | | | | | | | | | | | | | | | This patch is to resolve the double free issue by checking proper return values from soft command. Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com> Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com> Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com> Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ila: Fix checksum neutral mappingTom Herbert2016-06-151-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The algorithm for checksum neutral mapping is incorrect. This problem was being hidden since we were previously always performing checksum offload on the translated addresses and only with IPv6 HW csum. Enabling an ILA router shows the issue. Corrected algorithm: old_loc is the original locator in the packet, new_loc is the value to overwrite with and is found in the lookup table. old_flag is the old flag value (zero of CSUM_NEUTRAL_FLAG) and new_flag is then (old_flag ^ CSUM_NEUTRAL_FLAG) & CSUM_NEUTRAL_FLAG. Need SUM(new_id + new_flag + diff) == SUM(old_id + old_flag) for checksum neutral translation. Solving for diff gives: diff = (old_id - new_id) + (old_flag - new_flag) compute_csum_diff8(new_id, old_id) gives old_id - new_id If old_flag is set old_flag - new_flag = old_flag = CSUM_NEUTRAL_FLAG Else old_flag - new_flag = -new_flag = ~CSUM_NEUTRAL_FLAG Tested: - Implemented a user space program that creates random addresses and random locators to overwrite. Compares the checksum over the address before and after translation (must always be equal) - Enabled ILA router and showed proper operation. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: ipv4: Add ability to have GRE ignore DF bit in IPv4 payloadsPhilip Prindeville2016-06-154-10/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the presence of firewalls which improperly block ICMP Unreachable (including Fragmentation Required) messages, Path MTU Discovery is prevented from working. A workaround is to handle IPv4 payloads opaquely, ignoring the DF bit--as is done for other payloads like AppleTalk--and doing transparent fragmentation and reassembly. Redux includes the enforcement of mutual exclusion between this feature and Path MTU Discovery as suggested by Alexander Duyck. Cc: Alexander Duyck <alexander.duyck@gmail.com> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Philip Prindeville <philipp@redfish-solutions.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: vrf: Switch dst dev to loopback on device deleteDavid Ahern2016-06-151-13/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Attempting to delete a VRF device with a socket bound to it can stall: unregister_netdevice: waiting for red to become free. Usage count = 1 The unregister is waiting for the dst to be released and with it references to the vrf device. Similar to dst_ifdown switch the dst dev to loopback on delete for all of the dst's for the vrf device and release the references to the vrf device. Fixes: 193125dbd8eb2 ("net: Introduce VRF device driver") Fixes: 35402e3136634 ("net: Add IPv6 support to VRF device") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge tag 'shared' of ↵David S. Miller2016-06-151-12/+263
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma Mellanox shared code between RDMA and net-next trees This is Mellanox mlx5_core shared code for both net-next and RDMA trees for 4.8 kernel cycle.
| * | {net,IB}/mlx5: mlx5_ifc updatesSaeed Mahameed2016-06-101-12/+263
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introducing mlx5_ifc updates for upcoming ConnectX-4 features. Needed bits and hardware structures for mlx5e netdev: - MLX5_CQ_PERIOD_NUM_MODES for adaptive moderation support - QoS rate limiting - SQ context rate limiting - Auto negotiation fields in PTYS register - Source SQN field in flow table entry match structure - DCBX parameters Needed bits and hardware structures for IB: - New XRQ opcodes, commands and capabilities layout - Extend q counters definition to support IB. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
* | | mdio: mux: avoid 'maybe-uninitialized' warningArnd Bergmann2016-06-151-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The latest changes to the MDIO code introduced a false-positive warning with gcc-6 (possibly others): drivers/net/phy/mdio-mux.c: In function 'mdio_mux_init': drivers/net/phy/mdio-mux.c:188:3: error: 'parent_bus_node' may be used uninitialized in this function [-Werror=maybe-uninitialized] It's easy to avoid the warning by making sure the parent_bus_node is initialized in both cases at the start of the function, since the later 'of_node_put()' call is also valid for a NULL pointer argument. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: f20e6657a875 ("mdio: mux: Enhanced MDIO mux framework for integrated multiplexers") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge branch '6lowpan-ndisc'David S. Miller2016-06-1515-248/+1004
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Alexander Aring says: ==================== 6lowpan: introduce 6lowpan-nd David can you please pick-up this patch-serie for your net-next tree? Thanks in advance. This patch series introduces the ndisc ops callback structure to add different handling for IPv6 neighbour discovery cache functionality. It implements at first the two following use-cases: - 6CO handling as userspace option (For all 6LoWPAN layers, BTLE/802.15.4) [0] - short address handling for 802.15.4 6LoWPAN only [1] Since my last patch series, I completely changed the whole ndisc_ops callback structure to not replace the whole ndisc functionality at recv/send level of NS/NA/RS/RA which I send in my previous patch-series "6lowpan: introduce basic 6lowpan-nd". I changed it now to add different handling in a very low-level way of ndisc functionality. The ndisc_ops don't must be registered to dev->ndisc_ops anymore, if they are not set, then no additional ipv6 ndisc handling will be done. This patch series now introduce a complete handling of short address for 802.15.4 6LoWPAN in case of send/recv of NA/NS/RS and RA. In case of RA (receive only) and PIO we also need a second prefix + short-address based address. This callback structure can be used later (I hope) for RFC 6775 [0]. This RFC defines some new option fields and messages for 6LoWPAN-ND. This patch series does not implement RFC 6775 (except we decide now to handle 6CO in userspace). Additional we can use the current ops for parse/fill ndisc options for kernel handled ndisc messages to add 6CIO, see [2]. I tested RA/NS/NA/RS messages with short address which seems to work, what I didn't test is the redirect messages since I don't know how to generate them. The short address for redirect messages are also some special case here, because the short address by a L3 target address lookuped by neighbour cache need to be added. btw: According to [3] sending redirect messages should be also disabled by default on 6lowpan interfaces, but can be activated afterwards. This is maybe something for the ipv6_devconf structure. There is a "accept_redirects" but no "disable_redirects". - Alex [0] https://tools.ietf.org/html/rfc6775 [1] https://tools.ietf.org/html/rfc4944#section-8 [2] https://tools.ietf.org/html/rfc7400 changes since v3: - add acked-by and reviewed-by tags - fix url references in cover-letter - add cover-letter that this patch series is okay to go through net-next tree changes since RFC: - add lowlevel functions __ndisc_opt_addr_space, __ndisc_opt_addr_data and __ndisc_fill_addr_option for corresponding functions which doesn't requires net_device argument. - move ndisc_ops e.g. ndisc_ops_fill_addr_option function call into the corresponding device argument function ndisc_fill_addr_option. (Introduced a special static inline function for redirect handling). - fix error handling in addrconf_prefix_rcv_add_addr. (Please see, introduce new API handling that second address registration (in case of 802.15.4 6LoWPAN) will still be notified if failed, because dev->addr was successful. - add ieee802154 sub-directory in short address entry for 6lowpan UAPI. - add lowpan_802154_is_valid_src_short_addr, because 802.15.4 6lowpan defines the first bit as multicast (don't know how this can be working at the end, because some hardware addresses will handle such addresses in L2 as unicast. See: https://www.iana.org/assignments/_6lowpan-parameters/_6lowpan-parameters.xhtml#_6lowpan-parameters-2 changes since v2: - Introduce ndisc_ops to have our own implementation for dealing with NS/NA which allows also to support RFC6775 (e.g. ARO). - add handling for handling 6CO as userspace option for RA messages in case of 6LoWPAN interfaces. - change lowpan_is_ll to check on linklayer type only. - added some reviewed-by's. - move short addr slaac to net/6lowpan instead ipv6 handling. - add handling for context based address compression in case for short address as link-layer address. - change strategy to use short address, a short address will always be used when it's available. - Handle override flag in NA messages to update short address information or not. ==================== Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | 6lowpan: add support for 802.15.4 short addr handlingAlexander Aring2016-06-152-85/+195
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds necessary handling for use the short address for 802.15.4 6lowpan. It contains support for IPHC address compression and new matching algorithmn to decide which link layer address will be used for 802.15.4 frame. Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Signed-off-by: Alexander Aring <aar@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | 6lowpan: add support for getting short addressAlexander Aring2016-06-151-0/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case of sending RA messages we need some way to get the short address from an 802.15.4 6LoWPAN interface. This patch will add a temporary debugfs entry for experimental userspace api. Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Signed-off-by: Alexander Aring <aar@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | 6lowpan: introduce 6lowpan-ndAlexander Aring2016-06-155-8/+254
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduce different 6lowpan handling for receive and transmit NS/NA messages for the ipv6 neighbour discovery. The first use-case is for supporting 802.15.4 short addresses inside the option fields and handling for RFC6775 6CO option field as userspace option. Cc: David S. Miller <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Alexander Aring <aar@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>