diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2016-07-27 12:03:20 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2016-07-27 12:03:20 -0700 |
commit | 468fc7ed5537615efe671d94248446ac24679773 (patch) | |
tree | 27bc9de792e863d6ec1630927b77ac9e7dabb38a /Documentation/networking/rds.txt | |
parent | 08fd8c17686c6b09fa410a26d516548dd80ff147 (diff) | |
parent | 36232012344b8db67052432742deaf17f82e70e6 (diff) | |
download | linux-stable-468fc7ed5537615efe671d94248446ac24679773.tar.gz linux-stable-468fc7ed5537615efe671d94248446ac24679773.tar.bz2 linux-stable-468fc7ed5537615efe671d94248446ac24679773.zip |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
1) Unified UDP encapsulation offload methods for drivers, from
Alexander Duyck.
2) Make DSA binding more sane, from Andrew Lunn.
3) Support QCA9888 chips in ath10k, from Anilkumar Kolli.
4) Several workqueue usage cleanups, from Bhaktipriya Shridhar.
5) Add XDP (eXpress Data Path), essentially running BPF programs on RX
packets as soon as the device sees them, with the option to mirror
the packet on TX via the same interface. From Brenden Blanco and
others.
6) Allow qdisc/class stats dumps to run lockless, from Eric Dumazet.
7) Add VLAN support to b53 and bcm_sf2, from Florian Fainelli.
8) Simplify netlink conntrack entry layout, from Florian Westphal.
9) Add ipv4 forwarding support to mlxsw spectrum driver, from Ido
Schimmel, Yotam Gigi, and Jiri Pirko.
10) Add SKB array infrastructure and convert tun and macvtap over to it.
From Michael S Tsirkin and Jason Wang.
11) Support qdisc packet injection in pktgen, from John Fastabend.
12) Add neighbour monitoring framework to TIPC, from Jon Paul Maloy.
13) Add NV congestion control support to TCP, from Lawrence Brakmo.
14) Add GSO support to SCTP, from Marcelo Ricardo Leitner.
15) Allow GRO and RPS to function on macsec devices, from Paolo Abeni.
16) Support MPLS over IPV4, from Simon Horman.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits)
xgene: Fix build warning with ACPI disabled.
be2net: perform temperature query in adapter regardless of its interface state
l2tp: Correctly return -EBADF from pppol2tp_getname.
net/mlx5_core/health: Remove deprecated create_singlethread_workqueue
net: ipmr/ip6mr: update lastuse on entry change
macsec: ensure rx_sa is set when validation is disabled
tipc: dump monitor attributes
tipc: add a function to get the bearer name
tipc: get monitor threshold for the cluster
tipc: make cluster size threshold for monitoring configurable
tipc: introduce constants for tipc address validation
net: neigh: disallow transition to NUD_STALE if lladdr is unchanged in neigh_update()
MAINTAINERS: xgene: Add driver and documentation path
Documentation: dtb: xgene: Add MDIO node
dtb: xgene: Add MDIO node
drivers: net: xgene: ethtool: Use phy_ethtool_gset and sset
drivers: net: xgene: Use exported functions
drivers: net: xgene: Enable MDIO driver
drivers: net: xgene: Add backward compatibility
drivers: net: phy: xgene: Add MDIO driver
...
Diffstat (limited to 'Documentation/networking/rds.txt')
-rw-r--r-- | Documentation/networking/rds.txt | 72 |
1 files changed, 71 insertions, 1 deletions
diff --git a/Documentation/networking/rds.txt b/Documentation/networking/rds.txt index 9d219d856d46..0235ae69af2a 100644 --- a/Documentation/networking/rds.txt +++ b/Documentation/networking/rds.txt @@ -85,7 +85,8 @@ Socket Interface bind(fd, &sockaddr_in, ...) This binds the socket to a local IP address and port, and a - transport. + transport, if one has not already been selected via the + SO_RDS_TRANSPORT socket option sendmsg(fd, ...) Sends a message to the indicated recipient. The kernel will @@ -146,6 +147,20 @@ Socket Interface operation. In this case, it would use RDS_CANCEL_SENT_TO to nuke any pending messages. + setsockopt(fd, SOL_RDS, SO_RDS_TRANSPORT, (int *)&transport ..) + getsockopt(fd, SOL_RDS, SO_RDS_TRANSPORT, (int *)&transport ..) + Set or read an integer defining the underlying + encapsulating transport to be used for RDS packets on the + socket. When setting the option, integer argument may be + one of RDS_TRANS_TCP or RDS_TRANS_IB. When retrieving the + value, RDS_TRANS_NONE will be returned on an unbound socket. + This socket option may only be set exactly once on the socket, + prior to binding it via the bind(2) system call. Attempts to + set SO_RDS_TRANSPORT on a socket for which the transport has + been previously attached explicitly (by SO_RDS_TRANSPORT) or + implicitly (via bind(2)) will return an error of EOPNOTSUPP. + An attempt to set SO_RDS_TRANSPPORT to RDS_TRANS_NONE will + always return EINVAL. RDMA for RDS ============ @@ -350,4 +365,59 @@ The recv path handle CMSGs return to application +Multipath RDS (mprds) +===================== + Mprds is multipathed-RDS, primarily intended for RDS-over-TCP + (though the concept can be extended to other transports). The classical + implementation of RDS-over-TCP is implemented by demultiplexing multiple + PF_RDS sockets between any 2 endpoints (where endpoint == [IP address, + port]) over a single TCP socket between the 2 IP addresses involved. This + has the limitation that it ends up funneling multiple RDS flows over a + single TCP flow, thus it is + (a) upper-bounded to the single-flow bandwidth, + (b) suffers from head-of-line blocking for all the RDS sockets. + + Better throughput (for a fixed small packet size, MTU) can be achieved + by having multiple TCP/IP flows per rds/tcp connection, i.e., multipathed + RDS (mprds). Each such TCP/IP flow constitutes a path for the rds/tcp + connection. RDS sockets will be attached to a path based on some hash + (e.g., of local address and RDS port number) and packets for that RDS + socket will be sent over the attached path using TCP to segment/reassemble + RDS datagrams on that path. + + Multipathed RDS is implemented by splitting the struct rds_connection into + a common (to all paths) part, and a per-path struct rds_conn_path. All + I/O workqs and reconnect threads are driven from the rds_conn_path. + Transports such as TCP that are multipath capable may then set up a + TPC socket per rds_conn_path, and this is managed by the transport via + the transport privatee cp_transport_data pointer. + + Transports announce themselves as multipath capable by setting the + t_mp_capable bit during registration with the rds core module. When the + transport is multipath-capable, rds_sendmsg() hashes outgoing traffic + across multiple paths. The outgoing hash is computed based on the + local address and port that the PF_RDS socket is bound to. + + Additionally, even if the transport is MP capable, we may be + peering with some node that does not support mprds, or supports + a different number of paths. As a result, the peering nodes need + to agree on the number of paths to be used for the connection. + This is done by sending out a control packet exchange before the + first data packet. The control packet exchange must have completed + prior to outgoing hash completion in rds_sendmsg() when the transport + is mutlipath capable. + + The control packet is an RDS ping packet (i.e., packet to rds dest + port 0) with the ping packet having a rds extension header option of + type RDS_EXTHDR_NPATHS, length 2 bytes, and the value is the + number of paths supported by the sender. The "probe" ping packet will + get sent from some reserved port, RDS_FLAG_PROBE_PORT (in <linux/rds.h>) + The receiver of a ping from RDS_FLAG_PROBE_PORT will thus immediately + be able to compute the min(sender_paths, rcvr_paths). The pong + sent in response to a probe-ping should contain the rcvr's npaths + when the rcvr is mprds-capable. + + If the rcvr is not mprds-capable, the exthdr in the ping will be + ignored. In this case the pong will not have any exthdrs, so the sender + of the probe-ping can default to single-path mprds. |