diff options
Diffstat (limited to 'Documentation/networking/kcm.txt')
-rw-r--r-- | Documentation/networking/kcm.txt | 285 |
1 files changed, 0 insertions, 285 deletions
diff --git a/Documentation/networking/kcm.txt b/Documentation/networking/kcm.txt deleted file mode 100644 index b773a5278ac4..000000000000 --- a/Documentation/networking/kcm.txt +++ /dev/null @@ -1,285 +0,0 @@ -Kernel Connection Multiplexor ------------------------------ - -Kernel Connection Multiplexor (KCM) is a mechanism that provides a message based -interface over TCP for generic application protocols. With KCM an application -can efficiently send and receive application protocol messages over TCP using -datagram sockets. - -KCM implements an NxM multiplexor in the kernel as diagrammed below: - -+------------+ +------------+ +------------+ +------------+ -| KCM socket | | KCM socket | | KCM socket | | KCM socket | -+------------+ +------------+ +------------+ +------------+ - | | | | - +-----------+ | | +----------+ - | | | | - +----------------------------------+ - | Multiplexor | - +----------------------------------+ - | | | | | - +---------+ | | | ------------+ - | | | | | -+----------+ +----------+ +----------+ +----------+ +----------+ -| Psock | | Psock | | Psock | | Psock | | Psock | -+----------+ +----------+ +----------+ +----------+ +----------+ - | | | | | -+----------+ +----------+ +----------+ +----------+ +----------+ -| TCP sock | | TCP sock | | TCP sock | | TCP sock | | TCP sock | -+----------+ +----------+ +----------+ +----------+ +----------+ - -KCM sockets ------------ - -The KCM sockets provide the user interface to the multiplexor. All the KCM sockets -bound to a multiplexor are considered to have equivalent function, and I/O -operations in different sockets may be done in parallel without the need for -synchronization between threads in userspace. - -Multiplexor ------------ - -The multiplexor provides the message steering. In the transmit path, messages -written on a KCM socket are sent atomically on an appropriate TCP socket. -Similarly, in the receive path, messages are constructed on each TCP socket -(Psock) and complete messages are steered to a KCM socket. - -TCP sockets & Psocks --------------------- - -TCP sockets may be bound to a KCM multiplexor. A Psock structure is allocated -for each bound TCP socket, this structure holds the state for constructing -messages on receive as well as other connection specific information for KCM. - -Connected mode semantics ------------------------- - -Each multiplexor assumes that all attached TCP connections are to the same -destination and can use the different connections for load balancing when -transmitting. The normal send and recv calls (include sendmmsg and recvmmsg) -can be used to send and receive messages from the KCM socket. - -Socket types ------------- - -KCM supports SOCK_DGRAM and SOCK_SEQPACKET socket types. - -Message delineation -------------------- - -Messages are sent over a TCP stream with some application protocol message -format that typically includes a header which frames the messages. The length -of a received message can be deduced from the application protocol header -(often just a simple length field). - -A TCP stream must be parsed to determine message boundaries. Berkeley Packet -Filter (BPF) is used for this. When attaching a TCP socket to a multiplexor a -BPF program must be specified. The program is called at the start of receiving -a new message and is given an skbuff that contains the bytes received so far. -It parses the message header and returns the length of the message. Given this -information, KCM will construct the message of the stated length and deliver it -to a KCM socket. - -TCP socket management ---------------------- - -When a TCP socket is attached to a KCM multiplexor data ready (POLLIN) and -write space available (POLLOUT) events are handled by the multiplexor. If there -is a state change (disconnection) or other error on a TCP socket, an error is -posted on the TCP socket so that a POLLERR event happens and KCM discontinues -using the socket. When the application gets the error notification for a -TCP socket, it should unattach the socket from KCM and then handle the error -condition (the typical response is to close the socket and create a new -connection if necessary). - -KCM limits the maximum receive message size to be the size of the receive -socket buffer on the attached TCP socket (the socket buffer size can be set by -SO_RCVBUF). If the length of a new message reported by the BPF program is -greater than this limit a corresponding error (EMSGSIZE) is posted on the TCP -socket. The BPF program may also enforce a maximum messages size and report an -error when it is exceeded. - -A timeout may be set for assembling messages on a receive socket. The timeout -value is taken from the receive timeout of the attached TCP socket (this is set -by SO_RCVTIMEO). If the timer expires before assembly is complete an error -(ETIMEDOUT) is posted on the socket. - -User interface -============== - -Creating a multiplexor ----------------------- - -A new multiplexor and initial KCM socket is created by a socket call: - - socket(AF_KCM, type, protocol) - - - type is either SOCK_DGRAM or SOCK_SEQPACKET - - protocol is KCMPROTO_CONNECTED - -Cloning KCM sockets -------------------- - -After the first KCM socket is created using the socket call as described -above, additional sockets for the multiplexor can be created by cloning -a KCM socket. This is accomplished by an ioctl on a KCM socket: - - /* From linux/kcm.h */ - struct kcm_clone { - int fd; - }; - - struct kcm_clone info; - - memset(&info, 0, sizeof(info)); - - err = ioctl(kcmfd, SIOCKCMCLONE, &info); - - if (!err) - newkcmfd = info.fd; - -Attach transport sockets ------------------------- - -Attaching of transport sockets to a multiplexor is performed by calling an -ioctl on a KCM socket for the multiplexor. e.g.: - - /* From linux/kcm.h */ - struct kcm_attach { - int fd; - int bpf_fd; - }; - - struct kcm_attach info; - - memset(&info, 0, sizeof(info)); - - info.fd = tcpfd; - info.bpf_fd = bpf_prog_fd; - - ioctl(kcmfd, SIOCKCMATTACH, &info); - -The kcm_attach structure contains: - fd: file descriptor for TCP socket being attached - bpf_prog_fd: file descriptor for compiled BPF program downloaded - -Unattach transport sockets --------------------------- - -Unattaching a transport socket from a multiplexor is straightforward. An -"unattach" ioctl is done with the kcm_unattach structure as the argument: - - /* From linux/kcm.h */ - struct kcm_unattach { - int fd; - }; - - struct kcm_unattach info; - - memset(&info, 0, sizeof(info)); - - info.fd = cfd; - - ioctl(fd, SIOCKCMUNATTACH, &info); - -Disabling receive on KCM socket -------------------------------- - -A setsockopt is used to disable or enable receiving on a KCM socket. -When receive is disabled, any pending messages in the socket's -receive buffer are moved to other sockets. This feature is useful -if an application thread knows that it will be doing a lot of -work on a request and won't be able to service new messages for a -while. Example use: - - int val = 1; - - setsockopt(kcmfd, SOL_KCM, KCM_RECV_DISABLE, &val, sizeof(val)) - -BFP programs for message delineation ------------------------------------- - -BPF programs can be compiled using the BPF LLVM backend. For example, -the BPF program for parsing Thrift is: - - #include "bpf.h" /* for __sk_buff */ - #include "bpf_helpers.h" /* for load_word intrinsic */ - - SEC("socket_kcm") - int bpf_prog1(struct __sk_buff *skb) - { - return load_word(skb, 0) + 4; - } - - char _license[] SEC("license") = "GPL"; - -Use in applications -=================== - -KCM accelerates application layer protocols. Specifically, it allows -applications to use a message based interface for sending and receiving -messages. The kernel provides necessary assurances that messages are sent -and received atomically. This relieves much of the burden applications have -in mapping a message based protocol onto the TCP stream. KCM also make -application layer messages a unit of work in the kernel for the purposes of -steering and scheduling, which in turn allows a simpler networking model in -multithreaded applications. - -Configurations --------------- - -In an Nx1 configuration, KCM logically provides multiple socket handles -to the same TCP connection. This allows parallelism between in I/O -operations on the TCP socket (for instance copyin and copyout of data is -parallelized). In an application, a KCM socket can be opened for each -processing thread and inserted into the epoll (similar to how SO_REUSEPORT -is used to allow multiple listener sockets on the same port). - -In a MxN configuration, multiple connections are established to the -same destination. These are used for simple load balancing. - -Message batching ----------------- - -The primary purpose of KCM is load balancing between KCM sockets and hence -threads in a nominal use case. Perfect load balancing, that is steering -each received message to a different KCM socket or steering each sent -message to a different TCP socket, can negatively impact performance -since this doesn't allow for affinities to be established. Balancing -based on groups, or batches of messages, can be beneficial for performance. - -On transmit, there are three ways an application can batch (pipeline) -messages on a KCM socket. - 1) Send multiple messages in a single sendmmsg. - 2) Send a group of messages each with a sendmsg call, where all messages - except the last have MSG_BATCH in the flags of sendmsg call. - 3) Create "super message" composed of multiple messages and send this - with a single sendmsg. - -On receive, the KCM module attempts to queue messages received on the -same KCM socket during each TCP ready callback. The targeted KCM socket -changes at each receive ready callback on the KCM socket. The application -does not need to configure this. - -Error handling --------------- - -An application should include a thread to monitor errors raised on -the TCP connection. Normally, this will be done by placing each -TCP socket attached to a KCM multiplexor in epoll set for POLLERR -event. If an error occurs on an attached TCP socket, KCM sets an EPIPE -on the socket thus waking up the application thread. When the application -sees the error (which may just be a disconnect) it should unattach the -socket from KCM and then close it. It is assumed that once an error is -posted on the TCP socket the data stream is unrecoverable (i.e. an error -may have occurred in the middle of receiving a message). - -TCP connection monitoring -------------------------- - -In KCM there is no means to correlate a message to the TCP socket that -was used to send or receive the message (except in the case there is -only one attached TCP socket). However, the application does retain -an open file descriptor to the socket so it will be able to get statistics -from the socket which can be used in detecting issues (such as high -retransmissions on the socket). |