[PATCH] Documentation update for relay interface

Here's updated documentation for the relay interface, rewritten to match the relayfs->relay changes. It also moves relayfs.txt to relay.txt in the process. It includes the changes to relayfs.txt previously posted by Randy Dunlap, thanks for those. The relay-apps examples have also been updated to match, and can be found on the sourceforge relayfs website. Signed-off-by: Tom Zanussi <zanussi@us.ibm.com> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
author: Tom Zanussi <zanussi@us.ibm.com> 2006-08-27 01:23:47 -0700
committer: Linus Torvalds <torvalds@g5.osdl.org> 2006-08-27 11:01:31 -0700
commit: e88d78f6ba50d773096e26ca3f5c2464853c682d (patch)
tree: dac2753ba84849ad741e1eb0e45d97714c1127ae /Documentation/filesystems
parent: 4edb9a143e31d2e191c199262226e1a5923ff8f7 (diff)
download: linux-e88d78f6ba50d773096e26ca3f5c2464853c682d.tar.gz
linux-e88d78f6ba50d773096e26ca3f5c2464853c682d.tar.bz2
linux-e88d78f6ba50d773096e26ca3f5c2464853c682d.zip
3 files changed, 481 insertions, 444 deletions
diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX
index 66fdc0744fe0..16dec61d7671 100644
--- a/Documentation/filesystems/00-INDEX
+++ b/Documentation/filesystems/00-INDEX
@@ -62,8 +62,8 @@ ramfs-rootfs-initramfs.txt
 	- info on the 'in memory' filesystems ramfs, rootfs and initramfs.
 reiser4.txt
 	- info on the Reiser4 filesystem based on dancing tree algorithms.
-relayfs.txt
-	- info on relayfs, for efficient streaming from kernel to user space.
+relay.txt
+	- info on relay, for efficient streaming from kernel to user space.
 romfs.txt
 	- description of the ROMFS filesystem.
 smbfs.txt
diff --git a/Documentation/filesystems/relay.txt b/Documentation/filesystems/relay.txt
new file mode 100644
index 000000000000..d6788dae0349
--- /dev/null
+++ b/Documentation/filesystems/relay.txt
@@ -0,0 +1,479 @@
+relay interface (formerly relayfs)
+==================================
+
+The relay interface provides a means for kernel applications to
+efficiently log and transfer large quantities of data from the kernel
+to userspace via user-defined 'relay channels'.
+
+A 'relay channel' is a kernel->user data relay mechanism implemented
+as a set of per-cpu kernel buffers ('channel buffers'), each
+represented as a regular file ('relay file') in user space.  Kernel
+clients write into the channel buffers using efficient write
+functions; these automatically log into the current cpu's channel
+buffer.  User space applications mmap() or read() from the relay files
+and retrieve the data as it becomes available.  The relay files
+themselves are files created in a host filesystem, e.g. debugfs, and
+are associated with the channel buffers using the API described below.
+
+The format of the data logged into the channel buffers is completely
+up to the kernel client; the relay interface does however provide
+hooks which allow kernel clients to impose some structure on the
+buffer data.  The relay interface doesn't implement any form of data
+filtering - this also is left to the kernel client.  The purpose is to
+keep things as simple as possible.
+
+This document provides an overview of the relay interface API.  The
+details of the function parameters are documented along with the
+functions in the relay interface code - please see that for details.
+
+Semantics
+=========
+
+Each relay channel has one buffer per CPU, each buffer has one or more
+sub-buffers.  Messages are written to the first sub-buffer until it is
+too full to contain a new message, in which case it it is written to
+the next (if available).  Messages are never split across sub-buffers.
+At this point, userspace can be notified so it empties the first
+sub-buffer, while the kernel continues writing to the next.
+
+When notified that a sub-buffer is full, the kernel knows how many
+bytes of it are padding i.e. unused space occurring because a complete
+message couldn't fit into a sub-buffer.  Userspace can use this
+knowledge to copy only valid data.
+
+After copying it, userspace can notify the kernel that a sub-buffer
+has been consumed.
+
+A relay channel can operate in a mode where it will overwrite data not
+yet collected by userspace, and not wait for it to be consumed.
+
+The relay channel itself does not provide for communication of such
+data between userspace and kernel, allowing the kernel side to remain
+simple and not impose a single interface on userspace.  It does
+provide a set of examples and a separate helper though, described
+below.
+
+The read() interface both removes padding and internally consumes the
+read sub-buffers; thus in cases where read(2) is being used to drain
+the channel buffers, special-purpose communication between kernel and
+user isn't necessary for basic operation.
+
+One of the major goals of the relay interface is to provide a low
+overhead mechanism for conveying kernel data to userspace.  While the
+read() interface is easy to use, it's not as efficient as the mmap()
+approach; the example code attempts to make the tradeoff between the
+two approaches as small as possible.
+
+klog and relay-apps example code
+================================
+
+The relay interface itself is ready to use, but to make things easier,
+a couple simple utility functions and a set of examples are provided.
+
+The relay-apps example tarball, available on the relay sourceforge
+site, contains a set of self-contained examples, each consisting of a
+pair of .c files containing boilerplate code for each of the user and
+kernel sides of a relay application.  When combined these two sets of
+boilerplate code provide glue to easily stream data to disk, without
+having to bother with mundane housekeeping chores.
+
+The 'klog debugging functions' patch (klog.patch in the relay-apps
+tarball) provides a couple of high-level logging functions to the
+kernel which allow writing formatted text or raw data to a channel,
+regardless of whether a channel to write into exists or not, or even
+whether the relay interface is compiled into the kernel or not.  These
+functions allow you to put unconditional 'trace' statements anywhere
+in the kernel or kernel modules; only when there is a 'klog handler'
+registered will data actually be logged (see the klog and kleak
+examples for details).
+
+It is of course possible to use the relay interface from scratch,
+i.e. without using any of the relay-apps example code or klog, but
+you'll have to implement communication between userspace and kernel,
+allowing both to convey the state of buffers (full, empty, amount of
+padding).  The read() interface both removes padding and internally
+consumes the read sub-buffers; thus in cases where read(2) is being
+used to drain the channel buffers, special-purpose communication
+between kernel and user isn't necessary for basic operation.  Things
+such as buffer-full conditions would still need to be communicated via
+some channel though.
+
+klog and the relay-apps examples can be found in the relay-apps
+tarball on http://relayfs.sourceforge.net
+
+The relay interface user space API
+==================================
+
+The relay interface implements basic file operations for user space
+access to relay channel buffer data.  Here are the file operations
+that are available and some comments regarding their behavior:
+
+open()	    enables user to open an _existing_ channel buffer.
+
+mmap()      results in channel buffer being mapped into the caller's
+	    memory space. Note that you can't do a partial mmap - you
+	    must map the entire file, which is NRBUF * SUBBUFSIZE.
+
+read()      read the contents of a channel buffer.  The bytes read are
+	    'consumed' by the reader, i.e. they won't be available
+	    again to subsequent reads.  If the channel is being used
+	    in no-overwrite mode (the default), it can be read at any
+	    time even if there's an active kernel writer.  If the
+	    channel is being used in overwrite mode and there are
+	    active channel writers, results may be unpredictable -
+	    users should make sure that all logging to the channel has
+	    ended before using read() with overwrite mode.  Sub-buffer
+	    padding is automatically removed and will not be seen by
+	    the reader.
+
+sendfile()  transfer data from a channel buffer to an output file
+	    descriptor. Sub-buffer padding is automatically removed
+	    and will not be seen by the reader.
+
+poll()      POLLIN/POLLRDNORM/POLLERR supported.  User applications are
+	    notified when sub-buffer boundaries are crossed.
+
+close()     decrements the channel buffer's refcount.  When the refcount
+	    reaches 0, i.e. when no process or kernel client has the
+	    buffer open, the channel buffer is freed.
+
+In order for a user application to make use of relay files, the
+host filesystem must be mounted.  For example,
+
+	mount -t debugfs debugfs /debug
+
+NOTE:   the host filesystem doesn't need to be mounted for kernel
+	clients to create or use channels - it only needs to be
+	mounted when user space applications need access to the buffer
+	data.
+
+
+The relay interface kernel API
+==============================
+
+Here's a summary of the API the relay interface provides to in-kernel clients:
+
+TBD(curr. line MT:/API/)
+  channel management functions:
+
+    relay_open(base_filename, parent, subbuf_size, n_subbufs,
+               callbacks)
+    relay_close(chan)
+    relay_flush(chan)
+    relay_reset(chan)
+
+  channel management typically called on instigation of userspace:
+
+    relay_subbufs_consumed(chan, cpu, subbufs_consumed)
+
+  write functions:
+
+    relay_write(chan, data, length)
+    __relay_write(chan, data, length)
+    relay_reserve(chan, length)
+
+  callbacks:
+
+    subbuf_start(buf, subbuf, prev_subbuf, prev_padding)
+    buf_mapped(buf, filp)
+    buf_unmapped(buf, filp)
+    create_buf_file(filename, parent, mode, buf, is_global)
+    remove_buf_file(dentry)
+
+  helper functions:
+
+    relay_buf_full(buf)
+    subbuf_start_reserve(buf, length)
+
+
+Creating a channel
+------------------
+
+relay_open() is used to create a channel, along with its per-cpu
+channel buffers.  Each channel buffer will have an associated file
+created for it in the host filesystem, which can be and mmapped or
+read from in user space.  The files are named basename0...basenameN-1
+where N is the number of online cpus, and by default will be created
+in the root of the filesystem (if the parent param is NULL).  If you
+want a directory structure to contain your relay files, you should
+create it using the host filesystem's directory creation function,
+e.g. debugfs_create_dir(), and pass the parent directory to
+relay_open().  Users are responsible for cleaning up any directory
+structure they create, when the channel is closed - again the host
+filesystem's directory removal functions should be used for that,
+e.g. debugfs_remove().
+
+In order for a channel to be created and the host filesystem's files
+associated with its channel buffers, the user must provide definitions
+for two callback functions, create_buf_file() and remove_buf_file().
+create_buf_file() is called once for each per-cpu buffer from
+relay_open() and allows the user to create the file which will be used
+to represent the corresponding channel buffer.  The callback should
+return the dentry of the file created to represent the channel buffer.
+remove_buf_file() must also be defined; it's responsible for deleting
+the file(s) created in create_buf_file() and is called during
+relay_close().
+
+Here are some typical definitions for these callbacks, in this case
+using debugfs:
+
+/*
+ * create_buf_file() callback.  Creates relay file in debugfs.
+ */
+static struct dentry *create_buf_file_handler(const char *filename,
+                                              struct dentry *parent,
+                                              int mode,
+                                              struct rchan_buf *buf,
+                                              int *is_global)
+{
+        return debugfs_create_file(filename, mode, parent, buf,
+	                           &relay_file_operations);
+}
+
+/*
+ * remove_buf_file() callback.  Removes relay file from debugfs.
+ */
+static int remove_buf_file_handler(struct dentry *dentry)
+{
+        debugfs_remove(dentry);
+
+        return 0;
+}
+
+/*
+ * relay interface callbacks
+ */
+static struct rchan_callbacks relay_callbacks =
+{
+        .create_buf_file = create_buf_file_handler,
+        .remove_buf_file = remove_buf_file_handler,
+};
+
+And an example relay_open() invocation using them:
+
+  chan = relay_open("cpu", NULL, SUBBUF_SIZE, N_SUBBUFS, &relay_callbacks);
+
+If the create_buf_file() callback fails, or isn't defined, channel
+creation and thus relay_open() will fail.
+
+The total size of each per-cpu buffer is calculated by multiplying the
+number of sub-buffers by the sub-buffer size passed into relay_open().
+The idea behind sub-buffers is that they're basically an extension of
+double-buffering to N buffers, and they also allow applications to
+easily implement random-access-on-buffer-boundary schemes, which can
+be important for some high-volume applications.  The number and size
+of sub-buffers is completely dependent on the application and even for
+the same application, different conditions will warrant different
+values for these parameters at different times.  Typically, the right
+values to use are best decided after some experimentation; in general,
+though, it's safe to assume that having only 1 sub-buffer is a bad
+idea - you're guaranteed to either overwrite data or lose events
+depending on the channel mode being used.
+
+The create_buf_file() implementation can also be defined in such a way
+as to allow the creation of a single 'global' buffer instead of the
+default per-cpu set.  This can be useful for applications interested
+mainly in seeing the relative ordering of system-wide events without
+the need to bother with saving explicit timestamps for the purpose of
+merging/sorting per-cpu files in a postprocessing step.
+
+To have relay_open() create a global buffer, the create_buf_file()
+implementation should set the value of the is_global outparam to a
+non-zero value in addition to creating the file that will be used to
+represent the single buffer.  In the case of a global buffer,
+create_buf_file() and remove_buf_file() will be called only once.  The
+normal channel-writing functions, e.g. relay_write(), can still be
+used - writes from any cpu will transparently end up in the global
+buffer - but since it is a global buffer, callers should make sure
+they use the proper locking for such a buffer, either by wrapping
+writes in a spinlock, or by copying a write function from relay.h and
+creating a local version that internally does the proper locking.
+
+Channel 'modes'
+---------------
+
+relay channels can be used in either of two modes - 'overwrite' or
+'no-overwrite'.  The mode is entirely determined by the implementation
+of the subbuf_start() callback, as described below.  The default if no
+subbuf_start() callback is defined is 'no-overwrite' mode.  If the
+default mode suits your needs, and you plan to use the read()
+interface to retrieve channel data, you can ignore the details of this
+section, as it pertains mainly to mmap() implementations.
+
+In 'overwrite' mode, also known as 'flight recorder' mode, writes
+continuously cycle around the buffer and will never fail, but will
+unconditionally overwrite old data regardless of whether it's actually
+been consumed.  In no-overwrite mode, writes will fail, i.e. data will
+be lost, if the number of unconsumed sub-buffers equals the total
+number of sub-buffers in the channel.  It should be clear that if
+there is no consumer or if the consumer can't consume sub-buffers fast
+enough, data will be lost in either case; the only difference is
+whether data is lost from the beginning or the end of a buffer.
+
+As explained above, a relay channel is made of up one or more
+per-cpu channel buffers, each implemented as a circular buffer
+subdivided into one or more sub-buffers.  Messages are written into
+the current sub-buffer of the channel's current per-cpu buffer via the
+write functions described below.  Whenever a message can't fit into
+the current sub-buffer, because there's no room left for it, the
+client is notified via the subbuf_start() callback that a switch to a
+new sub-buffer is about to occur.  The client uses this callback to 1)
+initialize the next sub-buffer if appropriate 2) finalize the previous
+sub-buffer if appropriate and 3) return a boolean value indicating
+whether or not to actually move on to the next sub-buffer.
+
+To implement 'no-overwrite' mode, the userspace client would provide
+an implementation of the subbuf_start() callback something like the
+following:
+
+static int subbuf_start(struct rchan_buf *buf,
+                        void *subbuf,
+			void *prev_subbuf,
+			unsigned int prev_padding)
+{
+	if (prev_subbuf)
+		*((unsigned *)prev_subbuf) = prev_padding;
+
+	if (relay_buf_full(buf))
+		return 0;
+
+	subbuf_start_reserve(buf, sizeof(unsigned int));
+
+	return 1;
+}
+
+If the current buffer is full, i.e. all sub-buffers remain unconsumed,
+the callback returns 0 to indicate that the buffer switch should not
+occur yet, i.e. until the consumer has had a chance to read the
+current set of ready sub-buffers.  For the relay_buf_full() function
+to make sense, the consumer is reponsible for notifying the relay
+interface when sub-buffers have been consumed via
+relay_subbufs_consumed().  Any subsequent attempts to write into the
+buffer will again invoke the subbuf_start() callback with the same
+parameters; only when the consumer has consumed one or more of the
+ready sub-buffers will relay_buf_full() return 0, in which case the
+buffer switch can continue.
+
+The implementation of the subbuf_start() callback for 'overwrite' mode
+would be very similar:
+
+static int subbuf_start(struct rchan_buf *buf,
+                        void *subbuf,
+			void *prev_subbuf,
+			unsigned int prev_padding)
+{
+	if (prev_subbuf)
+		*((unsigned *)prev_subbuf) = prev_padding;
+
+	subbuf_start_reserve(buf, sizeof(unsigned int));
+
+	return 1;
+}
+
+In this case, the relay_buf_full() check is meaningless and the
+callback always returns 1, causing the buffer switch to occur
+unconditionally.  It's also meaningless for the client to use the
+relay_subbufs_consumed() function in this mode, as it's never
+consulted.
+
+The default subbuf_start() implementation, used if the client doesn't
+define any callbacks, or doesn't define the subbuf_start() callback,
+implements the simplest possible 'no-overwrite' mode, i.e. it does
+nothing but return 0.
+
+Header information can be reserved at the beginning of each sub-buffer
+by calling the subbuf_start_reserve() helper function from within the
+subbuf_start() callback.  This reserved area can be used to store
+whatever information the client wants.  In the example above, room is
+reserved in each sub-buffer to store the padding count for that
+sub-buffer.  This is filled in for the previous sub-buffer in the
+subbuf_start() implementation; the padding value for the previous
+sub-buffer is passed into the subbuf_start() callback along with a
+pointer to the previous sub-buffer, since the padding value isn't
+known until a sub-buffer is filled.  The subbuf_start() callback is
+also called for the first sub-buffer when the channel is opened, to
+give the client a chance to reserve space in it.  In this case the
+previous sub-buffer pointer passed into the callback will be NULL, so
+the client should check the value of the prev_subbuf pointer before
+writing into the previous sub-buffer.
+
+Writing to a channel
+--------------------
+
+Kernel clients write data into the current cpu's channel buffer using
+relay_write() or __relay_write().  relay_write() is the main logging
+function - it uses local_irqsave() to protect the buffer and should be
+used if you might be logging from interrupt context.  If you know
+you'll never be logging from interrupt context, you can use
+__relay_write(), which only disables preemption.  These functions
+don't return a value, so you can't determine whether or not they
+failed - the assumption is that you wouldn't want to check a return
+value in the fast logging path anyway, and that they'll always succeed
+unless the buffer is full and no-overwrite mode is being used, in
+which case you can detect a failed write in the subbuf_start()
+callback by calling the relay_buf_full() helper function.
+
+relay_reserve() is used to reserve a slot in a channel buffer which
+can be written to later.  This would typically be used in applications
+that need to write directly into a channel buffer without having to
+stage data in a temporary buffer beforehand.  Because the actual write
+may not happen immediately after the slot is reserved, applications
+using relay_reserve() can keep a count of the number of bytes actually
+written, either in space reserved in the sub-buffers themselves or as
+a separate array.  See the 'reserve' example in the relay-apps tarball
+at http://relayfs.sourceforge.net for an example of how this can be
+done.  Because the write is under control of the client and is
+separated from the reserve, relay_reserve() doesn't protect the buffer
+at all - it's up to the client to provide the appropriate
+synchronization when using relay_reserve().
+
+Closing a channel
+-----------------
+
+The client calls relay_close() when it's finished using the channel.
+The channel and its associated buffers are destroyed when there are no
+longer any references to any of the channel buffers.  relay_flush()
+forces a sub-buffer switch on all the channel buffers, and can be used
+to finalize and process the last sub-buffers before the channel is
+closed.
+
+Misc
+----
+
+Some applications may want to keep a channel around and re-use it
+rather than open and close a new channel for each use.  relay_reset()
+can be used for this purpose - it resets a channel to its initial
+state without reallocating channel buffer memory or destroying
+existing mappings.  It should however only be called when it's safe to
+do so, i.e. when the channel isn't currently being written to.
+
+Finally, there are a couple of utility callbacks that can be used for
+different purposes.  buf_mapped() is called whenever a channel buffer
+is mmapped from user space and buf_unmapped() is called when it's
+unmapped.  The client can use this notification to trigger actions
+within the kernel application, such as enabling/disabling logging to
+the channel.
+
+
+Resources
+=========
+
+For news, example code, mailing list, etc. see the relay interface homepage:
+
+    http://relayfs.sourceforge.net
+
+
+Credits
+=======
+
+The ideas and specs for the relay interface came about as a result of
+discussions on tracing involving the following:
+
+Michel Dagenais		<michel.dagenais@polymtl.ca>
+Richard Moore		<richardj_moore@uk.ibm.com>
+Bob Wisniewski		<bob@watson.ibm.com>
+Karim Yaghmour		<karim@opersys.com>
+Tom Zanussi		<zanussi@us.ibm.com>
+
+Also thanks to Hubertus Franke for a lot of useful suggestions and bug
+reports.
diff --git a/Documentation/filesystems/relayfs.txt b/Documentation/filesystems/relayfs.txt
deleted file mode 100644
index 5832377b7340..000000000000
--- a/Documentation/filesystems/relayfs.txt
+++ /dev/null
@@ -1,442 +0,0 @@
-
-relayfs - a high-speed data relay filesystem
-============================================
-
-relayfs is a filesystem designed to provide an efficient mechanism for
-tools and facilities to relay large and potentially sustained streams
-of data from kernel space to user space.
-
-The main abstraction of relayfs is the 'channel'.  A channel consists
-of a set of per-cpu kernel buffers each represented by a file in the
-relayfs filesystem.  Kernel clients write into a channel using
-efficient write functions which automatically log to the current cpu's
-channel buffer.  User space applications mmap() the per-cpu files and
-retrieve the data as it becomes available.
-
-The format of the data logged into the channel buffers is completely
-up to the relayfs client; relayfs does however provide hooks which
-allow clients to impose some structure on the buffer data.  Nor does
-relayfs implement any form of data filtering - this also is left to
-the client.  The purpose is to keep relayfs as simple as possible.
-
-This document provides an overview of the relayfs API.  The details of
-the function parameters are documented along with the functions in the
-filesystem code - please see that for details.
-
-Semantics
-=========
-
-Each relayfs channel has one buffer per CPU, each buffer has one or
-more sub-buffers. Messages are written to the first sub-buffer until
-it is too full to contain a new message, in which case it it is
-written to the next (if available).  Messages are never split across
-sub-buffers.  At this point, userspace can be notified so it empties
-the first sub-buffer, while the kernel continues writing to the next.
-
-When notified that a sub-buffer is full, the kernel knows how many
-bytes of it are padding i.e. unused.  Userspace can use this knowledge
-to copy only valid data.
-
-After copying it, userspace can notify the kernel that a sub-buffer
-has been consumed.
-
-relayfs can operate in a mode where it will overwrite data not yet
-collected by userspace, and not wait for it to consume it.
-
-relayfs itself does not provide for communication of such data between
-userspace and kernel, allowing the kernel side to remain simple and
-not impose a single interface on userspace. It does provide a set of
-examples and a separate helper though, described below.
-
-klog and relay-apps example code
-================================
-
-relayfs itself is ready to use, but to make things easier, a couple
-simple utility functions and a set of examples are provided.
-
-The relay-apps example tarball, available on the relayfs sourceforge
-site, contains a set of self-contained examples, each consisting of a
-pair of .c files containing boilerplate code for each of the user and
-kernel sides of a relayfs application; combined these two sets of
-boilerplate code provide glue to easily stream data to disk, without
-having to bother with mundane housekeeping chores.
-
-The 'klog debugging functions' patch (klog.patch in the relay-apps
-tarball) provides a couple of high-level logging functions to the
-kernel which allow writing formatted text or raw data to a channel,
-regardless of whether a channel to write into exists or not, or
-whether relayfs is compiled into the kernel or is configured as a
-module.  These functions allow you to put unconditional 'trace'
-statements anywhere in the kernel or kernel modules; only when there
-is a 'klog handler' registered will data actually be logged (see the
-klog and kleak examples for details).
-
-It is of course possible to use relayfs from scratch i.e. without
-using any of the relay-apps example code or klog, but you'll have to
-implement communication between userspace and kernel, allowing both to
-convey the state of buffers (full, empty, amount of padding).
-
-klog and the relay-apps examples can be found in the relay-apps
-tarball on http://relayfs.sourceforge.net
-
-
-The relayfs user space API
-==========================
-
-relayfs implements basic file operations for user space access to
-relayfs channel buffer data.  Here are the file operations that are
-available and some comments regarding their behavior:
-
-open()	 enables user to open an _existing_ buffer.
-
-mmap()	 results in channel buffer being mapped into the caller's
-	 memory space. Note that you can't do a partial mmap - you must
-	 map the entire file, which is NRBUF * SUBBUFSIZE.
-
-read()	 read the contents of a channel buffer.  The bytes read are
-	 'consumed' by the reader i.e. they won't be available again
-	 to subsequent reads.  If the channel is being used in
-	 no-overwrite mode (the default), it can be read at any time
-	 even if there's an active kernel writer.  If the channel is
-	 being used in overwrite mode and there are active channel
-	 writers, results may be unpredictable - users should make
-	 sure that all logging to the channel has ended before using
-	 read() with overwrite mode.
-
-poll()	 POLLIN/POLLRDNORM/POLLERR supported.  User applications are
-	 notified when sub-buffer boundaries are crossed.
-
-close() decrements the channel buffer's refcount.  When the refcount
-	reaches 0 i.e. when no process or kernel client has the buffer
-	open, the channel buffer is freed.
-
-
-In order for a user application to make use of relayfs files, the
-relayfs filesystem must be mounted.  For example,
-
-	mount -t relayfs relayfs /mnt/relay
-
-NOTE:	relayfs doesn't need to be mounted for kernel clients to create
-	or use channels - it only needs to be mounted when user space
-	applications need access to the buffer data.
-
-
-The relayfs kernel API
-======================
-
-Here's a summary of the API relayfs provides to in-kernel clients:
-
-
-  channel management functions:
-
-    relay_open(base_filename, parent, subbuf_size, n_subbufs,
-               callbacks)
-    relay_close(chan)
-    relay_flush(chan)
-    relay_reset(chan)
-    relayfs_create_dir(name, parent)
-    relayfs_remove_dir(dentry)
-    relayfs_create_file(name, parent, mode, fops, data)
-    relayfs_remove_file(dentry)
-
-  channel management typically called on instigation of userspace:
-
-    relay_subbufs_consumed(chan, cpu, subbufs_consumed)
-
-  write functions:
-
-    relay_write(chan, data, length)
-    __relay_write(chan, data, length)
-    relay_reserve(chan, length)
-
-  callbacks:
-
-    subbuf_start(buf, subbuf, prev_subbuf, prev_padding)
-    buf_mapped(buf, filp)
-    buf_unmapped(buf, filp)
-    create_buf_file(filename, parent, mode, buf, is_global)
-    remove_buf_file(dentry)
-
-  helper functions:
-
-    relay_buf_full(buf)
-    subbuf_start_reserve(buf, length)
-
-
-Creating a channel
-------------------
-
-relay_open() is used to create a channel, along with its per-cpu
-channel buffers.  Each channel buffer will have an associated file
-created for it in the relayfs filesystem, which can be opened and
-mmapped from user space if desired.  The files are named
-basename0...basenameN-1 where N is the number of online cpus, and by
-default will be created in the root of the filesystem.  If you want a
-directory structure to contain your relayfs files, you can create it
-with relayfs_create_dir() and pass the parent directory to
-relay_open().  Clients are responsible for cleaning up any directory
-structure they create when the channel is closed - use
-relayfs_remove_dir() for that.
-
-The total size of each per-cpu buffer is calculated by multiplying the
-number of sub-buffers by the sub-buffer size passed into relay_open().
-The idea behind sub-buffers is that they're basically an extension of
-double-buffering to N buffers, and they also allow applications to
-easily implement random-access-on-buffer-boundary schemes, which can
-be important for some high-volume applications.  The number and size
-of sub-buffers is completely dependent on the application and even for
-the same application, different conditions will warrant different
-values for these parameters at different times.  Typically, the right
-values to use are best decided after some experimentation; in general,
-though, it's safe to assume that having only 1 sub-buffer is a bad
-idea - you're guaranteed to either overwrite data or lose events
-depending on the channel mode being used.
-
-Channel 'modes'
----------------
-
-relayfs channels can be used in either of two modes - 'overwrite' or
-'no-overwrite'.  The mode is entirely determined by the implementation
-of the subbuf_start() callback, as described below.  In 'overwrite'
-mode, also known as 'flight recorder' mode, writes continuously cycle
-around the buffer and will never fail, but will unconditionally
-overwrite old data regardless of whether it's actually been consumed.
-In no-overwrite mode, writes will fail i.e. data will be lost, if the
-number of unconsumed sub-buffers equals the total number of
-sub-buffers in the channel.  It should be clear that if there is no
-consumer or if the consumer can't consume sub-buffers fast enought,
-data will be lost in either case; the only difference is whether data
-is lost from the beginning or the end of a buffer.
-
-As explained above, a relayfs channel is made of up one or more
-per-cpu channel buffers, each implemented as a circular buffer
-subdivided into one or more sub-buffers.  Messages are written into
-the current sub-buffer of the channel's current per-cpu buffer via the
-write functions described below.  Whenever a message can't fit into
-the current sub-buffer, because there's no room left for it, the
-client is notified via the subbuf_start() callback that a switch to a
-new sub-buffer is about to occur.  The client uses this callback to 1)
-initialize the next sub-buffer if appropriate 2) finalize the previous
-sub-buffer if appropriate and 3) return a boolean value indicating
-whether or not to actually go ahead with the sub-buffer switch.
-
-To implement 'no-overwrite' mode, the userspace client would provide
-an implementation of the subbuf_start() callback something like the
-following:
-
-static int subbuf_start(struct rchan_buf *buf,
-                        void *subbuf,
-			void *prev_subbuf,
-			unsigned int prev_padding)
-{
-	if (prev_subbuf)
-		*((unsigned *)prev_subbuf) = prev_padding;
-
-	if (relay_buf_full(buf))
-		return 0;
-
-	subbuf_start_reserve(buf, sizeof(unsigned int));
-
-	return 1;
-}
-
-If the current buffer is full i.e. all sub-buffers remain unconsumed,
-the callback returns 0 to indicate that the buffer switch should not
-occur yet i.e. until the consumer has had a chance to read the current
-set of ready sub-buffers.  For the relay_buf_full() function to make
-sense, the consumer is reponsible for notifying relayfs when
-sub-buffers have been consumed via relay_subbufs_consumed().  Any
-subsequent attempts to write into the buffer will again invoke the
-subbuf_start() callback with the same parameters; only when the
-consumer has consumed one or more of the ready sub-buffers will
-relay_buf_full() return 0, in which case the buffer switch can
-continue.
-
-The implementation of the subbuf_start() callback for 'overwrite' mode
-would be very similar:
-
-static int subbuf_start(struct rchan_buf *buf,
-                        void *subbuf,
-			void *prev_subbuf,
-			unsigned int prev_padding)
-{
-	if (prev_subbuf)
-		*((unsigned *)prev_subbuf) = prev_padding;
-
-	subbuf_start_reserve(buf, sizeof(unsigned int));
-
-	return 1;
-}
-
-In this case, the relay_buf_full() check is meaningless and the
-callback always returns 1, causing the buffer switch to occur
-unconditionally.  It's also meaningless for the client to use the
-relay_subbufs_consumed() function in this mode, as it's never
-consulted.
-
-The default subbuf_start() implementation, used if the client doesn't
-define any callbacks, or doesn't define the subbuf_start() callback,
-implements the simplest possible 'no-overwrite' mode i.e. it does
-nothing but return 0.
-
-Header information can be reserved at the beginning of each sub-buffer
-by calling the subbuf_start_reserve() helper function from within the
-subbuf_start() callback.  This reserved area can be used to store
-whatever information the client wants.  In the example above, room is
-reserved in each sub-buffer to store the padding count for that
-sub-buffer.  This is filled in for the previous sub-buffer in the
-subbuf_start() implementation; the padding value for the previous
-sub-buffer is passed into the subbuf_start() callback along with a
-pointer to the previous sub-buffer, since the padding value isn't
-known until a sub-buffer is filled.  The subbuf_start() callback is
-also called for the first sub-buffer when the channel is opened, to
-give the client a chance to reserve space in it.  In this case the
-previous sub-buffer pointer passed into the callback will be NULL, so
-the client should check the value of the prev_subbuf pointer before
-writing into the previous sub-buffer.
-
-Writing to a channel
---------------------
-
-kernel clients write data into the current cpu's channel buffer using
-relay_write() or __relay_write().  relay_write() is the main logging
-function - it uses local_irqsave() to protect the buffer and should be
-used if you might be logging from interrupt context.  If you know
-you'll never be logging from interrupt context, you can use
-__relay_write(), which only disables preemption.  These functions
-don't return a value, so you can't determine whether or not they
-failed - the assumption is that you wouldn't want to check a return
-value in the fast logging path anyway, and that they'll always succeed
-unless the buffer is full and no-overwrite mode is being used, in
-which case you can detect a failed write in the subbuf_start()
-callback by calling the relay_buf_full() helper function.
-
-relay_reserve() is used to reserve a slot in a channel buffer which
-can be written to later.  This would typically be used in applications
-that need to write directly into a channel buffer without having to
-stage data in a temporary buffer beforehand.  Because the actual write
-may not happen immediately after the slot is reserved, applications
-using relay_reserve() can keep a count of the number of bytes actually
-written, either in space reserved in the sub-buffers themselves or as
-a separate array.  See the 'reserve' example in the relay-apps tarball
-at http://relayfs.sourceforge.net for an example of how this can be
-done.  Because the write is under control of the client and is
-separated from the reserve, relay_reserve() doesn't protect the buffer
-at all - it's up to the client to provide the appropriate
-synchronization when using relay_reserve().
-
-Closing a channel
------------------
-
-The client calls relay_close() when it's finished using the channel.
-The channel and its associated buffers are destroyed when there are no
-longer any references to any of the channel buffers.  relay_flush()
-forces a sub-buffer switch on all the channel buffers, and can be used
-to finalize and process the last sub-buffers before the channel is
-closed.
-
-Creating non-relay files
-------------------------
-
-relay_open() automatically creates files in the relayfs filesystem to
-represent the per-cpu kernel buffers; it's often useful for
-applications to be able to create their own files alongside the relay
-files in the relayfs filesystem as well e.g. 'control' files much like
-those created in /proc or debugfs for similar purposes, used to
-communicate control information between the kernel and user sides of a
-relayfs application.  For this purpose the relayfs_create_file() and
-relayfs_remove_file() API functions exist.  For relayfs_create_file(),
-the caller passes in a set of user-defined file operations to be used
-for the file and an optional void * to a user-specified data item,
-which will be accessible via inode->u.generic_ip (see the relay-apps
-tarball for examples).  The file_operations are a required parameter
-to relayfs_create_file() and thus the semantics of these files are
-completely defined by the caller.
-
-See the relay-apps tarball at http://relayfs.sourceforge.net for
-examples of how these non-relay files are meant to be used.
-
-Creating relay files in other filesystems
------------------------------------------
-
-By default of course, relay_open() creates relay files in the relayfs
-filesystem.  Because relay_file_operations is exported, however, it's
-also possible to create and use relay files in other pseudo-filesytems
-such as debugfs.
-
-For this purpose, two callback functions are provided,
-create_buf_file() and remove_buf_file().  create_buf_file() is called
-once for each per-cpu buffer from relay_open() to allow the client to
-create a file to be used to represent the corresponding buffer; if
-this callback is not defined, the default implementation will create
-and return a file in the relayfs filesystem to represent the buffer.
-The callback should return the dentry of the file created to represent
-the relay buffer.  Note that the parent directory passed to
-relay_open() (and passed along to the callback), if specified, must
-exist in the same filesystem the new relay file is created in.  If
-create_buf_file() is defined, remove_buf_file() must also be defined;
-it's responsible for deleting the file(s) created in create_buf_file()
-and is called during relay_close().
-
-The create_buf_file() implementation can also be defined in such a way
-as to allow the creation of a single 'global' buffer instead of the
-default per-cpu set.  This can be useful for applications interested
-mainly in seeing the relative ordering of system-wide events without
-the need to bother with saving explicit timestamps for the purpose of
-merging/sorting per-cpu files in a postprocessing step.
-
-To have relay_open() create a global buffer, the create_buf_file()
-implementation should set the value of the is_global outparam to a
-non-zero value in addition to creating the file that will be used to
-represent the single buffer.  In the case of a global buffer,
-create_buf_file() and remove_buf_file() will be called only once.  The
-normal channel-writing functions e.g. relay_write() can still be used
-- writes from any cpu will transparently end up in the global buffer -
-but since it is a global buffer, callers should make sure they use the
-proper locking for such a buffer, either by wrapping writes in a
-spinlock, or by copying a write function from relayfs_fs.h and
-creating a local version that internally does the proper locking.
-
-See the 'exported-relayfile' examples in the relay-apps tarball for
-examples of creating and using relay files in debugfs.
-
-Misc
-----
-
-Some applications may want to keep a channel around and re-use it
-rather than open and close a new channel for each use.  relay_reset()
-can be used for this purpose - it resets a channel to its initial
-state without reallocating channel buffer memory or destroying
-existing mappings.  It should however only be called when it's safe to
-do so i.e. when the channel isn't currently being written to.
-
-Finally, there are a couple of utility callbacks that can be used for
-different purposes.  buf_mapped() is called whenever a channel buffer
-is mmapped from user space and buf_unmapped() is called when it's
-unmapped.  The client can use this notification to trigger actions
-within the kernel application, such as enabling/disabling logging to
-the channel.
-
-
-Resources
-=========
-
-For news, example code, mailing list, etc. see the relayfs homepage:
-
-    http://relayfs.sourceforge.net
-
-
-Credits
-=======
-
-The ideas and specs for relayfs came about as a result of discussions
-on tracing involving the following:
-
-Michel Dagenais		<michel.dagenais@polymtl.ca>
-Richard Moore		<richardj_moore@uk.ibm.com>
-Bob Wisniewski		<bob@watson.ibm.com>
-Karim Yaghmour		<karim@opersys.com>
-Tom Zanussi		<zanussi@us.ibm.com>
-
-Also thanks to Hubertus Franke for a lot of useful suggestions and bug
-reports.
author	Tom Zanussi <zanussi@us.ibm.com>	2006-08-27 01:23:47 -0700
committer	Linus Torvalds <torvalds@g5.osdl.org>	2006-08-27 11:01:31 -0700
commit	e88d78f6ba50d773096e26ca3f5c2464853c682d (patch)
tree	dac2753ba84849ad741e1eb0e45d97714c1127ae /Documentation/filesystems
parent	4edb9a143e31d2e191c199262226e1a5923ff8f7 (diff)
download	linux-e88d78f6ba50d773096e26ca3f5c2464853c682d.tar.gz linux-e88d78f6ba50d773096e26ca3f5c2464853c682d.tar.bz2 linux-e88d78f6ba50d773096e26ca3f5c2464853c682d.zip