author | Greg Kroah-Hartman <gregkh@suse.de> | 2006-01-06 12:59:59 -0800 |
---|---|---|
committer | Greg Kroah-Hartman <gregkh@suse.de> | 2006-01-06 12:59:59 -0800 |
commit | ccf18968b1bbc2fb117190a1984ac2a826dac228 | |
tree | 7bc8fbf5722aecf1e84fa50c31c657864cba1daa /Documentation | |
parent | e91c021c487110386a07facd0396e6c3b7cf9c1f | |
parent | d99cf9d679a520d67f81d805b7cb91c68e1847f0 | |
Merge ../torvalds-2.6/
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/block/biodoc.txt | 10 |
-rw-r--r-- | Documentation/feature-removal-schedule.txt | 11 |
-rw-r--r-- | Documentation/filesystems/00-INDEX | 6 |
-rw-r--r-- | Documentation/filesystems/configfs/configfs.txt | 434 |
-rw-r--r-- | Documentation/filesystems/configfs/configfs_example.c | 474 |
-rw-r--r-- | Documentation/filesystems/dlmfs.txt | 130 |
-rw-r--r-- | Documentation/filesystems/ocfs2.txt | 55 |
-rw-r--r-- | Documentation/keys.txt | 18 |
-rw-r--r-- | Documentation/md.txt | 120 |
-rw-r--r-- | Documentation/power/interface.txt | 11 |
-rw-r--r-- | Documentation/power/swsusp.txt | 5 |
11 files changed, 1237 insertions, 37 deletions
diff --git a/Documentation/block/biodoc.txt b/Documentation/block/biodoc.txt
index 303c57a7fad9..8e63831971d5 100644
--- a/Documentation/block/biodoc.txt
+++ b/Documentation/block/biodoc.txt
@@ -263,14 +263,8 @@ A flag in the bio structure, BIO_BARRIER is used to identify a barrier i/o.
 The generic i/o scheduler would make sure that it places the barrier request and
 all other requests coming after it after all the previous requests in the queue.
 Barriers may be implemented in different ways depending on the
-driver. A SCSI driver for example could make use of ordered tags to
-preserve the necessary ordering with a lower impact on throughput. For IDE
-this might be two sync cache flush: a pre and post flush when encountering
-a barrier write.
-
-There is a provision for queues to indicate what kind of barriers they
-can provide. This is as of yet unmerged, details will be added here once it
-is in the kernel.
+driver. For more details regarding I/O barriers, please read barrier.txt
+in this directory.
 
 1.2.2 Request Priority/Latency
 
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index cb13b963f7ae..9474501dd6cc 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -47,17 +47,6 @@ Who:	Paul E. McKenney <paulmck@us.ibm.com>
 
 ---------------------------
 
-What:	IEEE1394 Audio and Music Data Transmission Protocol driver,
-	Connection Management Procedures driver
-When:	November 2005
-Files:	drivers/ieee1394/{amdtp,cmp}*
-Why:	These are incomplete, have never worked, and are better implemented
-	in userland via raw1394 (see http://freebob.sourceforge.net/ for
-	example.)
-Who:	Jody McIntyre <scjody@steamballoon.com>
-
----------------------------
-
 What:	raw1394: requests of type RAW1394_REQ_ISO_SEND, RAW1394_REQ_ISO_LISTEN
 When:	November 2005
 Why:	Deprecated in favour of the new ioctl-based rawiso interface, which is
diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX
index 7e17712f3229..74052d22d868 100644
--- a/Documentation/filesystems/00-INDEX
+++ b/Documentation/filesystems/00-INDEX
@@ -12,10 +12,14 @@ cifs.txt
 	- description of the CIFS filesystem
 coda.txt
 	- description of the CODA filesystem.
+configfs/
+	- directory containing configfs documentation and example code.
 cramfs.txt
 	- info on the cram filesystem for small storage (ROMs etc)
 devfs/
 	- directory containing devfs documentation.
+dlmfs.txt
+	- info on the userspace interface to the OCFS2 DLM.
 ext2.txt
 	- info, mount options and specifications for the Ext2 filesystem.
 hpfs.txt
@@ -30,6 +34,8 @@ ntfs.txt
 	- info and mount options for the NTFS filesystem (Windows NT).
 proc.txt
 	- info on Linux's /proc filesystem.
+ocfs2.txt
+	- info and mount options for the OCFS2 clustered filesystem.
 romfs.txt
 	- Description of the ROMFS filesystem.
 smbfs.txt
diff --git a/Documentation/filesystems/configfs/configfs.txt b/Documentation/filesystems/configfs/configfs.txt
new file mode 100644
index 000000000000..c4ff96b7c4e0
--- /dev/null
+++ b/Documentation/filesystems/configfs/configfs.txt
@@ -0,0 +1,434 @@
+
+configfs - Userspace-driven kernel object configuration.
+
+Joel Becker <joel.becker@oracle.com>
+
+Updated: 31 March 2005
+
+Copyright (c) 2005 Oracle Corporation,
+	Joel Becker <joel.becker@oracle.com>
+
+
+[What is configfs?]
+
+configfs is a ram-based filesystem that provides the converse of
+sysfs's functionality. Where sysfs is a filesystem-based view of
+kernel objects, configfs is a filesystem-based manager of kernel
+objects, or config_items.
+
+With sysfs, an object is created in kernel (for example, when a device
+is discovered) and it is registered with sysfs. Its attributes then
+appear in sysfs, allowing userspace to read the attributes via
+readdir(3)/read(2). It may allow some attributes to be modified via
+write(2). The important point is that the object is created and
+destroyed in kernel, the kernel controls the lifecycle of the sysfs
+representation, and sysfs is merely a window on all this.
+
+A configfs config_item is created via an explicit userspace operation:
+mkdir(2). It is destroyed via rmdir(2). The attributes appear at
+mkdir(2) time, and can be read or modified via read(2) and write(2).
+As with sysfs, readdir(3) queries the list of items and/or attributes.
+symlink(2) can be used to group items together. Unlike sysfs, the
+lifetime of the representation is completely driven by userspace. The
+kernel modules backing the items must respond to this.
+
+Both sysfs and configfs can and should exist together on the same
+system. One is not a replacement for the other.
+
+[Using configfs]
+
+configfs can be compiled as a module or into the kernel. You can access
+it by doing
+
+	mount -t configfs none /config
+
+The configfs tree will be empty unless client modules are also loaded.
+These are modules that register their item types with configfs as
+subsystems. Once a client subsystem is loaded, it will appear as a
+subdirectory (or more than one) under /config. Like sysfs, the
+configfs tree is always there, whether mounted on /config or not.
+
+An item is created via mkdir(2). The item's attributes will also
+appear at this time. readdir(3) can determine what the attributes are,
+read(2) can query their default values, and write(2) can store new
+values. Like sysfs, attributes should be ASCII text files, preferably
+with only one value per file. The same efficiency caveats from sysfs
+apply. Don't mix more than one attribute in one attribute file.
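The mkdir(2)/write(2)/read(2) workflow described above can be driven from C as easily as from the shell. The following is a minimal userspace sketch, assuming configfs is mounted on /config and a client subsystem directory already exists. The helper names create_item(), store_attr(), show_attr() and join_path() are hypothetical, not part of any configfs API. Because the sketch only uses ordinary POSIX calls, it also runs against a plain directory, so you can try it without a configfs client module loaded.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: join dir and name into out; 0 or -ENAMETOOLONG. */
static int join_path(char *out, size_t size, const char *dir, const char *name)
{
	int n = snprintf(out, size, "%s/%s", dir, name);
	return (n < 0 || (size_t)n >= size) ? -ENAMETOOLONG : 0;
}

/* Create a config_item: mkdir(2) in the subsystem's (or group's) directory.
 * Returns 0 on success or -errno (e.g. -EEXIST if the name is taken). */
static int create_item(const char *group_dir, const char *name)
{
	char path[4096];
	int err = join_path(path, sizeof(path), group_dir, name);
	if (err)
		return err;
	return mkdir(path, 0755) == 0 ? 0 : -errno;
}

/* Store one ASCII value in an attribute with a single write(2).
 * O_CREAT | O_TRUNC mirrors the shell's "echo value > attr"; on configfs
 * the attribute file already exists once the item has been created. */
static int store_attr(const char *item_dir, const char *attr, const char *value)
{
	char path[4096];
	int err = join_path(path, sizeof(path), item_dir, attr);
	if (err)
		return err;
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -errno;
	size_t len = strlen(value);
	ssize_t w = write(fd, value, len);	/* whole buffer at once */
	err = (w < 0) ? -errno : ((size_t)w == len ? 0 : -EIO);
	close(fd);
	return err;
}

/* Read an attribute (at most size - 1 bytes, size >= 1) into buf and
 * NUL-terminate it. Returns the number of bytes read or -errno. */
static ssize_t show_attr(const char *item_dir, const char *attr,
			 char *buf, size_t size)
{
	char path[4096];
	int err = join_path(path, sizeof(path), item_dir, attr);
	if (err)
		return err;
	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;
	ssize_t r = read(fd, buf, size - 1);
	ssize_t ret = (r < 0) ? -errno : r;
	close(fd);
	buf[ret < 0 ? 0 : ret] = '\0';
	return ret;
}
```

For instance, create_item("/config/<subsystem>", "item1") is the C counterpart of mkdir on that path. Each value is written with one write(2) call and one value per file, as the text recommends.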
+
+Like sysfs, configfs expects write(2) to store the entire buffer at
+once. When writing to configfs attributes, userspace processes should
+first read the entire file, modify the portions they wish to change, and
+then write the entire buffer back. Attribute files have a maximum size
+of one page (PAGE_SIZE, 4096 on i386).
+
+When an item needs to be destroyed, remove it with rmdir(2). An
+item cannot be destroyed if any other item has a link to it (via
+symlink(2)). Links can be removed via unlink(2).
+
+[Configuring FakeNBD: an Example]
+
+Imagine there's a Network Block Device (NBD) driver that allows you to
+access remote block devices. Call it FakeNBD. FakeNBD uses configfs
+for its configuration. Obviously, there will be a nice program that
+sysadmins use to configure FakeNBD, but somehow that program has to tell
+the driver about it. Here's where configfs comes in.
+
+When the FakeNBD driver is loaded, it registers itself with configfs.
+readdir(3) sees this just fine:
+
+	# ls /config
+	fakenbd
+
+A fakenbd connection can be created with mkdir(2). The name is
+arbitrary, but likely the tool will make some use of the name. Perhaps
+it is a uuid or a disk name:
+
+	# mkdir /config/fakenbd/disk1
+	# ls /config/fakenbd/disk1
+	target device rw
+
+The target attribute contains the IP address of the server FakeNBD will
+connect to. The device attribute is the device on the server.
+Predictably, the rw attribute determines whether the connection is
+read-only or read-write.
+
+	# echo 10.0.0.1 > /config/fakenbd/disk1/target
+	# echo /dev/sda1 > /config/fakenbd/disk1/device
+	# echo 1 > /config/fakenbd/disk1/rw
+
+That's it. That's all there is. Now the device is configured, via the
+shell no less.
+
+[Coding With configfs]
+
+Every object in configfs is a config_item. A config_item reflects an
+object in the subsystem. It has attributes that match values on that
+object. configfs handles the filesystem representation of that object
+and its attributes, allowing the subsystem to ignore all but the
+basic show/store interaction.
+
+Items are created and destroyed inside a config_group. A group is a
+collection of items that share the same attributes and operations.
+Items are created by mkdir(2) and removed by rmdir(2), but configfs
+handles that. The group has a set of operations to perform these tasks.
+
+A subsystem is the top level of a client module. During initialization,
+the client module registers the subsystem with configfs, and the subsystem
+appears as a directory at the top of the configfs filesystem. A
+subsystem is also a config_group, and can do everything a config_group
+can.
+
+[struct config_item]
+
+	struct config_item {
+		char                    *ci_name;
+		char                    ci_namebuf[UOBJ_NAME_LEN];
+		struct kref             ci_kref;
+		struct list_head        ci_entry;
+		struct config_item      *ci_parent;
+		struct config_group     *ci_group;
+		struct config_item_type *ci_type;
+		struct dentry           *ci_dentry;
+	};
+
+	void config_item_init(struct config_item *);
+	void config_item_init_type_name(struct config_item *,
+					const char *name,
+					struct config_item_type *type);
+	struct config_item *config_item_get(struct config_item *);
+	void config_item_put(struct config_item *);
+
+Generally, struct config_item is embedded in a container structure, a
+structure that actually represents what the subsystem is doing. The
+config_item portion of that structure is how the object interacts with
+configfs.
+
+Whether statically defined in a source file or created by a parent
+config_group, a config_item must have one of the _init() functions
+called on it. This initializes the reference count and sets up the
+appropriate fields.
+
+All users of a config_item should have a reference on it via
+config_item_get(), and drop the reference when they are done via
+config_item_put().
+
+By itself, a config_item cannot do much more than appear in configfs.
+Usually a subsystem wants the item to display and/or store attributes,
+among other things. For that, it needs a type.
+
+[struct config_item_type]
+
+	struct configfs_item_operations {
+		void (*release)(struct config_item *);
+		ssize_t (*show_attribute)(struct config_item *,
+					  struct configfs_attribute *,
+					  char *);
+		ssize_t (*store_attribute)(struct config_item *,
+					   struct configfs_attribute *,
+					   const char *, size_t);
+		int (*allow_link)(struct config_item *src,
+				  struct config_item *target);
+		int (*drop_link)(struct config_item *src,
+				 struct config_item *target);
+	};
+
+	struct config_item_type {
+		struct module                           *ct_owner;
+		struct configfs_item_operations         *ct_item_ops;
+		struct configfs_group_operations        *ct_group_ops;
+		struct configfs_attribute               **ct_attrs;
+	};
+
+The most basic function of a config_item_type is to define what
+operations can be performed on a config_item. All items that have been
+allocated dynamically will need to provide the ct_item_ops->release()
+method. This method is called when the config_item's reference count
+reaches zero. Items that wish to display an attribute need to provide
+the ct_item_ops->show_attribute() method. Similarly, storing a new
+attribute value uses the store_attribute() method.
+
+[struct configfs_attribute]
+
+	struct configfs_attribute {
+		char                    *ca_name;
+		struct module           *ca_owner;
+		mode_t                  ca_mode;
+	};
+
+When a config_item wants an attribute to appear as a file in the item's
+configfs directory, it must define a configfs_attribute describing it.
+It then adds the attribute to the NULL-terminated array
+config_item_type->ct_attrs. When the item appears in configfs, the
+attribute file will appear with the configfs_attribute->ca_name
+filename. configfs_attribute->ca_mode specifies the file permissions.
+
+If an attribute is readable and the config_item provides a
+ct_item_ops->show_attribute() method, that method will be called
+whenever userspace asks for a read(2) on the attribute. The converse
+will happen for write(2).
+
+[struct config_group]
+
+A config_item cannot live in a vacuum. The only way one can be created
+is via mkdir(2) on a config_group. This will trigger creation of a
+child item.
+
+	struct config_group {
+		struct config_item		cg_item;
+		struct list_head		cg_children;
+		struct configfs_subsystem 	*cg_subsys;
+		struct config_group		**default_groups;
+	};
+
+	void config_group_init(struct config_group *group);
+	void config_group_init_type_name(struct config_group *group,
+					 const char *name,
+					 struct config_item_type *type);
+
+
+The config_group structure contains a config_item. Properly configuring
+that item means that a group can behave as an item in its own right.
+However, it can do more: it can create child items or groups. This is
+accomplished via the group operations specified on the group's
+config_item_type.
+
+	struct configfs_group_operations {
+		struct config_item *(*make_item)(struct config_group *group,
+						 const char *name);
+		struct config_group *(*make_group)(struct config_group *group,
+						   const char *name);
+		int (*commit_item)(struct config_item *item);
+		void (*drop_item)(struct config_group *group,
+				  struct config_item *item);
+	};
+
+A group creates child items by providing the
+ct_group_ops->make_item() method. If provided, this method is called from mkdir(2) in the group's directory. The subsystem allocates a new
+config_item (or more likely, its container structure), initializes it,
+and returns it to configfs. Configfs will then populate the filesystem
+tree to reflect the new item.
+
+If the subsystem wants the child to be a group itself, the subsystem
+provides ct_group_ops->make_group(). Everything else behaves the same,
+using the group _init() functions on the group.
+
+Finally, when userspace calls rmdir(2) on the item or group,
+ct_group_ops->drop_item() is called. As a config_group is also a
+config_item, a separate drop_group() method is not necessary.
+The subsystem must config_item_put() the reference that was initialized
+upon item allocation. If a subsystem has no work to do, it may omit
+the ct_group_ops->drop_item() method, and configfs will call
+config_item_put() on the item on behalf of the subsystem.
+
+IMPORTANT: drop_item() is void, and as such cannot fail. When rmdir(2)
+is called, configfs WILL remove the item from the filesystem tree
+(assuming that it has no children to keep it busy). The subsystem is
+responsible for responding to this. If the subsystem has references to
+the item in other threads, the memory is safe. It may take some time
+for the item to actually disappear from the subsystem's usage. But it
+is gone from configfs.
+
+A config_group cannot be removed while it still has child items. This
+is implemented in the configfs rmdir(2) code. ->drop_item() will not be
+called, as the item has not been dropped. rmdir(2) will fail, as the
+directory is not empty.
+
+[struct configfs_subsystem]
+
+A subsystem must register itself, usually at module_init time. This
+tells configfs to make the subsystem appear in the file tree.
+
+	struct configfs_subsystem {
+		struct config_group	su_group;
+		struct semaphore	su_sem;
+	};
+
+	int configfs_register_subsystem(struct configfs_subsystem *subsys);
+	void configfs_unregister_subsystem(struct configfs_subsystem *subsys);
+
+	A subsystem consists of a top-level config_group and a semaphore.
+The group is where child config_items are created. For a subsystem,
+this group is usually defined statically. Before calling
+configfs_register_subsystem(), the subsystem must have initialized the
+group via the usual group _init() functions, and it must also have
+initialized the semaphore.
+	When the register call returns, the subsystem is live, and it
+will be visible via configfs. At that point, mkdir(2) can be called and
+the subsystem must be ready for it.
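The reference-counting rules above can be modelled in plain userspace C. An item is embedded in a container, _init() leaves one reference held, ->release() frees the container when the count reaches zero, and omitting ->drop_item() means the initial reference is simply put. This is an illustrative sketch only. struct item, item_get(), item_put(), make_item() and drop_item() are hypothetical stand-ins for config_item, config_item_get(), config_item_put(), ->make_item() and the default drop behaviour. A plain int replaces struct kref, so there is no locking or atomicity.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct item;

/* Hypothetical analogue of config_item_type: only ->release() is modelled. */
struct item_type {
	void (*release)(struct item *);
};

/* Hypothetical analogue of struct config_item. */
struct item {
	char name[20];			/* fixed buffer, like ci_namebuf */
	int refcount;			/* plain int instead of struct kref */
	const struct item_type *type;
};

static void item_init(struct item *it, const char *name,
		      const struct item_type *type)
{
	snprintf(it->name, sizeof(it->name), "%s", name);
	it->refcount = 1;		/* _init() leaves one reference held */
	it->type = type;
}

static struct item *item_get(struct item *it)
{
	if (it)
		it->refcount++;
	return it;
}

static void item_put(struct item *it)
{
	/* The last put calls ->release(), which frees the container. */
	if (it && --it->refcount == 0 && it->type->release)
		it->type->release(it);
}

/* Container structure, shaped like struct simple_child. */
struct child {
	struct item item;
	int storeme;
};

static int releases;			/* counts ->release() calls (demo only) */

static struct child *to_child(struct item *it)
{
	return it ? container_of(it, struct child, item) : NULL;
}

static void child_release(struct item *it)
{
	releases++;
	free(to_child(it));
}

static const struct item_type child_type = { .release = child_release };

/* Like ->make_item(): allocate the container and initialize the item. */
static struct item *make_item(const char *name)
{
	struct child *c = calloc(1, sizeof(*c));
	if (!c)
		return NULL;
	item_init(&c->item, name, &child_type);
	return &c->item;
}

/* Like omitting ->drop_item(): rmdir(2) just puts the initial reference. */
static void drop_item(struct item *it)
{
	item_put(it);
}
```

If another user still holds a reference when drop_item() runs, the container stays allocated until that user's item_put(). This matches the point above that the memory remains safe after rmdir(2) even though the item is gone from configfs.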
+
+[An Example]
+
+The best example of these basic concepts is the simple_children
+subsystem/group and the simple_child item in configfs_example.c. It
+shows a trivial object displaying and storing an attribute, and a simple
+group creating and destroying these children.
+
+[Hierarchy Navigation and the Subsystem Semaphore]
+
+There is an extra bonus that configfs provides. The config_groups and
+config_items are arranged in a hierarchy due to the fact that they
+appear in a filesystem. A subsystem is NEVER to touch the filesystem
+parts, but the subsystem might be interested in this hierarchy. For
+this reason, the hierarchy is mirrored via the config_group->cg_children
+and config_item->ci_parent structure members.
+
+A subsystem can navigate the cg_children list and the ci_parent pointer
+to see the tree created by the subsystem. This can race with configfs'
+management of the hierarchy, so configfs uses the subsystem semaphore to
+protect modifications. Whenever a subsystem wants to navigate the
+hierarchy, it must do so under the protection of the subsystem
+semaphore.
+
+A subsystem will be prevented from acquiring the semaphore while a newly
+allocated item has not been linked into this hierarchy. Similarly, it
+will not be able to acquire the semaphore while a dropping item has not
+yet been unlinked. This means that an item's ci_parent pointer will
+never be NULL while the item is in configfs, and that an item will only
+be in its parent's cg_children list for the same duration. This allows
+a subsystem to trust ci_parent and cg_children while it holds the
+semaphore.
+
+[Item Aggregation Via symlink(2)]
+
+configfs provides a simple group via the group->item parent/child
+relationship. Often, however, a larger environment requires aggregation
+outside of the parent/child connection. This is implemented via
+symlink(2).
+
+A config_item may provide the ct_item_ops->allow_link() and
+ct_item_ops->drop_link() methods. If the ->allow_link() method exists,
+symlink(2) may be called with the config_item as the source of the link.
+These links are only allowed between configfs config_items. Any
+symlink(2) attempt outside the configfs filesystem will be denied.
+
+When symlink(2) is called, the source config_item's ->allow_link()
+method is called with itself and a target item. If the source item
+allows linking to the target item, it returns 0. A source item may wish to
+reject a link if it only wants links to a certain type of object (say,
+in its own subsystem).
+
+When unlink(2) is called on the symbolic link, the source item is
+notified via the ->drop_link() method. Like the ->drop_item() method,
+this is a void function and cannot return failure. The subsystem is
+responsible for responding to the change.
+
+A config_item cannot be removed while it links to any other item, nor
+can it be removed while an item links to it. Dangling symlinks are not
+allowed in configfs.
+
+[Automatically Created Subgroups]
+
+A new config_group may want to have two types of child config_items.
+While this could be codified by magic names in ->make_item(), it is much
+more explicit to have a method whereby userspace sees this divergence.
+
+Rather than have a group where some items behave differently than
+others, configfs provides a method whereby one or many subgroups are
+automatically created inside the parent at its creation. Thus,
+mkdir("parent") results in "parent", "parent/subgroup1", up through
+"parent/subgroupN". Items of type 1 can now be created in
+"parent/subgroup1", and items of type N can be created in
+"parent/subgroupN".
+
+These automatic subgroups, or default groups, do not preclude other
+children of the parent group. If ct_group_ops->make_group() exists,
+other child groups can be created on the parent group directly.
+
+A configfs subsystem specifies default groups by filling in the
+NULL-terminated array default_groups on the config_group structure.
+Each group in that array is populated in the configfs tree at the same
+time as the parent group. Similarly, they are removed at the same time
+as the parent. No extra notification is provided. When a ->drop_item()
+method call notifies the subsystem that the parent group is going away, it
+also means that every default group child of that parent group is going away.
+
+As a consequence of this, default_groups cannot be removed directly via
+rmdir(2). They also are not considered when rmdir(2) on the parent
+group is checking for children.
+
+[Committable Items]
+
+NOTE: Committable items are currently unimplemented.
+
+Some config_items cannot have a valid initial state. That is, no
+default values can be specified for the item's attributes such that the
+item can do its work. Userspace must configure one or more attributes,
+after which the subsystem can start whatever entity this item
+represents.
+
+Consider the FakeNBD device from above. Without a target address *and*
+a target device, the subsystem has no idea what block device to import.
+The simple example assumes that the subsystem merely waits until all the
+appropriate attributes are configured, and then connects. This will,
+indeed, work, but now every attribute store must check if the attributes
+are initialized. Every attribute store must fire off the connection if
+that condition is met.
+
+Far better would be an explicit action notifying the subsystem that the
+config_item is ready to go. More importantly, an explicit action allows
+the subsystem to provide feedback as to whether the attributes are
+initialized in a way that makes sense. configfs provides this as
+committable items.
+
+configfs still uses only normal filesystem operations. An item is
+committed via rename(2). The item is moved from a directory where it
+can be modified to a directory where it cannot.
+
+Any group that provides the ct_group_ops->commit_item() method has
+committable items. When this group appears in configfs, mkdir(2) will
+not work directly in the group. Instead, the group will have two
+subdirectories: "live" and "pending". The "live" directory does not
+support mkdir(2) or rmdir(2) either. It only allows rename(2). The
+"pending" directory does allow mkdir(2) and rmdir(2). An item is
+created in the "pending" directory. Its attributes can be modified at
+will. Userspace commits the item by renaming it into the "live"
+directory. At this point, the subsystem receives the ->commit_item()
+callback. If all required attributes are filled to satisfaction, the
+method returns zero and the item is moved to the "live" directory.
+
+As rmdir(2) does not work in the "live" directory, an item must be
+shut down, or "uncommitted". Again, this is done via rename(2), this
+time from the "live" directory back to the "pending" one. The subsystem
+is notified by the ct_group_ops->uncommit_object() method.
+
+
diff --git a/Documentation/filesystems/configfs/configfs_example.c b/Documentation/filesystems/configfs/configfs_example.c
new file mode 100644
index 000000000000..f3c6e4946f98
--- /dev/null
+++ b/Documentation/filesystems/configfs/configfs_example.c
@@ -0,0 +1,474 @@
+/*
+ * vim: noexpandtab ts=8 sts=0 sw=8:
+ *
+ * configfs_example.c - This file is a demonstration module containing
+ *      a number of configfs subsystems.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 02111-1307, USA.
+ *
+ * Based on sysfs:
+ * 	sysfs is Copyright (C) 2001, 2002, 2003 Patrick Mochel
+ *
+ * configfs Copyright (C) 2005 Oracle. All rights reserved.
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+
+#include <linux/configfs.h>
+
+
+
+/*
+ * 01-childless
+ *
+ * This first example is a childless subsystem. It cannot create
+ * any config_items. It just has attributes.
+ *
+ * Note that we are enclosing the configfs_subsystem inside a container.
+ * This is not necessary if a subsystem has no attributes directly
+ * on the subsystem. See the next example, 02-simple-children, for
+ * such a subsystem.
+ */
+
+struct childless {
+	struct configfs_subsystem subsys;
+	int showme;
+	int storeme;
+};
+
+struct childless_attribute {
+	struct configfs_attribute attr;
+	ssize_t (*show)(struct childless *, char *);
+	ssize_t (*store)(struct childless *, const char *, size_t);
+};
+
+static inline struct childless *to_childless(struct config_item *item)
+{
+	return item ? container_of(to_configfs_subsystem(to_config_group(item)), struct childless, subsys) : NULL;
+}
+
+static ssize_t childless_showme_read(struct childless *childless,
+				     char *page)
+{
+	ssize_t pos;
+
+	pos = sprintf(page, "%d\n", childless->showme);
+	childless->showme++;
+
+	return pos;
+}
+
+static ssize_t childless_storeme_read(struct childless *childless,
+				      char *page)
+{
+	return sprintf(page, "%d\n", childless->storeme);
+}
+
+static ssize_t childless_storeme_write(struct childless *childless,
+				       const char *page,
+				       size_t count)
+{
+	unsigned long tmp;
+	char *p = (char *) page;
+
+	tmp = simple_strtoul(p, &p, 10);
+	if (!p || (*p && (*p != '\n')))
+		return -EINVAL;
+
+	if (tmp > INT_MAX)
+		return -ERANGE;
+
+	childless->storeme = tmp;
+
+	return count;
+}
+
+static ssize_t childless_description_read(struct childless *childless,
+					  char *page)
+{
+	return sprintf(page,
+"[01-childless]\n"
+"\n"
+"The childless subsystem is the simplest possible subsystem in\n"
+"configfs. It does not support the creation of child config_items.\n"
+"It only has a few attributes. In fact, it isn't much different\n"
+"than a directory in /proc.\n");
+}
+
+static struct childless_attribute childless_attr_showme = {
+	.attr	= { .ca_owner = THIS_MODULE, .ca_name = "showme", .ca_mode = S_IRUGO },
+	.show	= childless_showme_read,
+};
+static struct childless_attribute childless_attr_storeme = {
+	.attr	= { .ca_owner = THIS_MODULE, .ca_name = "storeme", .ca_mode = S_IRUGO | S_IWUSR },
+	.show	= childless_storeme_read,
+	.store	= childless_storeme_write,
+};
+static struct childless_attribute childless_attr_description = {
+	.attr = { .ca_owner = THIS_MODULE, .ca_name = "description", .ca_mode = S_IRUGO },
+	.show = childless_description_read,
+};
+
+static struct configfs_attribute *childless_attrs[] = {
+	&childless_attr_showme.attr,
+	&childless_attr_storeme.attr,
+	&childless_attr_description.attr,
+	NULL,
+};
+
+static ssize_t childless_attr_show(struct config_item *item,
+				   struct configfs_attribute *attr,
+				   char *page)
+{
+	struct childless *childless = to_childless(item);
+	struct childless_attribute *childless_attr =
+		container_of(attr, struct childless_attribute, attr);
+	ssize_t ret = 0;
+
+	if (childless_attr->show)
+		ret = childless_attr->show(childless, page);
+	return ret;
+}
+
+static ssize_t childless_attr_store(struct config_item *item,
+				    struct configfs_attribute *attr,
+				    const char *page, size_t count)
+{
+	struct childless *childless = to_childless(item);
+	struct childless_attribute *childless_attr =
+		container_of(attr, struct childless_attribute, attr);
+	ssize_t ret = -EINVAL;
+
+	if (childless_attr->store)
+		ret = childless_attr->store(childless, page, count);
+	return ret;
+}
+
+static struct configfs_item_operations childless_item_ops = {
+	.show_attribute		= childless_attr_show,
+	.store_attribute	= childless_attr_store,
+};
+
+static struct config_item_type childless_type = {
+	.ct_item_ops	= &childless_item_ops,
+	.ct_attrs	= childless_attrs,
+	.ct_owner	= THIS_MODULE,
+};
+
+static struct childless childless_subsys = {
+	.subsys = {
+		.su_group = {
+			.cg_item = {
+				.ci_namebuf = "01-childless",
+				.ci_type = &childless_type,
+			},
+		},
+	},
+};
+
+
+/* ----------------------------------------------------------------- */
+
+/*
+ * 02-simple-children
+ *
+ * This example merely has a simple one-attribute child. Note that
+ * there is no extra attribute structure, as the child's attribute is
+ * known from the get-go. Also, there is no container for the
+ * subsystem, as it has no attributes of its own.
+ */
+
+struct simple_child {
+	struct config_item item;
+	int storeme;
+};
+
+static inline struct simple_child *to_simple_child(struct config_item *item)
+{
+	return item ? container_of(item, struct simple_child, item) : NULL;
+}
+
+static struct configfs_attribute simple_child_attr_storeme = {
+	.ca_owner = THIS_MODULE,
+	.ca_name = "storeme",
+	.ca_mode = S_IRUGO | S_IWUSR,
+};
+
+static struct configfs_attribute *simple_child_attrs[] = {
+	&simple_child_attr_storeme,
+	NULL,
+};
+
+static ssize_t simple_child_attr_show(struct config_item *item,
+				      struct configfs_attribute *attr,
+				      char *page)
+{
+	ssize_t count;
+	struct simple_child *simple_child = to_simple_child(item);
+
+	count = sprintf(page, "%d\n", simple_child->storeme);
+
+	return count;
+}
+
+static ssize_t simple_child_attr_store(struct config_item *item,
+				       struct configfs_attribute *attr,
+				       const char *page, size_t count)
+{
+	struct simple_child *simple_child = to_simple_child(item);
+	unsigned long tmp;
+	char *p = (char *) page;
+
+	tmp = simple_strtoul(p, &p, 10);
+	if (!p || (*p && (*p != '\n')))
+		return -EINVAL;
+
+	if (tmp > INT_MAX)
+		return -ERANGE;
+
+	simple_child->storeme = tmp;
+
+	return count;
+}
+
+static void simple_child_release(struct config_item *item)
+{
+	kfree(to_simple_child(item));
+}
+
+static struct configfs_item_operations simple_child_item_ops = {
+	.release		= simple_child_release,
+	.show_attribute		= simple_child_attr_show,
+	.store_attribute	= simple_child_attr_store,
+};
+
+static struct config_item_type simple_child_type = {
+	.ct_item_ops	= &simple_child_item_ops,
+	.ct_attrs	= simple_child_attrs,
+	.ct_owner	= THIS_MODULE,
+};
+
+
+static struct config_item *simple_children_make_item(struct config_group *group, const char *name)
+{
+	struct simple_child *simple_child;
+
+	simple_child = kmalloc(sizeof(struct simple_child), GFP_KERNEL);
+	if (!simple_child)
+		return NULL;
+
+	memset(simple_child, 0, sizeof(struct simple_child));
+
+	config_item_init_type_name(&simple_child->item, name,
+				   &simple_child_type);
+
+	simple_child->storeme = 0;
+
+	return &simple_child->item;
+}
+
+static struct configfs_attribute simple_children_attr_description = {
+	.ca_owner = THIS_MODULE,
+	.ca_name = "description",
+	.ca_mode = S_IRUGO,
+};
+
+static struct configfs_attribute *simple_children_attrs[] = {
+	&simple_children_attr_description,
+	NULL,
+};
+
+static ssize_t simple_children_attr_show(struct config_item *item,
+					 struct configfs_attribute *attr,
+					 char *page)
+{
+	return sprintf(page,
+"[02-simple-children]\n"
+"\n"
+"This subsystem allows the creation of child config_items. These\n"
+"items have only one attribute that is readable and writeable.\n");
+}
+
+static struct configfs_item_operations simple_children_item_ops = {
+	.show_attribute	= simple_children_attr_show,
+};
+
+/*
+ * Note that, since no extra work is required on ->drop_item(),
+ * no ->drop_item() is provided.
+ */
+static struct configfs_group_operations simple_children_group_ops = {
+	.make_item = simple_children_make_item,
+};
+
+static struct config_item_type simple_children_type = {
+	.ct_item_ops = &simple_children_item_ops,
+	.ct_group_ops = &simple_children_group_ops,
+	.ct_attrs = simple_children_attrs,
+};
+
+static struct configfs_subsystem simple_children_subsys = {
+	.su_group = {
+		.cg_item = {
+			.ci_namebuf = "02-simple-children",
+			.ci_type = &simple_children_type,
+		},
+	},
+};
+
+
+/* ----------------------------------------------------------------- */
+
+/*
+ * 03-group-children
+ *
+ * This example reuses the simple_children group from above. However,
+ * the simple_children group is not the subsystem itself, it is a
+ * child of the subsystem. Creation of a group in the subsystem creates
+ * a new simple_children group. That group can then have simple_child
+ * children of its own.
+ */
+
+struct simple_children {
+	struct config_group group;
+};
+
+static struct config_group *group_children_make_group(struct config_group *group, const char *name)
+{
+	struct simple_children *simple_children;
+
+	simple_children = kmalloc(sizeof(struct simple_children),
+				  GFP_KERNEL);
+	if (!simple_children)
+		return NULL;
+
+	memset(simple_children, 0, sizeof(struct simple_children));
+
+	config_group_init_type_name(&simple_children->group, name,
+				    &simple_children_type);
+
+	return &simple_children->group;
+}
+
+static struct configfs_attribute group_children_attr_description = {
+	.ca_owner = THIS_MODULE,
+	.ca_name = "description",
+	.ca_mode = S_IRUGO,
+};
+
+static struct configfs_attribute *group_children_attrs[] = {
+	&group_children_attr_description,
+	NULL,
+};
+
+static ssize_t group_children_attr_show(struct config_item *item,
+					struct configfs_attribute *attr,
+					char *page)
+{
+	return sprintf(page,
+"[03-group-children]\n"
+"\n"
+"This subsystem allows the creation of child config_groups. These\n"
+"groups are like the subsystem simple-children.\n");
+}
+
+static struct configfs_item_operations group_children_item_ops = {
+	.show_attribute = group_children_attr_show,
+};
+
+/*
+ * Note that, since no extra work is required on ->drop_item(),
+ * no ->drop_item() is provided.
+ */
+static struct configfs_group_operations group_children_group_ops = {
+	.make_group = group_children_make_group,
+};
+
+static struct config_item_type group_children_type = {
+	.ct_item_ops = &group_children_item_ops,
+	.ct_group_ops = &group_children_group_ops,
+	.ct_attrs = group_children_attrs,
+};
+
+static struct configfs_subsystem group_children_subsys = {
+	.su_group = {
+		.cg_item = {
+			.ci_namebuf = "03-group-children",
+			.ci_type = &group_children_type,
+		},
+	},
+};
+
+/* ----------------------------------------------------------------- */
+
+/*
+ * We're now done with our subsystem definitions.
+ * For convenience in this module, here's a list of them all. It
+ * allows the init function to easily register them. Most modules
+ * will only have one subsystem, and will only call
+ * configfs_register_subsystem() on it directly.
+ */
+static struct configfs_subsystem *example_subsys[] = {
+	&childless_subsys.subsys,
+	&simple_children_subsys,
+	&group_children_subsys,
+	NULL,
+};
+
+static int __init configfs_example_init(void)
+{
+	int ret;
+	int i;
+	struct configfs_subsystem *subsys;
+
+	for (i = 0; example_subsys[i]; i++) {
+		subsys = example_subsys[i];
+
+		config_group_init(&subsys->su_group);
+		init_MUTEX(&subsys->su_sem);
+		ret = configfs_register_subsystem(subsys);
+		if (ret) {
+			printk(KERN_ERR "Error %d while registering subsystem %s\n",
+			       ret,
+			       subsys->su_group.cg_item.ci_namebuf);
+			goto out_unregister;
+		}
+	}
+
+	return 0;
+
+out_unregister:
+	for (i--; i >= 0; i--) {
+		configfs_unregister_subsystem(example_subsys[i]);
+	}
+
+	return ret;
+}
+
+static void __exit configfs_example_exit(void)
+{
+	int i;
+
+	for (i = 0; example_subsys[i]; i++) {
+		configfs_unregister_subsystem(example_subsys[i]);
+	}
+}
+
+module_init(configfs_example_init);
+module_exit(configfs_example_exit);
+MODULE_LICENSE("GPL");
diff --git a/Documentation/filesystems/dlmfs.txt b/Documentation/filesystems/dlmfs.txt
new file mode 100644
index 000000000000..9afab845a906
--- /dev/null
+++ b/Documentation/filesystems/dlmfs.txt
@@ -0,0 +1,130 @@
+dlmfs
+==================
+A minimal DLM userspace interface implemented via a virtual file
+system.
+
+dlmfs is built with OCFS2 as it requires most of its infrastructure.
+
+Project web page: http://oss.oracle.com/projects/ocfs2
+Tools web page: http://oss.oracle.com/projects/ocfs2-tools
+OCFS2 mailing lists: http://oss.oracle.com/projects/ocfs2/mailman/
+
+All code copyright 2005 Oracle except when otherwise noted.
+
+CREDITS
+=======
+
+Some code taken from ramfs which is Copyright (C) 2000 Linus Torvalds
+and Transmeta Corp.
+
+Mark Fasheh <mark.fasheh@oracle.com>
+
+Caveats
+=======
+- Right now it only works with the OCFS2 DLM, though support for other
+  DLM implementations should not be a major issue.
+
+Mount options
+=============
+None
+
+Usage
+=====
+
+If you're just interested in OCFS2, then please see ocfs2.txt. The
+rest of this document will be geared towards those who want to use
+dlmfs for easy-to-set-up and easy-to-use clustered locking in
+userspace.
+
+Setup
+=====
+
+dlmfs requires that the OCFS2 cluster infrastructure be in
+place. Please download ocfs2-tools from the above URL and configure a
+cluster.
+
+You'll want to start heartbeating on a volume which all the nodes in
+your lockspace can access. The easiest way to do this is via
+ocfs2_hb_ctl (distributed with ocfs2-tools). Right now it requires
+that an OCFS2 file system be in place so that it can automatically
+find its heartbeat area, though it will eventually support heartbeat
+against raw disks.
+
+Please see the ocfs2_hb_ctl and mkfs.ocfs2 manual pages distributed
+with ocfs2-tools.
+
+Once you're heartbeating, DLM lock 'domains' can be easily created /
+destroyed and locks within them accessed.
+
+Locking
+=======
+
+Users may access dlmfs via standard file system calls, or they can use
+'libo2dlm' (distributed with ocfs2-tools) which abstracts the file
+system calls and presents a more traditional locking API.
+
+dlmfs handles lock caching automatically for the user, so a lock
+request for an already acquired lock will not generate another DLM
+call. Userspace programs are assumed to handle their own local
+locking.
+
+Two levels of locks are supported: Shared Read and Exclusive.
+Also supported is a Trylock operation.
+
+For information on the libo2dlm interface, please see o2dlm.h,
+distributed with ocfs2-tools.
+
+Lock value blocks can be read and written to a resource via read(2)
+and write(2) against the fd obtained via your open(2) call. The
+maximum currently supported LVB length is 64 bytes (though that is an
+OCFS2 DLM limitation). Through this mechanism, users of dlmfs can share
+small amounts of data amongst their nodes.
+
+mkdir(2) signals dlmfs to join a domain (which will have the same name
+as the resulting directory).
+
+rmdir(2) signals dlmfs to leave the domain.
+
+Locks for a given domain are represented by regular inodes inside the
+domain directory. Locking against them is done via the open(2) system
+call.
+
+The open(2) call will not return until your lock has been granted or
+an error has occurred, unless it has been instructed to do a trylock
+operation. If the lock succeeds, you'll get an fd.
+
+Use open(2) with O_CREAT to ensure the resource inode is created; dlmfs does
+not automatically create inodes for existing lock resources.
+
+Open Flag     Lock Request Type
+---------     -----------------
+O_RDONLY      Shared Read
+O_RDWR        Exclusive
+
+Open Flag     Resulting Locking Behavior
+---------     --------------------------
+O_NONBLOCK    Trylock operation
+
+You must provide exactly one of O_RDONLY or O_RDWR.
+
+If O_NONBLOCK is also provided and the trylock operation was valid but
+could not lock the resource, then open(2) will fail with ETXTBSY.
+
+close(2) drops the lock associated with your fd.
+
+Modes passed to mkdir(2) or open(2) are adhered to locally. Chown is
+supported locally as well. This means you can use them to restrict
+access to the resources via dlmfs on your local node only.
+
+The resource LVB may be read from the fd in either Shared Read or
+Exclusive modes via the read(2) system call. It can be written via
+write(2) only when open in Exclusive mode.
+
+Once written, an LVB will be visible to other nodes that obtain Read
+Only or higher level locks on the resource.
+
+See Also
+========
+http://opendlm.sourceforge.net/cvsmirror/opendlm/docs/dlmbook_final.pdf
+
+For more information on the VMS distributed locking API.
diff --git a/Documentation/filesystems/ocfs2.txt b/Documentation/filesystems/ocfs2.txt
new file mode 100644
index 000000000000..f2595caf052e
--- /dev/null
+++ b/Documentation/filesystems/ocfs2.txt
@@ -0,0 +1,55 @@
+OCFS2 filesystem
+==================
+OCFS2 is a general purpose extent based shared disk cluster file
+system with many similarities to ext3. It supports 64 bit inode
+numbers, and has automatically extending metadata groups which may
+also make it attractive for non-clustered use.
+
+You'll want to install the ocfs2-tools package in order to at least
+get "mount.ocfs2" and "ocfs2_hb_ctl".
+
+Project web page: http://oss.oracle.com/projects/ocfs2
+Tools web page: http://oss.oracle.com/projects/ocfs2-tools
+OCFS2 mailing lists: http://oss.oracle.com/projects/ocfs2/mailman/
+
+All code copyright 2005 Oracle except when otherwise noted.
+
+CREDITS:
+Lots of code taken from ext3 and other projects.
+
+Authors in alphabetical order:
+Joel Becker <joel.becker@oracle.com>
+Zach Brown <zach.brown@oracle.com>
+Mark Fasheh <mark.fasheh@oracle.com>
+Kurt Hackel <kurt.hackel@oracle.com>
+Sunil Mushran <sunil.mushran@oracle.com>
+Manish Singh <manish.singh@oracle.com>
+
+Caveats
+=======
+Features which OCFS2 does not support yet:
+	- sparse files
+	- extended attributes
+	- shared writeable mmap
+	- loopback is supported, but data written will not
+	  be cluster coherent.
+	- quotas
+	- cluster aware flock
+	- Directory change notification (F_NOTIFY)
+	- Distributed Caching (F_SETLEASE/F_GETLEASE/break_lease)
+	- POSIX ACLs
+	- readpages / writepages (not user visible)
+
+Mount options
+=============
+
+OCFS2 supports the following mount options:
+(*) == default
+
+barrier=1              This enables/disables barriers. barrier=0 disables it,
+                       barrier=1 enables it.
+errors=remount-ro(*)   Remount the filesystem read-only on an error.
+errors=panic           Panic and halt the machine if an error occurs.
+intr            (*)    Allow signals to interrupt cluster operations.
+nointr                 Do not allow signals to interrupt cluster
+                       operations.
diff --git a/Documentation/keys.txt b/Documentation/keys.txt
index 31154882000a..6304db59bfe4 100644
--- a/Documentation/keys.txt
+++ b/Documentation/keys.txt
@@ -860,24 +860,6 @@ The structure has a number of fields, some of which are mandatory:
 
      It is safe to sleep in this method.
 
- (*) int (*duplicate)(struct key *key, const struct key *source);
-
-     If this type of key can be duplicated, then this method should be
-     provided. It is called to copy the payload attached to the source into the
-     new key. The data length on the new key will have been updated and the
-     quota adjusted already.
-
-     This method will be called with the source key's semaphore read-locked to
-     prevent its payload from being changed, thus RCU constraints need not be
-     applied to the source key.
-
-     This method does not have to lock the destination key in order to attach a
-     payload. The fact that KEY_FLAG_INSTANTIATED is not set in key->flags
-     prevents anything else from gaining access to the key.
-
-     It is safe to sleep in this method.
-
-
  (*) int (*update)(struct key *key, const void *data, size_t datalen);
 
      If this type of key can be updated, then this method should be provided.
diff --git a/Documentation/md.txt b/Documentation/md.txt
index 23e6cce40f9c..03a13c462cf2 100644
--- a/Documentation/md.txt
+++ b/Documentation/md.txt
@@ -51,6 +51,30 @@ superblock can be autodetected and run at boot time.
 The kernel parameter "raid=partitionable" (or "raid=part") means
 that all auto-detected arrays are assembled as partitionable.
 
+Boot time assembly of degraded/dirty arrays
+-------------------------------------------
+
+If a raid5 or raid6 array is both dirty and degraded, it could have
+undetectable data corruption. This is because the fact that it is
+'dirty' means that the parity cannot be trusted, and the fact that it
+is degraded means that some datablocks are missing and cannot reliably
+be reconstructed (as the parity cannot be trusted).
+
+For this reason, md will normally refuse to start such an array. This
+requires the sysadmin to take action to explicitly start the array
+despite possible corruption. This is normally done with
+   mdadm --assemble --force ....
+
+This option is not really available if the array has the root
+filesystem on it. To support booting from such an
+array, md supports a module parameter "start_dirty_degraded" which,
+when set to 1, bypasses the checks and allows dirty degraded
+arrays to be started.
+
+So, to boot with a root filesystem of a dirty degraded raid[56], use
+
+   md-mod.start_dirty_degraded=1
+
 Superblock formats
 ------------------
 
@@ -141,6 +165,70 @@ All md devices contain:
      in a fully functional array. If this is not yet known, the file
      will be empty. If an array is being resized (not currently
      possible) this will contain the larger of the old and new sizes.
+     Some raid levels (RAID1) allow this value to be set while the
+     array is active. This will reconfigure the array. Otherwise
+     it can only be set while assembling an array.
+
+  chunk_size
+     This is the size in bytes for 'chunks' and is only relevant to
+     raid levels that involve striping (0,4,5,6,10). The address space
+     of the array is conceptually divided into chunks and consecutive
+     chunks are striped onto neighbouring devices.
+     The size should be at least PAGE_SIZE (4k) and should be a power
+     of 2. This can only be set while assembling an array.
+
+  component_size
+     For arrays with data redundancy (i.e. not raid0, linear, faulty,
+     multipath), all components must be the same size - or at least
+     there must be a size that they all provide space for. This is a key
+     part of the geometry of the array. It is measured in sectors
+     and can be read from here. Writing to this value may resize
+     the array if the personality supports it (raid1, raid5, raid6),
+     and if the component drives are large enough.
+
+  metadata_version
+     This indicates the format that is being used to record metadata
+     about the array. It can be 0.90 (traditional format), 1.0, 1.1,
+     1.2 (newer format in varying locations) or "none" indicating that
+     the kernel isn't managing metadata at all.
+
+  level
+     The raid 'level' for this array. The name will often (but not
+     always) be the same as the name of the module that implements the
+     level. To be auto-loaded, the module must have an alias
+     md-$LEVEL, e.g. md-raid5.
+     This can be written only while the array is being assembled, not
+     after it is started.
+
+  new_dev
+     This file can be written but not read. The value written should
+     be a block device number as major:minor, e.g. 8:0.
+     This will cause that device to be attached to the array, if it is
+     available. It will then appear at md/dev-XXX (depending on the
+     name of the device) and further configuration is then possible.
+
+  sync_speed_min
+  sync_speed_max
+     These are similar to /proc/sys/dev/raid/speed_limit_{min,max},
+     but they only apply to the particular array.
+     If no value has been written to these, or if the word 'system'
+     is written, then the system-wide value is used. If a value
+     in kibibytes-per-second is written, then it is used.
+     When the files are read, they show the currently active value
+     followed by "(local)" or "(system)" depending on whether it is
+     a locally set or system-wide value.
+
+  sync_completed
+     This shows the number of sectors that have been completed of
+     whatever the current sync_action is, followed by the number of
+     sectors in total that could need to be processed. The two
+     numbers are separated by a '/', thus effectively showing one
+     value: the fraction of the process that is complete.
+
+  sync_speed
+     This shows the current actual speed, in K/sec, of the current
+     sync_action. It is averaged over the last 30 seconds.
+
 
 As component devices are added to an md array, they appear in the 'md'
 directory as new directories named
@@ -167,6 +255,38 @@ Each directory contains:
                          of being recoverred to
           This list make grow in future.
 
+      errors
+          An approximate count of read errors that have been detected on
+          this device but have not caused the device to be evicted from
+          the array (either because they were corrected or because they
+          happened while the array was read-only). When using version-1
+          metadata, this value persists across restarts of the array.
+
+          This value can be written while assembling an array, thus
+          providing an ongoing count for arrays with metadata managed by
+          userspace.
+
+      slot
+          This gives the role that the device has in the array. It will
+          either be 'none' if the device is not active in the array
+          (i.e. is a spare or has failed) or an integer less than the
+          'raid_disks' number for the array indicating which position
+          it currently fills. This can only be set while assembling an
+          array. A device for which this is set is assumed to be working.
+
+      offset
+          This gives the location in the device (in sectors from the
+          start) where data from the array will be stored. Any part of
+          the device before this offset is not touched, unless it is
+          used for storing metadata (Formats 1.1 and 1.2).
+
+      size
+          The amount of the device, after the offset, that can be used
+          for storage of data. This will normally be the same as the
+          component_size. This can be written while assembling an
+          array. If a value less than the current component_size is
+          written, component_size will be reduced to this value.
+
 
 An active md device will also contain and entry for each active device
 in the array. These are named
diff --git a/Documentation/power/interface.txt b/Documentation/power/interface.txt
index f5ebda5f4276..bd4ffb5bd49a 100644
--- a/Documentation/power/interface.txt
+++ b/Documentation/power/interface.txt
@@ -41,3 +41,14 @@ to. Writing to this file will accept one of
 It will only change to 'firmware' or 'platform' if the system supports
 it.
 
+/sys/power/image_size controls the size of the image created by
+the suspend-to-disk mechanism. A string can be written to it
+representing a non-negative integer that will be used as an upper
+limit of the image size, in megabytes. The suspend-to-disk mechanism will
+do its best to ensure the image size will not exceed that number. However,
+if this turns out to be impossible, it will try to suspend anyway using the
+smallest image possible. In particular, if "0" is written to this file, the
+suspend image will be as small as possible.
+
+Reading from this file will display the current image size limit, which
+is set to 500 MB by default.
diff --git a/Documentation/power/swsusp.txt b/Documentation/power/swsusp.txt
index b0d50840788e..cd0fcd89a6f0 100644
--- a/Documentation/power/swsusp.txt
+++ b/Documentation/power/swsusp.txt
@@ -27,6 +27,11 @@ echo shutdown > /sys/power/disk; echo disk > /sys/power/state
 
 echo platform > /sys/power/disk; echo disk > /sys/power/state
 
+If you want to limit the suspend image size to N megabytes, do
+
+echo N > /sys/power/image_size
+
+before suspend (it is limited to 500 MB by default).
 
 Encrypted suspend image:
 ------------------------
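The md.txt hunk above describes the per-array sync_completed attribute as two sector counts separated by a '/'. A small C sketch can turn that text into a completion fraction. It assumes the attribute has already been read into a string, for example from /sys/block/md0/md/sync_completed, where md0 is only an example device name. parse_sync_completed() is a hypothetical helper, not a kernel or mdadm interface. The sketch also assumes that when no sync is running, the file does not contain two numbers; in that case the helper simply reports -1.

```c
#include <stdio.h>

/*
 * Parse the text of an md "sync_completed" attribute: the number of
 * sectors completed, a '/', then the total number of sectors.
 * On success stores done/total in *fraction and returns 0.
 * Returns -1 if two numbers cannot be read or the total is zero.
 */
int parse_sync_completed(const char *text, double *fraction)
{
	unsigned long long done, total;

	/*
	 * A space in a scanf format matches any amount of whitespace,
	 * including none, so both "10/40" and "10 / 40" are accepted.
	 */
	if (sscanf(text, "%llu / %llu", &done, &total) != 2)
		return -1;
	if (total == 0)
		return -1;

	*fraction = (double)done / (double)total;
	return 0;
}
```

Multiplying the stored fraction by 100 gives a percentage. Pairing it with sync_speed (K/sec) would allow a rough time-remaining estimate, though that is left out here.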