diff options
author | Dmitry Torokhov <dtor@insightbb.com> | 2006-09-19 01:56:44 -0400 |
---|---|---|
committer | Dmitry Torokhov <dtor@insightbb.com> | 2006-09-19 01:56:44 -0400 |
commit | 0612ec48762bf8712db1925b2e67246d2237ebab (patch) | |
tree | 01b0d69c9c9915015c0f23ad4263646dd5413e99 /Documentation | |
parent | 4263cf0fac28122c8381b6f4f9441a43cd93c81f (diff) | |
parent | 47a5c6fa0e204a2b63309c648bb2fde36836c826 (diff) | |
download | linux-0612ec48762bf8712db1925b2e67246d2237ebab.tar.gz linux-0612ec48762bf8712db1925b2e67246d2237ebab.tar.bz2 linux-0612ec48762bf8712db1925b2e67246d2237ebab.zip |
Merge rsync://rsync.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Diffstat (limited to 'Documentation')
46 files changed, 1997 insertions, 832 deletions
diff --git a/Documentation/DMA-mapping.txt b/Documentation/DMA-mapping.txt index 7c717699032c..63392c9132b4 100644 --- a/Documentation/DMA-mapping.txt +++ b/Documentation/DMA-mapping.txt @@ -698,12 +698,12 @@ these interfaces. Remember that, as defined, consistent mappings are always going to be SAC addressable. The first thing your driver needs to do is query the PCI platform -layer with your devices DAC addressing capabilities: +layer if it is capable of handling your devices DAC addressing +capabilities: - int pci_dac_set_dma_mask(struct pci_dev *pdev, u64 mask); + int pci_dac_dma_supported(struct pci_dev *hwdev, u64 mask); -This routine behaves identically to pci_set_dma_mask. You may not -use the following interfaces if this routine fails. +You may not use the following interfaces if this routine fails. Next, DMA addresses using this API are kept track of using the dma64_addr_t type. It is guaranteed to be big enough to hold any diff --git a/Documentation/DocBook/kernel-api.tmpl b/Documentation/DocBook/kernel-api.tmpl index 1ae4dc0fd856..f8fe882e33dc 100644 --- a/Documentation/DocBook/kernel-api.tmpl +++ b/Documentation/DocBook/kernel-api.tmpl @@ -59,6 +59,9 @@ !Iinclude/linux/hrtimer.h !Ekernel/hrtimer.c </sect1> + <sect1><title>Workqueues and Kevents</title> +!Ekernel/workqueue.c + </sect1> <sect1><title>Internal Functions</title> !Ikernel/exit.c !Ikernel/signal.c @@ -300,7 +303,7 @@ X!Ekernel/module.c </sect1> <sect1><title>Resources Management</title> -!Ekernel/resource.c +!Ikernel/resource.c </sect1> <sect1><title>MTRR Handling</title> @@ -312,9 +315,7 @@ X!Ekernel/module.c !Edrivers/pci/pci-driver.c !Edrivers/pci/remove.c !Edrivers/pci/pci-acpi.c -<!-- kerneldoc does not understand __devinit -X!Edrivers/pci/search.c - --> +!Edrivers/pci/search.c !Edrivers/pci/msi.c !Edrivers/pci/bus.c <!-- FIXME: Removed for now since no structured comments in source diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt index 4f41a60e5111..318df44259b3 100644 --- a/Documentation/RCU/whatisRCU.txt +++ b/Documentation/RCU/whatisRCU.txt @@ -687,8 +687,9 @@ diff shows how closely related RCU and reader-writer locking can be. + spin_lock(&listmutex); list_for_each_entry(p, head, lp) { if (p->key == key) { - list_del(&p->list); + - list_del(&p->list); - write_unlock(&listmutex); + + list_del_rcu(&p->list); + spin_unlock(&listmutex); + synchronize_rcu(); kfree(p); @@ -736,7 +737,7 @@ Or, for those who prefer a side-by-side listing: 5 write_lock(&listmutex); 5 spin_lock(&listmutex); 6 list_for_each_entry(p, head, lp) { 6 list_for_each_entry(p, head, lp) { 7 if (p->key == key) { 7 if (p->key == key) { - 8 list_del(&p->list); 8 list_del(&p->list); + 8 list_del(&p->list); 8 list_del_rcu(&p->list); 9 write_unlock(&listmutex); 9 spin_unlock(&listmutex); 10 synchronize_rcu(); 10 kfree(p); 11 kfree(p); diff --git a/Documentation/SubmitChecklist b/Documentation/SubmitChecklist index 8230098da529..a10bfb6ecd9f 100644 --- a/Documentation/SubmitChecklist +++ b/Documentation/SubmitChecklist @@ -1,57 +1,63 @@ Linux Kernel patch sumbittal checklist ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Here are some basic things that developers should do if they -want to see their kernel patch submittals accepted quicker. +Here are some basic things that developers should do if they want to see their +kernel patch submissions accepted more quickly. -These are all above and beyond the documentation that is provided -in Documentation/SubmittingPatches and elsewhere about submitting -Linux kernel patches. +These are all above and beyond the documentation that is provided in +Documentation/SubmittingPatches and elsewhere regarding submitting Linux +kernel patches. -- Builds cleanly with applicable or modified CONFIG options =y, =m, and =n. - No gcc warnings/errors, no linker warnings/errors. +1: Builds cleanly with applicable or modified CONFIG options =y, =m, and + =n. No gcc warnings/errors, no linker warnings/errors. -- Passes allnoconfig, allmodconfig +2: Passes allnoconfig, allmodconfig -- Builds on multiple CPU arch-es by using local cross-compile tools - or something like PLM at OSDL. +3: Builds on multiple CPU architectures by using local cross-compile tools + or something like PLM at OSDL. -- ppc64 is a good architecture for cross-compilation checking because it - tends to use `unsigned long' for 64-bit quantities. +4: ppc64 is a good architecture for cross-compilation checking because it + tends to use `unsigned long' for 64-bit quantities. -- Matches kernel coding style(!) +5: Matches kernel coding style(!) -- Any new or modified CONFIG options don't muck up the config menu. +6: Any new or modified CONFIG options don't muck up the config menu. -- All new Kconfig options have help text. +7: All new Kconfig options have help text. -- Has been carefully reviewed with respect to relevant Kconfig - combinations. This is very hard to get right with testing -- - brainpower pays off here. +8: Has been carefully reviewed with respect to relevant Kconfig + combinations. This is very hard to get right with testing -- brainpower + pays off here. -- Check cleanly with sparse. +9: Check cleanly with sparse. -- Use 'make checkstack' and 'make namespacecheck' and fix any - problems that they find. Note: checkstack does not point out - problems explicitly, but any one function that uses more than - 512 bytes on the stack is a candidate for change. +10: Use 'make checkstack' and 'make namespacecheck' and fix any problems + that they find. Note: checkstack does not point out problems explicitly, + but any one function that uses more than 512 bytes on the stack is a + candidate for change. -- Include kernel-doc to document global kernel APIs. (Not required - for static functions, but OK there also.) Use 'make htmldocs' - or 'make mandocs' to check the kernel-doc and fix any issues. +11: Include kernel-doc to document global kernel APIs. (Not required for + static functions, but OK there also.) Use 'make htmldocs' or 'make + mandocs' to check the kernel-doc and fix any issues. -- Has been tested with CONFIG_PREEMPT, CONFIG_DEBUG_PREEMPT, - CONFIG_DEBUG_SLAB, CONFIG_DEBUG_PAGEALLOC, CONFIG_DEBUG_MUTEXES, - CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_SPINLOCK_SLEEP all simultaneously - enabled. +12: Has been tested with CONFIG_PREEMPT, CONFIG_DEBUG_PREEMPT, + CONFIG_DEBUG_SLAB, CONFIG_DEBUG_PAGEALLOC, CONFIG_DEBUG_MUTEXES, + CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_SPINLOCK_SLEEP all simultaneously + enabled. -- Has been build- and runtime tested with and without CONFIG_SMP and - CONFIG_PREEMPT. +13: Has been build- and runtime tested with and without CONFIG_SMP and + CONFIG_PREEMPT. -- If the patch affects IO/Disk, etc: has been tested with and without - CONFIG_LBD. +14: If the patch affects IO/Disk, etc: has been tested with and without + CONFIG_LBD. +15: All codepaths have been exercised with all lockdep features enabled. -2006-APR-27 +16: All new /proc entries are documented under Documentation/ + +17: All new kernel boot parameters are documented in + Documentation/kernel-parameters.txt. + +18: All new module parameters are documented with MODULE_PARM_DESC() diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches index c2c85bcb3d43..d42ab4c9e893 100644 --- a/Documentation/SubmittingPatches +++ b/Documentation/SubmittingPatches @@ -10,7 +10,9 @@ kernel, the process can sometimes be daunting if you're not familiar with "the system." This text is a collection of suggestions which can greatly increase the chances of your change being accepted. -If you are submitting a driver, also read Documentation/SubmittingDrivers. +Read Documentation/SubmitChecklist for a list of items to check +before submitting code. If you are submitting a driver, also read +Documentation/SubmittingDrivers. @@ -74,9 +76,6 @@ There are a number of scripts which can aid in this: Quilt: http://savannah.nongnu.org/projects/quilt -Randy Dunlap's patch scripts: -http://www.xenotime.net/linux/scripts/patching-scripts-002.tar.gz - Andrew Morton's patch scripts: http://www.zip.com.au/~akpm/linux/patches/ Instead of these scripts, quilt is the recommended patch management @@ -309,6 +308,8 @@ then you just add a line saying Signed-off-by: Random J Developer <random@developer.example.org> +using your real name (sorry, no pseudonyms or anonymous contributions.) + Some people also put extra tags at the end. They'll just be ignored for now, but you can do this to mark internal company procedures or just point out some special detail about the sign-off. @@ -484,7 +485,7 @@ Greg Kroah-Hartman "How to piss off a kernel subsystem maintainer". <http://www.kroah.com/log/2005/10/19/> <http://www.kroah.com/log/2006/01/11/> -NO!!!! No more huge patch bombs to linux-kernel@vger.kernel.org people!. +NO!!!! No more huge patch bombs to linux-kernel@vger.kernel.org people! <http://marc.theaimsgroup.com/?l=linux-kernel&m=112112749912944&w=2> Kernel Documentation/CodingStyle @@ -493,4 +494,3 @@ Kernel Documentation/CodingStyle Linus Torvald's mail on the canonical patch format: <http://lkml.org/lkml/2005/4/7/183> -- -Last updated on 17 Nov 2005. diff --git a/Documentation/accounting/delay-accounting.txt b/Documentation/accounting/delay-accounting.txt new file mode 100644 index 000000000000..1443cd71d263 --- /dev/null +++ b/Documentation/accounting/delay-accounting.txt @@ -0,0 +1,112 @@ +Delay accounting +---------------- + +Tasks encounter delays in execution when they wait +for some kernel resource to become available e.g. a +runnable task may wait for a free CPU to run on. + +The per-task delay accounting functionality measures +the delays experienced by a task while + +a) waiting for a CPU (while being runnable) +b) completion of synchronous block I/O initiated by the task +c) swapping in pages + +and makes these statistics available to userspace through +the taskstats interface. + +Such delays provide feedback for setting a task's cpu priority, +io priority and rss limit values appropriately. Long delays for +important tasks could be a trigger for raising its corresponding priority. + +The functionality, through its use of the taskstats interface, also provides +delay statistics aggregated for all tasks (or threads) belonging to a +thread group (corresponding to a traditional Unix process). This is a commonly +needed aggregation that is more efficiently done by the kernel. + +Userspace utilities, particularly resource management applications, can also +aggregate delay statistics into arbitrary groups. To enable this, delay +statistics of a task are available both during its lifetime as well as on its +exit, ensuring continuous and complete monitoring can be done. + + +Interface +--------- + +Delay accounting uses the taskstats interface which is described +in detail in a separate document in this directory. Taskstats returns a +generic data structure to userspace corresponding to per-pid and per-tgid +statistics. The delay accounting functionality populates specific fields of +this structure. See + include/linux/taskstats.h +for a description of the fields pertaining to delay accounting. +It will generally be in the form of counters returning the cumulative +delay seen for cpu, sync block I/O, swapin etc. + +Taking the difference of two successive readings of a given +counter (say cpu_delay_total) for a task will give the delay +experienced by the task waiting for the corresponding resource +in that interval. + +When a task exits, records containing the per-task statistics +are sent to userspace without requiring a command. If it is the last exiting +task of a thread group, the per-tgid statistics are also sent. More details +are given in the taskstats interface description. + +The getdelays.c userspace utility in this directory allows simple commands to +be run and the corresponding delay statistics to be displayed. It also serves +as an example of using the taskstats interface. + +Usage +----- + +Compile the kernel with + CONFIG_TASK_DELAY_ACCT=y + CONFIG_TASKSTATS=y + +Delay accounting is enabled by default at boot up. +To disable, add + nodelayacct +to the kernel boot options. The rest of the instructions +below assume this has not been done. + +After the system has booted up, use a utility +similar to getdelays.c to access the delays +seen by a given task or a task group (tgid). +The utility also allows a given command to be +executed and the corresponding delays to be +seen. + +General format of the getdelays command + +getdelays [-t tgid] [-p pid] [-c cmd...] + + +Get delays, since system boot, for pid 10 +# ./getdelays -p 10 +(output similar to next case) + +Get sum of delays, since system boot, for all pids with tgid 5 +# ./getdelays -t 5 + + +CPU count real total virtual total delay total + 7876 92005750 100000000 24001500 +IO count delay total + 0 0 +MEM count delay total + 0 0 + +Get delays seen in executing a given simple command +# ./getdelays -c ls / + +bin data1 data3 data5 dev home media opt root srv sys usr +boot data2 data4 data6 etc lib mnt proc sbin subdomain tmp var + + +CPU count real total virtual total delay total + 6 4000250 4000000 0 +IO count delay total + 0 0 +MEM count delay total + 0 0 diff --git a/Documentation/accounting/getdelays.c b/Documentation/accounting/getdelays.c new file mode 100644 index 000000000000..795ca3911cc5 --- /dev/null +++ b/Documentation/accounting/getdelays.c @@ -0,0 +1,396 @@ +/* getdelays.c + * + * Utility to get per-pid and per-tgid delay accounting statistics + * Also illustrates usage of the taskstats interface + * + * Copyright (C) Shailabh Nagar, IBM Corp. 2005 + * Copyright (C) Balbir Singh, IBM Corp. 2006 + * Copyright (c) Jay Lan, SGI. 2006 + * + */ + +#include <stdio.h> +#include <stdlib.h> +#include <errno.h> +#include <unistd.h> +#include <poll.h> +#include <string.h> +#include <fcntl.h> +#include <sys/types.h> +#include <sys/stat.h> +#include <sys/socket.h> +#include <sys/types.h> +#include <signal.h> + +#include <linux/genetlink.h> +#include <linux/taskstats.h> + +/* + * Generic macros for dealing with netlink sockets. Might be duplicated + * elsewhere. It is recommended that commercial grade applications use + * libnl or libnetlink and use the interfaces provided by the library + */ +#define GENLMSG_DATA(glh) ((void *)(NLMSG_DATA(glh) + GENL_HDRLEN)) +#define GENLMSG_PAYLOAD(glh) (NLMSG_PAYLOAD(glh, 0) - GENL_HDRLEN) +#define NLA_DATA(na) ((void *)((char*)(na) + NLA_HDRLEN)) +#define NLA_PAYLOAD(len) (len - NLA_HDRLEN) + +#define err(code, fmt, arg...) do { printf(fmt, ##arg); exit(code); } while (0) +int done = 0; +int rcvbufsz=0; + + char name[100]; +int dbg=0, print_delays=0; +__u64 stime, utime; +#define PRINTF(fmt, arg...) { \ + if (dbg) { \ + printf(fmt, ##arg); \ + } \ + } + +/* Maximum size of response requested or message sent */ +#define MAX_MSG_SIZE 256 +/* Maximum number of cpus expected to be specified in a cpumask */ +#define MAX_CPUS 32 +/* Maximum length of pathname to log file */ +#define MAX_FILENAME 256 + +struct msgtemplate { + struct nlmsghdr n; + struct genlmsghdr g; + char buf[MAX_MSG_SIZE]; +}; + +char cpumask[100+6*MAX_CPUS]; + +/* + * Create a raw netlink socket and bind + */ +static int create_nl_socket(int protocol) +{ + int fd; + struct sockaddr_nl local; + + fd = socket(AF_NETLINK, SOCK_RAW, protocol); + if (fd < 0) + return -1; + + if (rcvbufsz) + if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, + &rcvbufsz, sizeof(rcvbufsz)) < 0) { + printf("Unable to set socket rcv buf size to %d\n", + rcvbufsz); + return -1; + } + + memset(&local, 0, sizeof(local)); + local.nl_family = AF_NETLINK; + + if (bind(fd, (struct sockaddr *) &local, sizeof(local)) < 0) + goto error; + + return fd; +error: + close(fd); + return -1; +} + + +int send_cmd(int sd, __u16 nlmsg_type, __u32 nlmsg_pid, + __u8 genl_cmd, __u16 nla_type, + void *nla_data, int nla_len) +{ + struct nlattr *na; + struct sockaddr_nl nladdr; + int r, buflen; + char *buf; + + struct msgtemplate msg; + + msg.n.nlmsg_len = NLMSG_LENGTH(GENL_HDRLEN); + msg.n.nlmsg_type = nlmsg_type; + msg.n.nlmsg_flags = NLM_F_REQUEST; + msg.n.nlmsg_seq = 0; + msg.n.nlmsg_pid = nlmsg_pid; + msg.g.cmd = genl_cmd; + msg.g.version = 0x1; + na = (struct nlattr *) GENLMSG_DATA(&msg); + na->nla_type = nla_type; + na->nla_len = nla_len + 1 + NLA_HDRLEN; + memcpy(NLA_DATA(na), nla_data, nla_len); + msg.n.nlmsg_len += NLMSG_ALIGN(na->nla_len); + + buf = (char *) &msg; + buflen = msg.n.nlmsg_len ; + memset(&nladdr, 0, sizeof(nladdr)); + nladdr.nl_family = AF_NETLINK; + while ((r = sendto(sd, buf, buflen, 0, (struct sockaddr *) &nladdr, + sizeof(nladdr))) < buflen) { + if (r > 0) { + buf += r; + buflen -= r; + } else if (errno != EAGAIN) + return -1; + } + return 0; +} + + +/* + * Probe the controller in genetlink to find the family id + * for the TASKSTATS family + */ +int get_family_id(int sd) +{ + struct { + struct nlmsghdr n; + struct genlmsghdr g; + char buf[256]; + } ans; + + int id, rc; + struct nlattr *na; + int rep_len; + + strcpy(name, TASKSTATS_GENL_NAME); + rc = send_cmd(sd, GENL_ID_CTRL, getpid(), CTRL_CMD_GETFAMILY, + CTRL_ATTR_FAMILY_NAME, (void *)name, + strlen(TASKSTATS_GENL_NAME)+1); + + rep_len = recv(sd, &ans, sizeof(ans), 0); + if (ans.n.nlmsg_type == NLMSG_ERROR || + (rep_len < 0) || !NLMSG_OK((&ans.n), rep_len)) + return 0; + + na = (struct nlattr *) GENLMSG_DATA(&ans); + na = (struct nlattr *) ((char *) na + NLA_ALIGN(na->nla_len)); + if (na->nla_type == CTRL_ATTR_FAMILY_ID) { + id = *(__u16 *) NLA_DATA(na); + } + return id; +} + +void print_delayacct(struct taskstats *t) +{ + printf("\n\nCPU %15s%15s%15s%15s\n" + " %15llu%15llu%15llu%15llu\n" + "IO %15s%15s\n" + " %15llu%15llu\n" + "MEM %15s%15s\n" + " %15llu%15llu\n\n", + "count", "real total", "virtual total", "delay total", + t->cpu_count, t->cpu_run_real_total, t->cpu_run_virtual_total, + t->cpu_delay_total, + "count", "delay total", + t->blkio_count, t->blkio_delay_total, + "count", "delay total", t->swapin_count, t->swapin_delay_total); +} + +int main(int argc, char *argv[]) +{ + int c, rc, rep_len, aggr_len, len2, cmd_type; + __u16 id; + __u32 mypid; + + struct nlattr *na; + int nl_sd = -1; + int len = 0; + pid_t tid = 0; + pid_t rtid = 0; + + int fd = 0; + int count = 0; + int write_file = 0; + int maskset = 0; + char logfile[128]; + int loop = 0; + + struct msgtemplate msg; + + while (1) { + c = getopt(argc, argv, "dw:r:m:t:p:v:l"); + if (c < 0) + break; + + switch (c) { + case 'd': + printf("print delayacct stats ON\n"); + print_delays = 1; + break; + case 'w': + strncpy(logfile, optarg, MAX_FILENAME); + printf("write to file %s\n", logfile); + write_file = 1; + break; + case 'r': + rcvbufsz = atoi(optarg); + printf("receive buf size %d\n", rcvbufsz); + if (rcvbufsz < 0) + err(1, "Invalid rcv buf size\n"); + break; + case 'm': + strncpy(cpumask, optarg, sizeof(cpumask)); + maskset = 1; + printf("cpumask %s maskset %d\n", cpumask, maskset); + break; + case 't': + tid = atoi(optarg); + if (!tid) + err(1, "Invalid tgid\n"); + cmd_type = TASKSTATS_CMD_ATTR_TGID; + print_delays = 1; + break; + case 'p': + tid = atoi(optarg); + if (!tid) + err(1, "Invalid pid\n"); + cmd_type = TASKSTATS_CMD_ATTR_PID; + print_delays = 1; + break; + case 'v': + printf("debug on\n"); + dbg = 1; + break; + case 'l': + printf("listen forever\n"); + loop = 1; + break; + default: + printf("Unknown option %d\n", c); + exit(-1); + } + } + + if (write_file) { + fd = open(logfile, O_WRONLY | O_CREAT | O_TRUNC, + S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH); + if (fd == -1) { + perror("Cannot open output file\n"); + exit(1); + } + } + + if ((nl_sd = create_nl_socket(NETLINK_GENERIC)) < 0) + err(1, "error creating Netlink socket\n"); + + + mypid = getpid(); + id = get_family_id(nl_sd); + if (!id) { + printf("Error getting family id, errno %d", errno); + goto err; + } + PRINTF("family id %d\n", id); + + if (maskset) { + rc = send_cmd(nl_sd, id, mypid, TASKSTATS_CMD_GET, + TASKSTATS_CMD_ATTR_REGISTER_CPUMASK, + &cpumask, sizeof(cpumask)); + PRINTF("Sent register cpumask, retval %d\n", rc); + if (rc < 0) { + printf("error sending register cpumask\n"); + goto err; + } + } + + if (tid) { + rc = send_cmd(nl_sd, id, mypid, TASKSTATS_CMD_GET, + cmd_type, &tid, sizeof(__u32)); + PRINTF("Sent pid/tgid, retval %d\n", rc); + if (rc < 0) { + printf("error sending tid/tgid cmd\n"); + goto done; + } + } + + do { + int i; + + rep_len = recv(nl_sd, &msg, sizeof(msg), 0); + PRINTF("received %d bytes\n", rep_len); + + if (rep_len < 0) { + printf("nonfatal reply error: errno %d\n", errno); + continue; + } + if (msg.n.nlmsg_type == NLMSG_ERROR || + !NLMSG_OK((&msg.n), rep_len)) { + printf("fatal reply error, errno %d\n", errno); + goto done; + } + + PRINTF("nlmsghdr size=%d, nlmsg_len=%d, rep_len=%d\n", + sizeof(struct nlmsghdr), msg.n.nlmsg_len, rep_len); + + + rep_len = GENLMSG_PAYLOAD(&msg.n); + + na = (struct nlattr *) GENLMSG_DATA(&msg); + len = 0; + i = 0; + while (len < rep_len) { + len += NLA_ALIGN(na->nla_len); + switch (na->nla_type) { + case TASKSTATS_TYPE_AGGR_TGID: + /* Fall through */ + case TASKSTATS_TYPE_AGGR_PID: + aggr_len = NLA_PAYLOAD(na->nla_len); + len2 = 0; + /* For nested attributes, na follows */ + na = (struct nlattr *) NLA_DATA(na); + done = 0; + while (len2 < aggr_len) { + switch (na->nla_type) { + case TASKSTATS_TYPE_PID: + rtid = *(int *) NLA_DATA(na); + if (print_delays) + printf("PID\t%d\n", rtid); + break; + case TASKSTATS_TYPE_TGID: + rtid = *(int *) NLA_DATA(na); + if (print_delays) + printf("TGID\t%d\n", rtid); + break; + case TASKSTATS_TYPE_STATS: + count++; + if (print_delays) + print_delayacct((struct taskstats *) NLA_DATA(na)); + if (fd) { + if (write(fd, NLA_DATA(na), na->nla_len) < 0) { + err(1,"write error\n"); + } + } + if (!loop) + goto done; + break; + default: + printf("Unknown nested nla_type %d\n", na->nla_type); + break; + } + len2 += NLA_ALIGN(na->nla_len); + na = (struct nlattr *) ((char *) na + len2); + } + break; + + default: + printf("Unknown nla_type %d\n", na->nla_type); + break; + } + na = (struct nlattr *) (GENLMSG_DATA(&msg) + len); + } + } while (loop); +done: + if (maskset) { + rc = send_cmd(nl_sd, id, mypid, TASKSTATS_CMD_GET, + TASKSTATS_CMD_ATTR_DEREGISTER_CPUMASK, + &cpumask, sizeof(cpumask)); + printf("Sent deregister mask, retval %d\n", rc); + if (rc < 0) + err(rc, "error sending deregister cpumask\n"); + } +err: + close(nl_sd); + if (fd) + close(fd); + return 0; +} diff --git a/Documentation/accounting/taskstats.txt b/Documentation/accounting/taskstats.txt new file mode 100644 index 000000000000..92ebf29e9041 --- /dev/null +++ b/Documentation/accounting/taskstats.txt @@ -0,0 +1,181 @@ +Per-task statistics interface +----------------------------- + + +Taskstats is a netlink-based interface for sending per-task and +per-process statistics from the kernel to userspace. + +Taskstats was designed for the following benefits: + +- efficiently provide statistics during lifetime of a task and on its exit +- unified interface for multiple accounting subsystems +- extensibility for use by future accounting patches + +Terminology +----------- + +"pid", "tid" and "task" are used interchangeably and refer to the standard +Linux task defined by struct task_struct. per-pid stats are the same as +per-task stats. + +"tgid", "process" and "thread group" are used interchangeably and refer to the +tasks that share an mm_struct i.e. the traditional Unix process. Despite the +use of tgid, there is no special treatment for the task that is thread group +leader - a process is deemed alive as long as it has any task belonging to it. + +Usage +----- + +To get statistics during a task's lifetime, userspace opens a unicast netlink +socket (NETLINK_GENERIC family) and sends commands specifying a pid or a tgid. +The response contains statistics for a task (if pid is specified) or the sum of +statistics for all tasks of the process (if tgid is specified). + +To obtain statistics for tasks which are exiting, the userspace listener +sends a register command and specifies a cpumask. Whenever a task exits on +one of the cpus in the cpumask, its per-pid statistics are sent to the +registered listener. Using cpumasks allows the data received by one listener +to be limited and assists in flow control over the netlink interface and is +explained in more detail below. + +If the exiting task is the last thread exiting its thread group, +an additional record containing the per-tgid stats is also sent to userspace. +The latter contains the sum of per-pid stats for all threads in the thread +group, both past and present. + +getdelays.c is a simple utility demonstrating usage of the taskstats interface +for reporting delay accounting statistics. Users can register cpumasks, +send commands and process responses, listen for per-tid/tgid exit data, +write the data received to a file and do basic flow control by increasing +receive buffer sizes. + +Interface +--------- + +The user-kernel interface is encapsulated in include/linux/taskstats.h + +To avoid this documentation becoming obsolete as the interface evolves, only +an outline of the current version is given. taskstats.h always overrides the +description here. + +struct taskstats is the common accounting structure for both per-pid and +per-tgid data. It is versioned and can be extended by each accounting subsystem +that is added to the kernel. The fields and their semantics are defined in the +taskstats.h file. + +The data exchanged between user and kernel space is a netlink message belonging +to the NETLINK_GENERIC family and using the netlink attributes interface. +The messages are in the format + + +----------+- - -+-------------+-------------------+ + | nlmsghdr | Pad | genlmsghdr | taskstats payload | + +----------+- - -+-------------+-------------------+ + + +The taskstats payload is one of the following three kinds: + +1. Commands: Sent from user to kernel. Commands to get data on +a pid/tgid consist of one attribute, of type TASKSTATS_CMD_ATTR_PID/TGID, +containing a u32 pid or tgid in the attribute payload. The pid/tgid denotes +the task/process for which userspace wants statistics. + +Commands to register/deregister interest in exit data from a set of cpus +consist of one attribute, of type +TASKSTATS_CMD_ATTR_REGISTER/DEREGISTER_CPUMASK and contain a cpumask in the +attribute payload. The cpumask is specified as an ascii string of +comma-separated cpu ranges e.g. to listen to exit data from cpus 1,2,3,5,7,8 +the cpumask would be "1-3,5,7-8". If userspace forgets to deregister interest +in cpus before closing the listening socket, the kernel cleans up its interest +set over time. However, for the sake of efficiency, an explicit deregistration +is advisable. + +2. Response for a command: sent from the kernel in response to a userspace +command. The payload is a series of three attributes of type: + +a) TASKSTATS_TYPE_AGGR_PID/TGID : attribute containing no payload but indicates +a pid/tgid will be followed by some stats. + +b) TASKSTATS_TYPE_PID/TGID: attribute whose payload is the pid/tgid whose stats +is being returned. + +c) TASKSTATS_TYPE_STATS: attribute with a struct taskstsats as payload. The +same structure is used for both per-pid and per-tgid stats. + +3. New message sent by kernel whenever a task exits. The payload consists of a + series of attributes of the following type: + +a) TASKSTATS_TYPE_AGGR_PID: indicates next two attributes will be pid+stats +b) TASKSTATS_TYPE_PID: contains exiting task's pid +c) TASKSTATS_TYPE_STATS: contains the exiting task's per-pid stats +d) TASKSTATS_TYPE_AGGR_TGID: indicates next two attributes will be tgid+stats +e) TASKSTATS_TYPE_TGID: contains tgid of process to which task belongs +f) TASKSTATS_TYPE_STATS: contains the per-tgid stats for exiting task's process + + +per-tgid stats +-------------- + +Taskstats provides per-process stats, in addition to per-task stats, since +resource management is often done at a process granularity and aggregating task +stats in userspace alone is inefficient and potentially inaccurate (due to lack +of atomicity). + +However, maintaining per-process, in addition to per-task stats, within the +kernel has space and time overheads. To address this, the taskstats code +accumalates each exiting task's statistics into a process-wide data structure. +When the last task of a process exits, the process level data accumalated also +gets sent to userspace (along with the per-task data). + +When a user queries to get per-tgid data, the sum of all other live threads in +the group is added up and added to the accumalated total for previously exited +threads of the same thread group. + +Extending taskstats +------------------- + +There are two ways to extend the taskstats interface to export more +per-task/process stats as patches to collect them get added to the kernel +in future: + +1. Adding more fields to the end of the existing struct taskstats. Backward + compatibility is ensured by the version number within the + structure. Userspace will use only the fields of the struct that correspond + to the version its using. + +2. Defining separate statistic structs and using the netlink attributes + interface to return them. Since userspace processes each netlink attribute + independently, it can always ignore attributes whose type it does not + understand (because it is using an older version of the interface). + + +Choosing between 1. and 2. is a matter of trading off flexibility and +overhead. If only a few fields need to be added, then 1. is the preferable +path since the kernel and userspace don't need to incur the overhead of +processing new netlink attributes. But if the new fields expand the existing +struct too much, requiring disparate userspace accounting utilities to +unnecessarily receive large structures whose fields are of no interest, then +extending the attributes structure would be worthwhile. + +Flow control for taskstats +-------------------------- + +When the rate of task exits becomes large, a listener may not be able to keep +up with the kernel's rate of sending per-tid/tgid exit data leading to data +loss. This possibility gets compounded when the taskstats structure gets +extended and the number of cpus grows large. + +To avoid losing statistics, userspace should do one or more of the following: + +- increase the receive buffer sizes for the netlink sockets opened by +listeners to receive exit data. + +- create more listeners and reduce the number of cpus being listened to by +each listener. In the extreme case, there could be one listener for each cpu. +Users may also consider setting the cpu affinity of the listener to the subset +of cpus to which it listens, especially if they are listening to just one cpu. + +Despite these measures, if the userspace receives ENOBUFS error messages +indicated overflow of receive buffers, it should take measures to handle the +loss of data. + +---- diff --git a/Documentation/cciss.txt b/Documentation/cciss.txt index 15378422fc46..9c629ffa0e58 100644 --- a/Documentation/cciss.txt +++ b/Documentation/cciss.txt @@ -20,6 +20,7 @@ This driver is known to work with the following cards: * SA P400i * SA E200 * SA E200i + * SA E500 If nodes are not already created in the /dev/cciss directory, run as root: diff --git a/Documentation/connector/ucon.c b/Documentation/connector/ucon.c new file mode 100644 index 000000000000..d738cde2a8d5 --- /dev/null +++ b/Documentation/connector/ucon.c @@ -0,0 +1,206 @@ +/* + * ucon.c + * + * Copyright (c) 2004+ Evgeniy Polyakov <johnpol@2ka.mipt.ru> + * + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ + +#include <asm/types.h> + +#include <sys/types.h> +#include <sys/socket.h> +#include <sys/poll.h> + +#include <linux/netlink.h> +#include <linux/rtnetlink.h> + +#include <arpa/inet.h> + +#include <stdio.h> +#include <stdlib.h> +#include <unistd.h> +#include <string.h> +#include <errno.h> +#include <time.h> + +#include <linux/connector.h> + +#define DEBUG +#define NETLINK_CONNECTOR 11 + +#ifdef DEBUG +#define ulog(f, a...) fprintf(stdout, f, ##a) +#else +#define ulog(f, a...) do {} while (0) +#endif + +static int need_exit; +static __u32 seq; + +static int netlink_send(int s, struct cn_msg *msg) +{ + struct nlmsghdr *nlh; + unsigned int size; + int err; + char buf[128]; + struct cn_msg *m; + + size = NLMSG_SPACE(sizeof(struct cn_msg) + msg->len); + + nlh = (struct nlmsghdr *)buf; + nlh->nlmsg_seq = seq++; + nlh->nlmsg_pid = getpid(); + nlh->nlmsg_type = NLMSG_DONE; + nlh->nlmsg_len = NLMSG_LENGTH(size - sizeof(*nlh)); + nlh->nlmsg_flags = 0; + + m = NLMSG_DATA(nlh); +#if 0 + ulog("%s: [%08x.%08x] len=%u, seq=%u, ack=%u.\n", + __func__, msg->id.idx, msg->id.val, msg->len, msg->seq, msg->ack); +#endif + memcpy(m, msg, sizeof(*m) + msg->len); + + err = send(s, nlh, size, 0); + if (err == -1) + ulog("Failed to send: %s [%d].\n", + strerror(errno), errno); + + return err; +} + +int main(int argc, char *argv[]) +{ + int s; + char buf[1024]; + int len; + struct nlmsghdr *reply; + struct sockaddr_nl l_local; + struct cn_msg *data; + FILE *out; + time_t tm; + struct pollfd pfd; + + if (argc < 2) + out = stdout; + else { + out = fopen(argv[1], "a+"); + if (!out) { + ulog("Unable to open %s for writing: %s\n", + argv[1], strerror(errno)); + out = stdout; + } + } + + memset(buf, 0, sizeof(buf)); + + s = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_CONNECTOR); + if (s == -1) { + perror("socket"); + return -1; + } + + l_local.nl_family = AF_NETLINK; + l_local.nl_groups = 0x123; /* bitmask of requested groups */ + l_local.nl_pid = 0; + + if (bind(s, (struct sockaddr *)&l_local, sizeof(struct sockaddr_nl)) == -1) { + perror("bind"); + close(s); + return -1; + } + +#if 0 + { + int on = 0x57; /* Additional group number */ + setsockopt(s, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP, &on, sizeof(on)); + } +#endif + if (0) { + int i, j; + + memset(buf, 0, sizeof(buf)); + + data = (struct cn_msg *)buf; + + data->id.idx = 0x123; + data->id.val = 0x456; + data->seq = seq++; + data->ack = 0; + data->len = 0; + + for (j=0; j<10; ++j) { + for (i=0; i<1000; ++i) { + len = netlink_send(s, data); + } + + ulog("%d messages have been sent to %08x.%08x.\n", i, data->id.idx, data->id.val); + } + + return 0; + } + + + pfd.fd = s; + + while (!need_exit) { + pfd.events = POLLIN; + pfd.revents = 0; + switch (poll(&pfd, 1, -1)) { + case 0: + need_exit = 1; + break; + case -1: + if (errno != EINTR) { + need_exit = 1; + break; + } + continue; + } + if (need_exit) + break; + + memset(buf, 0, sizeof(buf)); + len = recv(s, buf, sizeof(buf), 0); + if (len == -1) { + perror("recv buf"); + close(s); + return -1; + } + reply = (struct nlmsghdr *)buf; + + switch (reply->nlmsg_type) { + case NLMSG_ERROR: + fprintf(out, "Error message received.\n"); + fflush(out); + break; + case NLMSG_DONE: + data = (struct cn_msg *)NLMSG_DATA(reply); + + time(&tm); + fprintf(out, "%.24s : [%x.%x] [%08u.%08u].\n", + ctime(&tm), data->id.idx, data->id.val, data->seq, data->ack); + fflush(out); + break; + default: + break; + } + } + + close(s); + return 0; +} diff --git a/Documentation/cpu-freq/user-guide.txt b/Documentation/cpu-freq/user-guide.txt index 7fedc00c3d30..555c8cf3650a 100644 --- a/Documentation/cpu-freq/user-guide.txt +++ b/Documentation/cpu-freq/user-guide.txt @@ -153,10 +153,13 @@ scaling_governor, and by "echoing" the name of another that some governors won't load - they only work on some specific architectures or processors. -scaling_min_freq and +scaling_min_freq and scaling_max_freq show the current "policy limits" (in kHz). By echoing new values into these files, you can change these limits. + NOTE: when setting a policy you need to + first set scaling_max_freq, then + scaling_min_freq. If you have selected the "userspace" governor which allows you to diff --git a/Documentation/cpu-hotplug.txt b/Documentation/cpu-hotplug.txt index 1bcf69996c9d..bc107cb157a8 100644 --- a/Documentation/cpu-hotplug.txt +++ b/Documentation/cpu-hotplug.txt @@ -251,16 +251,24 @@ A: This is what you would need in your kernel code to receive notifications. return NOTIFY_OK; } - static struct notifier_block foobar_cpu_notifer = + static struct notifier_block __cpuinitdata foobar_cpu_notifer = { .notifier_call = foobar_cpu_callback, }; +You need to call register_cpu_notifier() from your init function. +Init functions could be of two types: +1. early init (init function called when only the boot processor is online). +2. late init (init function called _after_ all the CPUs are online). -In your init function, +For the first case, you should add the following to your init function register_cpu_notifier(&foobar_cpu_notifier); +For the second case, you should add the following to your init function + + register_hotcpu_notifier(&foobar_cpu_notifier); + You can fail PREPARE notifiers if something doesn't work to prepare resources. This will stop the activity and send a following CANCELED event back. diff --git a/Documentation/cpusets.txt b/Documentation/cpusets.txt index 159e2a0c3e80..76b44290c154 100644 --- a/Documentation/cpusets.txt +++ b/Documentation/cpusets.txt @@ -217,6 +217,12 @@ exclusive cpuset. Also, the use of a Linux virtual file system (vfs) to represent the cpuset hierarchy provides for a familiar permission and name space for cpusets, with a minimum of additional kernel code. +The cpus file in the root (top_cpuset) cpuset is read-only. +It automatically tracks the value of cpu_online_map, using a CPU +hotplug notifier. If and when memory nodes can be hotplugged, +we expect to make the mems file in the root cpuset read-only +as well, and have it track the value of node_online_map. + 1.4 What are exclusive cpusets ? -------------------------------- diff --git a/Documentation/devices.txt b/Documentation/devices.txt index 4aaf68fafebe..66c725f530f3 100644 --- a/Documentation/devices.txt +++ b/Documentation/devices.txt @@ -2565,10 +2565,10 @@ Your cooperation is appreciated. 243 = /dev/usb/dabusb3 Fourth dabusb device 180 block USB block devices - 0 = /dev/uba First USB block device - 8 = /dev/ubb Second USB block device - 16 = /dev/ubc Thrid USB block device - ... + 0 = /dev/uba First USB block device + 8 = /dev/ubb Second USB block device + 16 = /dev/ubc Third USB block device + ... 181 char Conrad Electronic parallel port radio clocks 0 = /dev/pcfclock0 First Conrad radio clock diff --git a/Documentation/drivers/edac/edac.txt b/Documentation/drivers/edac/edac.txt index 70d96a62e5e1..7b3d969d2964 100644 --- a/Documentation/drivers/edac/edac.txt +++ b/Documentation/drivers/edac/edac.txt @@ -35,15 +35,14 @@ the vendor should tie the parity status bits to 0 if they do not intend to generate parity. Some vendors do not do this, and thus the parity bit can "float" giving false positives. -The PCI Parity EDAC device has the ability to "skip" known flaky -cards during the parity scan. These are set by the parity "blacklist" -interface in the sysfs for PCI Parity. (See the PCI section in the sysfs -section below.) There is also a parity "whitelist" which is used as -an explicit list of devices to scan, while the blacklist is a list -of devices to skip. +[There are patches in the kernel queue which will allow for storage of +quirks of PCI devices reporting false parity positives. The 2.6.18 +kernel should have those patches included. When that becomes available, +then EDAC will be patched to utilize that information to "skip" such +devices.] -EDAC will have future error detectors that will be added or integrated -into EDAC in the following list: +EDAC will have future error detectors that will be integrated with +EDAC or added to it, in the following list: MCE Machine Check Exception MCA Machine Check Architecture @@ -93,22 +92,24 @@ EDAC lives in the /sys/devices/system/edac directory. Within this directory there currently reside 2 'edac' components: mc memory controller(s) system - pci PCI status system + pci PCI control and status system ============================================================================ Memory Controller (mc) Model First a background on the memory controller's model abstracted in EDAC. -Each mc device controls a set of DIMM memory modules. These modules are +Each 'mc' device controls a set of DIMM memory modules. These modules are laid out in a Chip-Select Row (csrowX) and Channel table (chX). There can -be multiple csrows and two channels. +be multiple csrows and multiple channels. Memory controllers allow for several csrows, with 8 csrows being a typical value. Yet, the actual number of csrows depends on the electrical "loading" of a given motherboard, memory controller and DIMM characteristics. Dual channels allows for 128 bit data transfers to the CPU from memory. +Some newer chipsets allow for more than 2 channels, like Fully Buffered DIMMs +(FB-DIMMs). The following example will assume 2 channels: Channel 0 Channel 1 @@ -234,23 +235,15 @@ Polling period control file: The time period, in milliseconds, for polling for error information. Too small a value wastes resources. Too large a value might delay necessary handling of errors and might loose valuable information for - locating the error. 1000 milliseconds (once each second) is about - right for most uses. + locating the error. 1000 milliseconds (once each second) is the current + default. Systems which require all the bandwidth they can get, may + increase this. LOAD TIME: module/kernel parameter: poll_msec=[0|1] RUN TIME: echo "1000" >/sys/devices/system/edac/mc/poll_msec -Module Version read-only attribute file: - - 'mc_version' - - The EDAC CORE module's version and compile date are shown here to - indicate what EDAC is running. - - - ============================================================================ 'mcX' DIRECTORIES @@ -284,35 +277,6 @@ Seconds since last counter reset control file: -DIMM capability attribute file: - - 'edac_capability' - - The EDAC (Error Detection and Correction) capabilities/modes of - the memory controller hardware. - - -DIMM Current Capability attribute file: - - 'edac_current_capability' - - The EDAC capabilities available with the hardware - configuration. This may not be the same as "EDAC capability" - if the correct memory is not used. If a memory controller is - capable of EDAC, but DIMMs without check bits are in use, then - Parity, SECDED, S4ECD4ED capabilities will not be available - even though the memory controller might be capable of those - modes with the proper memory loaded. - - -Memory Type supported on this controller attribute file: - - 'supported_mem_type' - - This attribute file displays the memory type, usually - buffered and unbuffered DIMMs. - - Memory Controller name attribute file: 'mc_name' @@ -321,16 +285,6 @@ Memory Controller name attribute file: that is being utilized. -Memory Controller Module name attribute file: - - 'module_name' - - This attribute file displays the memory controller module name, - version and date built. The name of the memory controller - hardware - some drivers work with multiple controllers and - this field shows which hardware is present. - - Total memory managed by this memory controller attribute file: 'size_mb' @@ -432,6 +386,9 @@ Memory Type attribute file: This attribute file will display what type of memory is currently on this csrow. Normally, either buffered or unbuffered memory. + Examples: + Registered-DDR + Unbuffered-DDR EDAC Mode of operation attribute file: @@ -446,8 +403,13 @@ Device type attribute file: 'dev_type' - This attribute file will display what type of DIMM device is - being utilized. Example: x4 + This attribute file will display what type of DRAM device is + being utilized on this DIMM. + Examples: + x1 + x2 + x4 + x8 Channel 0 CE Count attribute file: @@ -522,10 +484,10 @@ SYSTEM LOGGING If logging for UEs and CEs are enabled then system logs will have error notices indicating errors that have been detected: -MC0: CE page 0x283, offset 0xce0, grain 8, syndrome 0x6ec3, row 0, +EDAC MC0: CE page 0x283, offset 0xce0, grain 8, syndrome 0x6ec3, row 0, channel 1 "DIMM_B1": amd76x_edac -MC0: CE page 0x1e5, offset 0xfb0, grain 8, syndrome 0xb741, row 0, +EDAC MC0: CE page 0x1e5, offset 0xfb0, grain 8, syndrome 0xb741, row 0, channel 1 "DIMM_B1": amd76x_edac @@ -610,64 +572,4 @@ Parity Count: -PCI Device Whitelist: - - 'pci_parity_whitelist' - - This control file allows for an explicit list of PCI devices to be - scanned for parity errors. Only devices found on this list will - be examined. The list is a line of hexadecimal VENDOR and DEVICE - ID tuples: - - 1022:7450,1434:16a6 - - One or more can be inserted, separated by a comma. - - To write the above list doing the following as one command line: - - echo "1022:7450,1434:16a6" - > /sys/devices/system/edac/pci/pci_parity_whitelist - - - - To display what the whitelist is, simply 'cat' the same file. - - -PCI Device Blacklist: - - 'pci_parity_blacklist' - - This control file allows for a list of PCI devices to be - skipped for scanning. - The list is a line of hexadecimal VENDOR and DEVICE ID tuples: - - 1022:7450,1434:16a6 - - One or more can be inserted, separated by a comma. - - To write the above list doing the following as one command line: - - echo "1022:7450,1434:16a6" - > /sys/devices/system/edac/pci/pci_parity_blacklist - - - To display what the whitelist currently contains, - simply 'cat' the same file. - ======================================================================= - -PCI Vendor and Devices IDs can be obtained with the lspci command. Using -the -n option lspci will display the vendor and device IDs. The system -administrator will have to determine which devices should be scanned or -skipped. - - - -The two lists (white and black) are prioritized. blacklist is the lower -priority and will NOT be utilized when a whitelist has been set. -Turn OFF a whitelist by an empty echo command: - - echo > /sys/devices/system/edac/pci/pci_parity_whitelist - -and any previous blacklist will be utilized. - diff --git a/Documentation/fb/imacfb.txt b/Documentation/fb/imacfb.txt new file mode 100644 index 000000000000..759028545a7e --- /dev/null +++ b/Documentation/fb/imacfb.txt @@ -0,0 +1,31 @@ + +What is imacfb? +=============== + +This is a generic EFI platform driver for Intel based Apple computers. +Imacfb is only for EFI booted Intel Macs. + +Supported Hardware +================== + +iMac 17"/20" +Macbook +Macbook Pro 15"/17" +MacMini + +How to use it? +============== + +Imacfb does not have any kind of autodetection of your machine. +You have to add the fillowing kernel parameters in your elilo.conf: + Macbook : + video=imacfb:macbook + MacMini : + video=imacfb:mini + Macbook Pro 15", iMac 17" : + video=imacfb:i17 + Macbook Pro 17", iMac 20" : + video=imacfb:i20 + +-- +Edgar Hucek <gimli@dark-green.com> diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 99f219a01e0e..552507fe9a7e 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -55,14 +55,6 @@ Who: Mauro Carvalho Chehab <mchehab@brturbo.com.br> --------------------------- -What: remove EXPORT_SYMBOL(insert_resource) -When: April 2006 -Files: kernel/resource.c -Why: No modular usage in the kernel. -Who: Adrian Bunk <bunk@stusta.de> - ---------------------------- - What: PCMCIA control ioctl (needed for pcmcia-cs [cardmgr, cardctl]) When: November 2005 Files: drivers/pcmcia/: pcmcia_ioctl.c @@ -128,6 +120,13 @@ Who: Adrian Bunk <bunk@stusta.de> --------------------------- +What: drivers depending on OSS_OBSOLETE_DRIVER +When: options in 2.6.20, code in 2.6.22 +Why: OSS drivers with ALSA replacements +Who: Adrian Bunk <bunk@stusta.de> + +--------------------------- + What: pci_module_init(driver) When: January 2007 Why: Is replaced by pci_register_driver(pci_driver). @@ -166,17 +165,6 @@ Who: Arjan van de Ven <arjan@linux.intel.com> --------------------------- -What: remove EXPORT_SYMBOL(tasklist_lock) -When: August 2006 -Files: kernel/fork.c -Why: tasklist_lock protects the kernel internal task list. Modules have - no business looking at it, and all instances in drivers have been due - to use of too-lowlevel APIs. Having this symbol exported prevents - moving to more scalable locking schemes for the task list. -Who: Christoph Hellwig <hch@lst.de> - ---------------------------- - What: mount/umount uevents When: February 2007 Why: These events are not correct, and do not properly let userspace know @@ -266,3 +254,43 @@ Why: The interrupt related SA_* flags are replaced by IRQF_* to move them Who: Thomas Gleixner <tglx@linutronix.de> --------------------------- + +What: i2c-ite and i2c-algo-ite drivers +When: September 2006 +Why: These drivers never compiled since they were added to the kernel + tree 5 years ago. This feature removal can be reevaluated if + someone shows interest in the drivers, fixes them and takes over + maintenance. + http://marc.theaimsgroup.com/?l=linux-mips&m=115040510817448 +Who: Jean Delvare <khali@linux-fr.org> + +--------------------------- + +What: Bridge netfilter deferred IPv4/IPv6 output hook calling +When: January 2007 +Why: The deferred output hooks are a layering violation causing unusual + and broken behaviour on bridge devices. Examples of things they + break include QoS classifation using the MARK or CLASSIFY targets, + the IPsec policy match and connection tracking with VLANs on a + bridge. Their only use is to enable bridge output port filtering + within iptables with the physdev match, which can also be done by + combining iptables and ebtables using netfilter marks. Until it + will get removed the hook deferral is disabled by default and is + only enabled when needed. + +Who: Patrick McHardy <kaber@trash.net> + +--------------------------- + +What: frame diverter +When: November 2006 +Why: The frame diverter is included in most distribution kernels, but is + broken. It does not correctly handle many things: + - IPV6 + - non-linear skb's + - network device RCU on removal + - input frames not correctly checked for protocol errors + It also adds allocation overhead even if not enabled. + It is not clear if anyone is still using it. +Who: Stephen Hemminger <shemminger@osdl.org> + diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX index 66fdc0744fe0..16dec61d7671 100644 --- a/Documentation/filesystems/00-INDEX +++ b/Documentation/filesystems/00-INDEX @@ -62,8 +62,8 @@ ramfs-rootfs-initramfs.txt - info on the 'in memory' filesystems ramfs, rootfs and initramfs. reiser4.txt - info on the Reiser4 filesystem based on dancing tree algorithms. -relayfs.txt - - info on relayfs, for efficient streaming from kernel to user space. +relay.txt + - info on relay, for efficient streaming from kernel to user space. romfs.txt - description of the ROMFS filesystem. smbfs.txt diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking index d31efbbdfe50..247d7f619aa2 100644 --- a/Documentation/filesystems/Locking +++ b/Documentation/filesystems/Locking @@ -142,8 +142,8 @@ see also dquot_operations section. --------------------------- file_system_type --------------------------- prototypes: - struct int (*get_sb) (struct file_system_type *, int, - const char *, void *, struct vfsmount *); + int (*get_sb) (struct file_system_type *, int, + const char *, void *, struct vfsmount *); void (*kill_sb) (struct super_block *); locking rules: may block BKL diff --git a/Documentation/filesystems/relay.txt b/Documentation/filesystems/relay.txt new file mode 100644 index 000000000000..d6788dae0349 --- /dev/null +++ b/Documentation/filesystems/relay.txt @@ -0,0 +1,479 @@ +relay interface (formerly relayfs) +================================== + +The relay interface provides a means for kernel applications to +efficiently log and transfer large quantities of data from the kernel +to userspace via user-defined 'relay channels'. + +A 'relay channel' is a kernel->user data relay mechanism implemented +as a set of per-cpu kernel buffers ('channel buffers'), each +represented as a regular file ('relay file') in user space. Kernel +clients write into the channel buffers using efficient write +functions; these automatically log into the current cpu's channel +buffer. User space applications mmap() or read() from the relay files +and retrieve the data as it becomes available. The relay files +themselves are files created in a host filesystem, e.g. debugfs, and +are associated with the channel buffers using the API described below. + +The format of the data logged into the channel buffers is completely +up to the kernel client; the relay interface does however provide +hooks which allow kernel clients to impose some structure on the +buffer data. The relay interface doesn't implement any form of data +filtering - this also is left to the kernel client. The purpose is to +keep things as simple as possible. + +This document provides an overview of the relay interface API. The +details of the function parameters are documented along with the +functions in the relay interface code - please see that for details. + +Semantics +========= + +Each relay channel has one buffer per CPU, each buffer has one or more +sub-buffers. Messages are written to the first sub-buffer until it is +too full to contain a new message, in which case it it is written to +the next (if available). Messages are never split across sub-buffers. +At this point, userspace can be notified so it empties the first +sub-buffer, while the kernel continues writing to the next. + +When notified that a sub-buffer is full, the kernel knows how many +bytes of it are padding i.e. unused space occurring because a complete +message couldn't fit into a sub-buffer. Userspace can use this +knowledge to copy only valid data. + +After copying it, userspace can notify the kernel that a sub-buffer +has been consumed. + +A relay channel can operate in a mode where it will overwrite data not +yet collected by userspace, and not wait for it to be consumed. + +The relay channel itself does not provide for communication of such +data between userspace and kernel, allowing the kernel side to remain +simple and not impose a single interface on userspace. It does +provide a set of examples and a separate helper though, described +below. + +The read() interface both removes padding and internally consumes the +read sub-buffers; thus in cases where read(2) is being used to drain +the channel buffers, special-purpose communication between kernel and +user isn't necessary for basic operation. + +One of the major goals of the relay interface is to provide a low +overhead mechanism for conveying kernel data to userspace. While the +read() interface is easy to use, it's not as efficient as the mmap() +approach; the example code attempts to make the tradeoff between the +two approaches as small as possible. + +klog and relay-apps example code +================================ + +The relay interface itself is ready to use, but to make things easier, +a couple simple utility functions and a set of examples are provided. + +The relay-apps example tarball, available on the relay sourceforge +site, contains a set of self-contained examples, each consisting of a +pair of .c files containing boilerplate code for each of the user and +kernel sides of a relay application. When combined these two sets of +boilerplate code provide glue to easily stream data to disk, without +having to bother with mundane housekeeping chores. + +The 'klog debugging functions' patch (klog.patch in the relay-apps +tarball) provides a couple of high-level logging functions to the +kernel which allow writing formatted text or raw data to a channel, +regardless of whether a channel to write into exists or not, or even +whether the relay interface is compiled into the kernel or not. These +functions allow you to put unconditional 'trace' statements anywhere +in the kernel or kernel modules; only when there is a 'klog handler' +registered will data actually be logged (see the klog and kleak +examples for details). + +It is of course possible to use the relay interface from scratch, +i.e. without using any of the relay-apps example code or klog, but +you'll have to implement communication between userspace and kernel, +allowing both to convey the state of buffers (full, empty, amount of +padding). The read() interface both removes padding and internally +consumes the read sub-buffers; thus in cases where read(2) is being +used to drain the channel buffers, special-purpose communication +between kernel and user isn't necessary for basic operation. Things +such as buffer-full conditions would still need to be communicated via +some channel though. + +klog and the relay-apps examples can be found in the relay-apps +tarball on http://relayfs.sourceforge.net + +The relay interface user space API +================================== + +The relay interface implements basic file operations for user space +access to relay channel buffer data. Here are the file operations +that are available and some comments regarding their behavior: + +open() enables user to open an _existing_ channel buffer. + +mmap() results in channel buffer being mapped into the caller's + memory space. Note that you can't do a partial mmap - you + must map the entire file, which is NRBUF * SUBBUFSIZE. + +read() read the contents of a channel buffer. The bytes read are + 'consumed' by the reader, i.e. they won't be available + again to subsequent reads. If the channel is being used + in no-overwrite mode (the default), it can be read at any + time even if there's an active kernel writer. If the + channel is being used in overwrite mode and there are + active channel writers, results may be unpredictable - + users should make sure that all logging to the channel has + ended before using read() with overwrite mode. Sub-buffer + padding is automatically removed and will not be seen by + the reader. + +sendfile() transfer data from a channel buffer to an output file + descriptor. Sub-buffer padding is automatically removed + and will not be seen by the reader. + +poll() POLLIN/POLLRDNORM/POLLERR supported. User applications are + notified when sub-buffer boundaries are crossed. + +close() decrements the channel buffer's refcount. When the refcount + reaches 0, i.e. when no process or kernel client has the + buffer open, the channel buffer is freed. + +In order for a user application to make use of relay files, the +host filesystem must be mounted. For example, + + mount -t debugfs debugfs /debug + +NOTE: the host filesystem doesn't need to be mounted for kernel + clients to create or use channels - it only needs to be + mounted when user space applications need access to the buffer + data. + + +The relay interface kernel API +============================== + +Here's a summary of the API the relay interface provides to in-kernel clients: + +TBD(curr. line MT:/API/) + channel management functions: + + relay_open(base_filename, parent, subbuf_size, n_subbufs, + callbacks) + relay_close(chan) + relay_flush(chan) + relay_reset(chan) + + channel management typically called on instigation of userspace: + + relay_subbufs_consumed(chan, cpu, subbufs_consumed) + + write functions: + + relay_write(chan, data, length) + __relay_write(chan, data, length) + relay_reserve(chan, length) + + callbacks: + + subbuf_start(buf, subbuf, prev_subbuf, prev_padding) + buf_mapped(buf, filp) + buf_unmapped(buf, filp) + create_buf_file(filename, parent, mode, buf, is_global) + remove_buf_file(dentry) + + helper functions: + + relay_buf_full(buf) + subbuf_start_reserve(buf, length) + + +Creating a channel +------------------ + +relay_open() is used to create a channel, along with its per-cpu +channel buffers. Each channel buffer will have an associated file +created for it in the host filesystem, which can be and mmapped or +read from in user space. The files are named basename0...basenameN-1 +where N is the number of online cpus, and by default will be created +in the root of the filesystem (if the parent param is NULL). If you +want a directory structure to contain your relay files, you should +create it using the host filesystem's directory creation function, +e.g. debugfs_create_dir(), and pass the parent directory to +relay_open(). Users are responsible for cleaning up any directory +structure they create, when the channel is closed - again the host +filesystem's directory removal functions should be used for that, +e.g. debugfs_remove(). + +In order for a channel to be created and the host filesystem's files +associated with its channel buffers, the user must provide definitions +for two callback functions, create_buf_file() and remove_buf_file(). +create_buf_file() is called once for each per-cpu buffer from +relay_open() and allows the user to create the file which will be used +to represent the corresponding channel buffer. The callback should +return the dentry of the file created to represent the channel buffer. +remove_buf_file() must also be defined; it's responsible for deleting +the file(s) created in create_buf_file() and is called during +relay_close(). + +Here are some typical definitions for these callbacks, in this case +using debugfs: + +/* + * create_buf_file() callback. Creates relay file in debugfs. + */ +static struct dentry *create_buf_file_handler(const char *filename, + struct dentry *parent, + int mode, + struct rchan_buf *buf, + int *is_global) +{ + return debugfs_create_file(filename, mode, parent, buf, + &relay_file_operations); +} + +/* + * remove_buf_file() callback. Removes relay file from debugfs. + */ +static int remove_buf_file_handler(struct dentry *dentry) +{ + debugfs_remove(dentry); + + return 0; +} + +/* + * relay interface callbacks + */ +static struct rchan_callbacks relay_callbacks = +{ + .create_buf_file = create_buf_file_handler, + .remove_buf_file = remove_buf_file_handler, +}; + +And an example relay_open() invocation using them: + + chan = relay_open("cpu", NULL, SUBBUF_SIZE, N_SUBBUFS, &relay_callbacks); + +If the create_buf_file() callback fails, or isn't defined, channel +creation and thus relay_open() will fail. + +The total size of each per-cpu buffer is calculated by multiplying the +number of sub-buffers by the sub-buffer size passed into relay_open(). +The idea behind sub-buffers is that they're basically an extension of +double-buffering to N buffers, and they also allow applications to +easily implement random-access-on-buffer-boundary schemes, which can +be important for some high-volume applications. The number and size +of sub-buffers is completely dependent on the application and even for +the same application, different conditions will warrant different +values for these parameters at different times. Typically, the right +values to use are best decided after some experimentation; in general, +though, it's safe to assume that having only 1 sub-buffer is a bad +idea - you're guaranteed to either overwrite data or lose events +depending on the channel mode being used. + +The create_buf_file() implementation can also be defined in such a way +as to allow the creation of a single 'global' buffer instead of the +default per-cpu set. This can be useful for applications interested +mainly in seeing the relative ordering of system-wide events without +the need to bother with saving explicit timestamps for the purpose of +merging/sorting per-cpu files in a postprocessing step. + +To have relay_open() create a global buffer, the create_buf_file() +implementation should set the value of the is_global outparam to a +non-zero value in addition to creating the file that will be used to +represent the single buffer. In the case of a global buffer, +create_buf_file() and remove_buf_file() will be called only once. The +normal channel-writing functions, e.g. relay_write(), can still be +used - writes from any cpu will transparently end up in the global +buffer - but since it is a global buffer, callers should make sure +they use the proper locking for such a buffer, either by wrapping +writes in a spinlock, or by copying a write function from relay.h and +creating a local version that internally does the proper locking. + +Channel 'modes' +--------------- + +relay channels can be used in either of two modes - 'overwrite' or +'no-overwrite'. The mode is entirely determined by the implementation +of the subbuf_start() callback, as described below. The default if no +subbuf_start() callback is defined is 'no-overwrite' mode. If the +default mode suits your needs, and you plan to use the read() +interface to retrieve channel data, you can ignore the details of this +section, as it pertains mainly to mmap() implementations. + +In 'overwrite' mode, also known as 'flight recorder' mode, writes +continuously cycle around the buffer and will never fail, but will +unconditionally overwrite old data regardless of whether it's actually +been consumed. In no-overwrite mode, writes will fail, i.e. data will +be lost, if the number of unconsumed sub-buffers equals the total +number of sub-buffers in the channel. It should be clear that if +there is no consumer or if the consumer can't consume sub-buffers fast +enough, data will be lost in either case; the only difference is +whether data is lost from the beginning or the end of a buffer. + +As explained above, a relay channel is made of up one or more +per-cpu channel buffers, each implemented as a circular buffer +subdivided into one or more sub-buffers. Messages are written into +the current sub-buffer of the channel's current per-cpu buffer via the +write functions described below. Whenever a message can't fit into +the current sub-buffer, because there's no room left for it, the +client is notified via the subbuf_start() callback that a switch to a +new sub-buffer is about to occur. The client uses this callback to 1) +initialize the next sub-buffer if appropriate 2) finalize the previous +sub-buffer if appropriate and 3) return a boolean value indicating +whether or not to actually move on to the next sub-buffer. + +To implement 'no-overwrite' mode, the userspace client would provide +an implementation of the subbuf_start() callback something like the +following: + +static int subbuf_start(struct rchan_buf *buf, + void *subbuf, + void *prev_subbuf, + unsigned int prev_padding) +{ + if (prev_subbuf) + *((unsigned *)prev_subbuf) = prev_padding; + + if (relay_buf_full(buf)) + return 0; + + subbuf_start_reserve(buf, sizeof(unsigned int)); + + return 1; +} + +If the current buffer is full, i.e. all sub-buffers remain unconsumed, +the callback returns 0 to indicate that the buffer switch should not +occur yet, i.e. until the consumer has had a chance to read the +current set of ready sub-buffers. For the relay_buf_full() function +to make sense, the consumer is reponsible for notifying the relay +interface when sub-buffers have been consumed via +relay_subbufs_consumed(). Any subsequent attempts to write into the +buffer will again invoke the subbuf_start() callback with the same +parameters; only when the consumer has consumed one or more of the +ready sub-buffers will relay_buf_full() return 0, in which case the +buffer switch can continue. + +The implementation of the subbuf_start() callback for 'overwrite' mode +would be very similar: + +static int subbuf_start(struct rchan_buf *buf, + void *subbuf, + void *prev_subbuf, + unsigned int prev_padding) +{ + if (prev_subbuf) + *((unsigned *)prev_subbuf) = prev_padding; + + subbuf_start_reserve(buf, sizeof(unsigned int)); + + return 1; +} + +In this case, the relay_buf_full() check is meaningless and the +callback always returns 1, causing the buffer switch to occur +unconditionally. It's also meaningless for the client to use the +relay_subbufs_consumed() function in this mode, as it's never +consulted. + +The default subbuf_start() implementation, used if the client doesn't +define any callbacks, or doesn't define the subbuf_start() callback, +implements the simplest possible 'no-overwrite' mode, i.e. it does +nothing but return 0. + +Header information can be reserved at the beginning of each sub-buffer +by calling the subbuf_start_reserve() helper function from within the +subbuf_start() callback. This reserved area can be used to store +whatever information the client wants. In the example above, room is +reserved in each sub-buffer to store the padding count for that +sub-buffer. This is filled in for the previous sub-buffer in the +subbuf_start() implementation; the padding value for the previous +sub-buffer is passed into the subbuf_start() callback along with a +pointer to the previous sub-buffer, since the padding value isn't +known until a sub-buffer is filled. The subbuf_start() callback is +also called for the first sub-buffer when the channel is opened, to +give the client a chance to reserve space in it. In this case the +previous sub-buffer pointer passed into the callback will be NULL, so +the client should check the value of the prev_subbuf pointer before +writing into the previous sub-buffer. + +Writing to a channel +-------------------- + +Kernel clients write data into the current cpu's channel buffer using +relay_write() or __relay_write(). relay_write() is the main logging +function - it uses local_irqsave() to protect the buffer and should be +used if you might be logging from interrupt context. If you know +you'll never be logging from interrupt context, you can use +__relay_write(), which only disables preemption. These functions +don't return a value, so you can't determine whether or not they +failed - the assumption is that you wouldn't want to check a return +value in the fast logging path anyway, and that they'll always succeed +unless the buffer is full and no-overwrite mode is being used, in +which case you can detect a failed write in the subbuf_start() +callback by calling the relay_buf_full() helper function. + +relay_reserve() is used to reserve a slot in a channel buffer which +can be written to later. This would typically be used in applications +that need to write directly into a channel buffer without having to +stage data in a temporary buffer beforehand. Because the actual write +may not happen immediately after the slot is reserved, applications +using relay_reserve() can keep a count of the number of bytes actually +written, either in space reserved in the sub-buffers themselves or as +a separate array. See the 'reserve' example in the relay-apps tarball +at http://relayfs.sourceforge.net for an example of how this can be +done. Because the write is under control of the client and is +separated from the reserve, relay_reserve() doesn't protect the buffer +at all - it's up to the client to provide the appropriate +synchronization when using relay_reserve(). + +Closing a channel +----------------- + +The client calls relay_close() when it's finished using the channel. +The channel and its associated buffers are destroyed when there are no +longer any references to any of the channel buffers. relay_flush() +forces a sub-buffer switch on all the channel buffers, and can be used +to finalize and process the last sub-buffers before the channel is +closed. + +Misc +---- + +Some applications may want to keep a channel around and re-use it +rather than open and close a new channel for each use. relay_reset() +can be used for this purpose - it resets a channel to its initial +state without reallocating channel buffer memory or destroying +existing mappings. It should however only be called when it's safe to +do so, i.e. when the channel isn't currently being written to. + +Finally, there are a couple of utility callbacks that can be used for +different purposes. buf_mapped() is called whenever a channel buffer +is mmapped from user space and buf_unmapped() is called when it's +unmapped. The client can use this notification to trigger actions +within the kernel application, such as enabling/disabling logging to +the channel. + + +Resources +========= + +For news, example code, mailing list, etc. see the relay interface homepage: + + http://relayfs.sourceforge.net + + +Credits +======= + +The ideas and specs for the relay interface came about as a result of +discussions on tracing involving the following: + +Michel Dagenais <michel.dagenais@polymtl.ca> +Richard Moore <richardj_moore@uk.ibm.com> +Bob Wisniewski <bob@watson.ibm.com> +Karim Yaghmour <karim@opersys.com> +Tom Zanussi <zanussi@us.ibm.com> + +Also thanks to Hubertus Franke for a lot of useful suggestions and bug +reports. diff --git a/Documentation/filesystems/relayfs.txt b/Documentation/filesystems/relayfs.txt deleted file mode 100644 index 5832377b7340..000000000000 --- a/Documentation/filesystems/relayfs.txt +++ /dev/null @@ -1,442 +0,0 @@ - -relayfs - a high-speed data relay filesystem -============================================ - -relayfs is a filesystem designed to provide an efficient mechanism for -tools and facilities to relay large and potentially sustained streams -of data from kernel space to user space. - -The main abstraction of relayfs is the 'channel'. A channel consists -of a set of per-cpu kernel buffers each represented by a file in the -relayfs filesystem. Kernel clients write into a channel using -efficient write functions which automatically log to the current cpu's -channel buffer. User space applications mmap() the per-cpu files and -retrieve the data as it becomes available. - -The format of the data logged into the channel buffers is completely -up to the relayfs client; relayfs does however provide hooks which -allow clients to impose some structure on the buffer data. Nor does -relayfs implement any form of data filtering - this also is left to -the client. The purpose is to keep relayfs as simple as possible. - -This document provides an overview of the relayfs API. The details of -the function parameters are documented along with the functions in the -filesystem code - please see that for details. - -Semantics -========= - -Each relayfs channel has one buffer per CPU, each buffer has one or -more sub-buffers. Messages are written to the first sub-buffer until -it is too full to contain a new message, in which case it it is -written to the next (if available). Messages are never split across -sub-buffers. At this point, userspace can be notified so it empties -the first sub-buffer, while the kernel continues writing to the next. - -When notified that a sub-buffer is full, the kernel knows how many -bytes of it are padding i.e. unused. Userspace can use this knowledge -to copy only valid data. - -After copying it, userspace can notify the kernel that a sub-buffer -has been consumed. - -relayfs can operate in a mode where it will overwrite data not yet -collected by userspace, and not wait for it to consume it. - -relayfs itself does not provide for communication of such data between -userspace and kernel, allowing the kernel side to remain simple and -not impose a single interface on userspace. It does provide a set of -examples and a separate helper though, described below. - -klog and relay-apps example code -================================ - -relayfs itself is ready to use, but to make things easier, a couple -simple utility functions and a set of examples are provided. - -The relay-apps example tarball, available on the relayfs sourceforge -site, contains a set of self-contained examples, each consisting of a -pair of .c files containing boilerplate code for each of the user and -kernel sides of a relayfs application; combined these two sets of -boilerplate code provide glue to easily stream data to disk, without -having to bother with mundane housekeeping chores. - -The 'klog debugging functions' patch (klog.patch in the relay-apps -tarball) provides a couple of high-level logging functions to the -kernel which allow writing formatted text or raw data to a channel, -regardless of whether a channel to write into exists or not, or -whether relayfs is compiled into the kernel or is configured as a -module. These functions allow you to put unconditional 'trace' -statements anywhere in the kernel or kernel modules; only when there -is a 'klog handler' registered will data actually be logged (see the -klog and kleak examples for details). - -It is of course possible to use relayfs from scratch i.e. without -using any of the relay-apps example code or klog, but you'll have to -implement communication between userspace and kernel, allowing both to -convey the state of buffers (full, empty, amount of padding). - -klog and the relay-apps examples can be found in the relay-apps -tarball on http://relayfs.sourceforge.net - - -The relayfs user space API -========================== - -relayfs implements basic file operations for user space access to -relayfs channel buffer data. Here are the file operations that are -available and some comments regarding their behavior: - -open() enables user to open an _existing_ buffer. - -mmap() results in channel buffer being mapped into the caller's - memory space. Note that you can't do a partial mmap - you must - map the entire file, which is NRBUF * SUBBUFSIZE. - -read() read the contents of a channel buffer. The bytes read are - 'consumed' by the reader i.e. they won't be available again - to subsequent reads. If the channel is being used in - no-overwrite mode (the default), it can be read at any time - even if there's an active kernel writer. If the channel is - being used in overwrite mode and there are active channel - writers, results may be unpredictable - users should make - sure that all logging to the channel has ended before using - read() with overwrite mode. - -poll() POLLIN/POLLRDNORM/POLLERR supported. User applications are - notified when sub-buffer boundaries are crossed. - -close() decrements the channel buffer's refcount. When the refcount - reaches 0 i.e. when no process or kernel client has the buffer - open, the channel buffer is freed. - - -In order for a user application to make use of relayfs files, the -relayfs filesystem must be mounted. For example, - - mount -t relayfs relayfs /mnt/relay - -NOTE: relayfs doesn't need to be mounted for kernel clients to create - or use channels - it only needs to be mounted when user space - applications need access to the buffer data. - - -The relayfs kernel API -====================== - -Here's a summary of the API relayfs provides to in-kernel clients: - - - channel management functions: - - relay_open(base_filename, parent, subbuf_size, n_subbufs, - callbacks) - relay_close(chan) - relay_flush(chan) - relay_reset(chan) - relayfs_create_dir(name, parent) - relayfs_remove_dir(dentry) - relayfs_create_file(name, parent, mode, fops, data) - relayfs_remove_file(dentry) - - channel management typically called on instigation of userspace: - - relay_subbufs_consumed(chan, cpu, subbufs_consumed) - - write functions: - - relay_write(chan, data, length) - __relay_write(chan, data, length) - relay_reserve(chan, length) - - callbacks: - - subbuf_start(buf, subbuf, prev_subbuf, prev_padding) - buf_mapped(buf, filp) - buf_unmapped(buf, filp) - create_buf_file(filename, parent, mode, buf, is_global) - remove_buf_file(dentry) - - helper functions: - - relay_buf_full(buf) - subbuf_start_reserve(buf, length) - - -Creating a channel ------------------- - -relay_open() is used to create a channel, along with its per-cpu -channel buffers. Each channel buffer will have an associated file -created for it in the relayfs filesystem, which can be opened and -mmapped from user space if desired. The files are named -basename0...basenameN-1 where N is the number of online cpus, and by -default will be created in the root of the filesystem. If you want a -directory structure to contain your relayfs files, you can create it -with relayfs_create_dir() and pass the parent directory to -relay_open(). Clients are responsible for cleaning up any directory -structure they create when the channel is closed - use -relayfs_remove_dir() for that. - -The total size of each per-cpu buffer is calculated by multiplying the -number of sub-buffers by the sub-buffer size passed into relay_open(). -The idea behind sub-buffers is that they're basically an extension of -double-buffering to N buffers, and they also allow applications to -easily implement random-access-on-buffer-boundary schemes, which can -be important for some high-volume applications. The number and size -of sub-buffers is completely dependent on the application and even for -the same application, different conditions will warrant different -values for these parameters at different times. Typically, the right -values to use are best decided after some experimentation; in general, -though, it's safe to assume that having only 1 sub-buffer is a bad -idea - you're guaranteed to either overwrite data or lose events -depending on the channel mode being used. - -Channel 'modes' ---------------- - -relayfs channels can be used in either of two modes - 'overwrite' or -'no-overwrite'. The mode is entirely determined by the implementation -of the subbuf_start() callback, as described below. In 'overwrite' -mode, also known as 'flight recorder' mode, writes continuously cycle -around the buffer and will never fail, but will unconditionally -overwrite old data regardless of whether it's actually been consumed. -In no-overwrite mode, writes will fail i.e. data will be lost, if the -number of unconsumed sub-buffers equals the total number of -sub-buffers in the channel. It should be clear that if there is no -consumer or if the consumer can't consume sub-buffers fast enought, -data will be lost in either case; the only difference is whether data -is lost from the beginning or the end of a buffer. - -As explained above, a relayfs channel is made of up one or more -per-cpu channel buffers, each implemented as a circular buffer -subdivided into one or more sub-buffers. Messages are written into -the current sub-buffer of the channel's current per-cpu buffer via the -write functions described below. Whenever a message can't fit into -the current sub-buffer, because there's no room left for it, the -client is notified via the subbuf_start() callback that a switch to a -new sub-buffer is about to occur. The client uses this callback to 1) -initialize the next sub-buffer if appropriate 2) finalize the previous -sub-buffer if appropriate and 3) return a boolean value indicating -whether or not to actually go ahead with the sub-buffer switch. - -To implement 'no-overwrite' mode, the userspace client would provide -an implementation of the subbuf_start() callback something like the -following: - -static int subbuf_start(struct rchan_buf *buf, - void *subbuf, - void *prev_subbuf, - unsigned int prev_padding) -{ - if (prev_subbuf) - *((unsigned *)prev_subbuf) = prev_padding; - - if (relay_buf_full(buf)) - return 0; - - subbuf_start_reserve(buf, sizeof(unsigned int)); - - return 1; -} - -If the current buffer is full i.e. all sub-buffers remain unconsumed, -the callback returns 0 to indicate that the buffer switch should not -occur yet i.e. until the consumer has had a chance to read the current -set of ready sub-buffers. For the relay_buf_full() function to make -sense, the consumer is reponsible for notifying relayfs when -sub-buffers have been consumed via relay_subbufs_consumed(). Any -subsequent attempts to write into the buffer will again invoke the -subbuf_start() callback with the same parameters; only when the -consumer has consumed one or more of the ready sub-buffers will -relay_buf_full() return 0, in which case the buffer switch can -continue. - -The implementation of the subbuf_start() callback for 'overwrite' mode -would be very similar: - -static int subbuf_start(struct rchan_buf *buf, - void *subbuf, - void *prev_subbuf, - unsigned int prev_padding) -{ - if (prev_subbuf) - *((unsigned *)prev_subbuf) = prev_padding; - - subbuf_start_reserve(buf, sizeof(unsigned int)); - - return 1; -} - -In this case, the relay_buf_full() check is meaningless and the -callback always returns 1, causing the buffer switch to occur -unconditionally. It's also meaningless for the client to use the -relay_subbufs_consumed() function in this mode, as it's never -consulted. - -The default subbuf_start() implementation, used if the client doesn't -define any callbacks, or doesn't define the subbuf_start() callback, -implements the simplest possible 'no-overwrite' mode i.e. it does -nothing but return 0. - -Header information can be reserved at the beginning of each sub-buffer -by calling the subbuf_start_reserve() helper function from within the -subbuf_start() callback. This reserved area can be used to store -whatever information the client wants. In the example above, room is -reserved in each sub-buffer to store the padding count for that -sub-buffer. This is filled in for the previous sub-buffer in the -subbuf_start() implementation; the padding value for the previous -sub-buffer is passed into the subbuf_start() callback along with a -pointer to the previous sub-buffer, since the padding value isn't -known until a sub-buffer is filled. The subbuf_start() callback is -also called for the first sub-buffer when the channel is opened, to -give the client a chance to reserve space in it. In this case the -previous sub-buffer pointer passed into the callback will be NULL, so -the client should check the value of the prev_subbuf pointer before -writing into the previous sub-buffer. - -Writing to a channel --------------------- - -kernel clients write data into the current cpu's channel buffer using -relay_write() or __relay_write(). relay_write() is the main logging -function - it uses local_irqsave() to protect the buffer and should be -used if you might be logging from interrupt context. If you know -you'll never be logging from interrupt context, you can use -__relay_write(), which only disables preemption. These functions -don't return a value, so you can't determine whether or not they -failed - the assumption is that you wouldn't want to check a return -value in the fast logging path anyway, and that they'll always succeed -unless the buffer is full and no-overwrite mode is being used, in -which case you can detect a failed write in the subbuf_start() -callback by calling the relay_buf_full() helper function. - -relay_reserve() is used to reserve a slot in a channel buffer which -can be written to later. This would typically be used in applications -that need to write directly into a channel buffer without having to -stage data in a temporary buffer beforehand. Because the actual write -may not happen immediately after the slot is reserved, applications -using relay_reserve() can keep a count of the number of bytes actually -written, either in space reserved in the sub-buffers themselves or as -a separate array. See the 'reserve' example in the relay-apps tarball -at http://relayfs.sourceforge.net for an example of how this can be -done. Because the write is under control of the client and is -separated from the reserve, relay_reserve() doesn't protect the buffer -at all - it's up to the client to provide the appropriate -synchronization when using relay_reserve(). - -Closing a channel ------------------ - -The client calls relay_close() when it's finished using the channel. -The channel and its associated buffers are destroyed when there are no -longer any references to any of the channel buffers. relay_flush() -forces a sub-buffer switch on all the channel buffers, and can be used -to finalize and process the last sub-buffers before the channel is -closed. - -Creating non-relay files ------------------------- - -relay_open() automatically creates files in the relayfs filesystem to -represent the per-cpu kernel buffers; it's often useful for -applications to be able to create their own files alongside the relay -files in the relayfs filesystem as well e.g. 'control' files much like -those created in /proc or debugfs for similar purposes, used to -communicate control information between the kernel and user sides of a -relayfs application. For this purpose the relayfs_create_file() and -relayfs_remove_file() API functions exist. For relayfs_create_file(), -the caller passes in a set of user-defined file operations to be used -for the file and an optional void * to a user-specified data item, -which will be accessible via inode->u.generic_ip (see the relay-apps -tarball for examples). The file_operations are a required parameter -to relayfs_create_file() and thus the semantics of these files are -completely defined by the caller. - -See the relay-apps tarball at http://relayfs.sourceforge.net for -examples of how these non-relay files are meant to be used. - -Creating relay files in other filesystems ------------------------------------------ - -By default of course, relay_open() creates relay files in the relayfs -filesystem. Because relay_file_operations is exported, however, it's -also possible to create and use relay files in other pseudo-filesytems -such as debugfs. - -For this purpose, two callback functions are provided, -create_buf_file() and remove_buf_file(). create_buf_file() is called -once for each per-cpu buffer from relay_open() to allow the client to -create a file to be used to represent the corresponding buffer; if -this callback is not defined, the default implementation will create -and return a file in the relayfs filesystem to represent the buffer. -The callback should return the dentry of the file created to represent -the relay buffer. Note that the parent directory passed to -relay_open() (and passed along to the callback), if specified, must -exist in the same filesystem the new relay file is created in. If -create_buf_file() is defined, remove_buf_file() must also be defined; -it's responsible for deleting the file(s) created in create_buf_file() -and is called during relay_close(). - -The create_buf_file() implementation can also be defined in such a way -as to allow the creation of a single 'global' buffer instead of the -default per-cpu set. This can be useful for applications interested -mainly in seeing the relative ordering of system-wide events without -the need to bother with saving explicit timestamps for the purpose of -merging/sorting per-cpu files in a postprocessing step. - -To have relay_open() create a global buffer, the create_buf_file() -implementation should set the value of the is_global outparam to a -non-zero value in addition to creating the file that will be used to -represent the single buffer. In the case of a global buffer, -create_buf_file() and remove_buf_file() will be called only once. The -normal channel-writing functions e.g. relay_write() can still be used -- writes from any cpu will transparently end up in the global buffer - -but since it is a global buffer, callers should make sure they use the -proper locking for such a buffer, either by wrapping writes in a -spinlock, or by copying a write function from relayfs_fs.h and -creating a local version that internally does the proper locking. - -See the 'exported-relayfile' examples in the relay-apps tarball for -examples of creating and using relay files in debugfs. - -Misc ----- - -Some applications may want to keep a channel around and re-use it -rather than open and close a new channel for each use. relay_reset() -can be used for this purpose - it resets a channel to its initial -state without reallocating channel buffer memory or destroying -existing mappings. It should however only be called when it's safe to -do so i.e. when the channel isn't currently being written to. - -Finally, there are a couple of utility callbacks that can be used for -different purposes. buf_mapped() is called whenever a channel buffer -is mmapped from user space and buf_unmapped() is called when it's -unmapped. The client can use this notification to trigger actions -within the kernel application, such as enabling/disabling logging to -the channel. - - -Resources -========= - -For news, example code, mailing list, etc. see the relayfs homepage: - - http://relayfs.sourceforge.net - - -Credits -======= - -The ideas and specs for relayfs came about as a result of discussions -on tracing involving the following: - -Michel Dagenais <michel.dagenais@polymtl.ca> -Richard Moore <richardj_moore@uk.ibm.com> -Bob Wisniewski <bob@watson.ibm.com> -Karim Yaghmour <karim@opersys.com> -Tom Zanussi <zanussi@us.ibm.com> - -Also thanks to Hubertus Franke for a lot of useful suggestions and bug -reports. diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index 9d3aed628bc1..1cb7e8be927a 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -113,8 +113,8 @@ members are defined: struct file_system_type { const char *name; int fs_flags; - struct int (*get_sb) (struct file_system_type *, int, - const char *, void *, struct vfsmount *); + int (*get_sb) (struct file_system_type *, int, + const char *, void *, struct vfsmount *); void (*kill_sb) (struct super_block *); struct module *owner; struct file_system_type * next; diff --git a/Documentation/hwmon/abituguru b/Documentation/hwmon/abituguru index 69cdb527d58f..b2c0d61b39a2 100644 --- a/Documentation/hwmon/abituguru +++ b/Documentation/hwmon/abituguru @@ -2,13 +2,36 @@ Kernel driver abituguru ======================= Supported chips: - * Abit uGuru (Hardware Monitor part only) + * Abit uGuru revision 1-3 (Hardware Monitor part only) Prefix: 'abituguru' Addresses scanned: ISA 0x0E0 Datasheet: Not available, this driver is based on reverse engineering. A "Datasheet" has been written based on the reverse engineering it should be available in the same dir as this file under the name abituguru-datasheet. + Note: + The uGuru is a microcontroller with onboard firmware which programs + it to behave as a hwmon IC. There are many different revisions of the + firmware and thus effectivly many different revisions of the uGuru. + Below is an incomplete list with which revisions are used for which + Motherboards: + uGuru 1.00 ~ 1.24 (AI7, KV8-MAX3, AN7) (1) + uGuru 2.0.0.0 ~ 2.0.4.2 (KV8-PRO) + uGuru 2.1.0.0 ~ 2.1.2.8 (AS8, AV8, AA8, AG8, AA8XE, AX8) + uGuru 2.2.0.0 ~ 2.2.0.6 (AA8 Fatal1ty) + uGuru 2.3.0.0 ~ 2.3.0.9 (AN8) + uGuru 3.0.0.0 ~ 3.0.1.2 (AW8, AL8, NI8) + uGuru 4.xxxxx? (AT8 32X) (2) + 1) For revisions 2 and 3 uGuru's the driver can autodetect the + sensortype (Volt or Temp) for bank1 sensors, for revision 1 uGuru's + this doesnot always work. For these uGuru's the autodection can + be overriden with the bank1_types module param. For all 3 known + revison 1 motherboards the correct use of this param is: + bank1_types=1,1,0,0,0,0,0,2,0,0,0,0,2,0,0,1 + You may also need to specify the fan_sensors option for these boards + fan_sensors=5 + 2) The current version of the abituguru driver is known to NOT work + on these Motherboards Authors: Hans de Goede <j.w.r.degoede@hhs.nl>, @@ -22,6 +45,11 @@ Module Parameters * force: bool Force detection. Note this parameter only causes the detection to be skipped, if the uGuru can't be read the module initialization (insmod) will still fail. +* bank1_types: int[] Bank1 sensortype autodetection override: + -1 autodetect (default) + 0 volt sensor + 1 temp sensor + 2 not connected * fan_sensors: int Tell the driver how many fan speed sensors there are on your motherboard. Default: 0 (autodetect). * pwms: int Tell the driver how many fan speed controls (fan @@ -29,7 +57,7 @@ Module Parameters * verbose: int How verbose should the driver be? (0-3): 0 normal output 1 + verbose error reporting - 2 + sensors type probing info\n" + 2 + sensors type probing info (default) 3 + retryable error reporting Default: 2 (the driver is still in the testing phase) diff --git a/Documentation/i2c/busses/i2c-sis96x b/Documentation/i2c/busses/i2c-sis96x index 00a009b977e9..08d7b2dac69a 100644 --- a/Documentation/i2c/busses/i2c-sis96x +++ b/Documentation/i2c/busses/i2c-sis96x @@ -42,8 +42,8 @@ I suspect that this driver could be made to work for the following SiS chipsets as well: 635, and 635T. If anyone owns a board with those chips AND is willing to risk crashing & burning an otherwise well-behaved kernel in the name of progress... please contact me at <mhoffman@lightlink.com> or -via the project's mailing list: <lm-sensors@lm-sensors.org>. Please -send bug reports and/or success stories as well. +via the project's mailing list: <i2c@lm-sensors.org>. Please send bug +reports and/or success stories as well. TO DOs diff --git a/Documentation/i386/boot.txt b/Documentation/i386/boot.txt index 10312bebe55d..c51314b1a463 100644 --- a/Documentation/i386/boot.txt +++ b/Documentation/i386/boot.txt @@ -181,6 +181,7 @@ filled out, however: 5 ELILO 7 GRuB 8 U-BOOT + 9 Xen Please contact <hpa@zytor.com> if you need a bootloader ID value assigned. diff --git a/Documentation/i386/zero-page.txt b/Documentation/i386/zero-page.txt index df28c7416781..c04a421f4a7c 100644 --- a/Documentation/i386/zero-page.txt +++ b/Documentation/i386/zero-page.txt @@ -63,6 +63,10 @@ Offset Type Description 2 for bootsect-loader 3 for SYSLINUX 4 for ETHERBOOT + 5 for ELILO + 7 for GRuB + 8 for U-BOOT + 9 for Xen V = version 0x211 char loadflags: bit0 = 1: kernel is loaded high (bzImage) diff --git a/Documentation/infiniband/ipoib.txt b/Documentation/infiniband/ipoib.txt index 187035560d7f..864ff3283780 100644 --- a/Documentation/infiniband/ipoib.txt +++ b/Documentation/infiniband/ipoib.txt @@ -51,8 +51,6 @@ Debugging Information References - IETF IP over InfiniBand (ipoib) Working Group - http://ietf.org/html.charters/ipoib-charter.html Transmission of IP over InfiniBand (IPoIB) (RFC 4391) http://ietf.org/rfc/rfc4391.txt IP over InfiniBand (IPoIB) Architecture (RFC 4392) diff --git a/Documentation/initrd.txt b/Documentation/initrd.txt index b1b6440237a6..15f1b35deb34 100644 --- a/Documentation/initrd.txt +++ b/Documentation/initrd.txt @@ -72,6 +72,22 @@ initrd adds the following new options: initrd is mounted as root, and the normal boot procedure is followed, with the RAM disk still mounted as root. +Compressed cpio images +---------------------- + +Recent kernels have support for populating a ramdisk from a compressed cpio +archive, on such systems, the creation of a ramdisk image doesn't need to +involve special block devices or loopbacks, you merely create a directory on +disk with the desired initrd content, cd to that directory, and run (as an +example): + +find . | cpio --quiet -c -o | gzip -9 -n > /boot/imagefile.img + +Examining the contents of an existing image file is just as simple: + +mkdir /tmp/imagefile +cd /tmp/imagefile +gzip -cd /boot/imagefile.img | cpio -imd --quiet Installation ------------ diff --git a/Documentation/input/joystick.txt b/Documentation/input/joystick.txt index d53b857a3710..841c353297e6 100644 --- a/Documentation/input/joystick.txt +++ b/Documentation/input/joystick.txt @@ -39,7 +39,6 @@ them. Bug reports and success stories are also welcome. The input project website is at: - http://www.suse.cz/development/input/ http://atrey.karlin.mff.cuni.cz/~vojtech/input/ There is also a mailing list for the driver at: diff --git a/Documentation/kbuild/makefiles.txt b/Documentation/kbuild/makefiles.txt index 14ef3868a328..0706699c9da9 100644 --- a/Documentation/kbuild/makefiles.txt +++ b/Documentation/kbuild/makefiles.txt @@ -407,6 +407,20 @@ more details, with real examples. The second argument is optional, and if supplied will be used if first argument is not supported. + ld-option + ld-option is used to check if $(CC) when used to link object files + supports the given option. An optional second option may be + specified if first option are not supported. + + Example: + #arch/i386/kernel/Makefile + vsyscall-flags += $(call ld-option, -Wl$(comma)--hash-style=sysv) + + In the above example vsyscall-flags will be assigned the option + -Wl$(comma)--hash-style=sysv if it is supported by $(CC). + The second argument is optional, and if supplied will be used + if first argument is not supported. + cc-option cc-option is used to check if $(CC) support a given option, and not supported to use an optional second option. diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 149f62ba14a5..87a17337c7f6 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -697,6 +697,12 @@ running once the system is up. ips= [HW,SCSI] Adaptec / IBM ServeRAID controller See header of drivers/scsi/ips.c. + ports= [IP_VS_FTP] IPVS ftp helper module + Default is 21. + Up to 8 (IP_VS_APP_MAX_PORTS) ports + may be specified. + Format: <port>,<port>.... + irqfixup [HW] When an interrupt is not handled search all handlers for it. Intended to get systems with badly broken @@ -1029,6 +1035,8 @@ running once the system is up. nocache [ARM] + nodelayacct [KNL] Disable per-task delay accounting + nodisconnect [HW,SCSI,M68K] Disables SCSI disconnects. noexec [IA-64] @@ -1181,6 +1189,8 @@ running once the system is up. Mechanism 2. nommconf [IA-32,X86_64] Disable use of MMCONFIG for PCI Configuration + mmconf [IA-32,X86_64] Force MMCONFIG. This is useful + to override the builtin blacklist. nomsi [MSI] If the PCI_MSI kernel config parameter is enabled, this kernel boot option can be used to disable the use of MSI interrupts system-wide. diff --git a/Documentation/kobject.txt b/Documentation/kobject.txt index 8d9bffbd192c..949f7b5a2053 100644 --- a/Documentation/kobject.txt +++ b/Documentation/kobject.txt @@ -247,7 +247,7 @@ the object-specific fields, which include: - default_attrs: Default attributes to be exported via sysfs when the object is registered.Note that the last attribute has to be initialized to NULL ! You can find a complete implementation - in drivers/block/genhd.c + in block/genhd.c Instances of struct kobj_type are not registered; only referenced by diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index 28d1bc3edb1c..46b9b389df35 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt @@ -1015,10 +1015,9 @@ CPU from reordering them. There are some more advanced barrier functions: (*) set_mb(var, value) - (*) set_wmb(var, value) - These assign the value to the variable and then insert at least a write - barrier after it, depending on the function. They aren't guaranteed to + This assigns the value to the variable and then inserts at least a write + barrier after it, depending on the function. It isn't guaranteed to insert anything more than a compiler barrier in a UP compilation. diff --git a/Documentation/mips/time.README b/Documentation/mips/time.README index 70bc0dd43d6d..69ddc5c14b79 100644 --- a/Documentation/mips/time.README +++ b/Documentation/mips/time.README @@ -65,7 +65,7 @@ the following functions or values: 1. (optional) set up RTC routines 2. (optional) calibrate and set the mips_counter_frequency - b) board_timer_setup - a function pointer. Invoked at the end of time_init() + b) plat_timer_setup - a function pointer. Invoked at the end of time_init() 1. (optional) over-ride any decisions made in time_init() 2. set up the irqaction for timer interrupt. 3. enable the timer interrupt @@ -116,19 +116,17 @@ Step 2: the machine setup() function If you supply board_time_init(), set the function poointer. - Set the function pointer board_timer_setup() (mandatory) - -Step 3: implement rtc routines, board_time_init() and board_timer_setup() +Step 3: implement rtc routines, board_time_init() and plat_timer_setup() if needed. - board_time_init() - + board_time_init() - a) (optional) set up RTC routines, b) (optional) calibrate and set the mips_counter_frequency (only needed if you intended to use fixed_rate_gettimeoffset or use cpu counter as timer interrupt source) - board_timer_setup() - + plat_timer_setup() - a) (optional) over-write any choices made above by time_init(). b) machine specific code should setup the timer irqaction. c) enable the timer interrupt diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index d46338af6002..90ed78110fd4 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -102,9 +102,15 @@ inet_peer_gc_maxtime - INTEGER TCP variables: tcp_abc - INTEGER - Controls Appropriate Byte Count defined in RFC3465. If set to - 0 then does congestion avoid once per ack. 1 is conservative - value, and 2 is more agressive. + Controls Appropriate Byte Count (ABC) defined in RFC3465. + ABC is a way of increasing congestion window (cwnd) more slowly + in response to partial acknowledgments. + Possible values are: + 0 increase cwnd once per acknowledgment (no ABC) + 1 increase cwnd once per acknowledgment of full sized segment + 2 allow increase cwnd by two if acknowledgment is + of two segments to compensate for delayed acknowledgments. + Default: 0 (off) tcp_syn_retries - INTEGER Number of times initial SYNs for an active TCP connection attempt @@ -294,15 +300,15 @@ tcp_rmem - vector of 3 INTEGERs: min, default, max Default: 87380*2 bytes. tcp_mem - vector of 3 INTEGERs: min, pressure, max - low: below this number of pages TCP is not bothered about its + min: below this number of pages TCP is not bothered about its memory appetite. pressure: when amount of memory allocated by TCP exceeds this number of pages, TCP moderates its memory consumption and enters memory pressure mode, which is exited when memory consumption falls - under "low". + under "min". - high: number of pages allowed for queueing by all TCP sockets. + max: number of pages allowed for queueing by all TCP sockets. Defaults are calculated at boot time from amount of available memory. diff --git a/Documentation/nfsroot.txt b/Documentation/nfsroot.txt index d56dc71d9430..3cc953cb288f 100644 --- a/Documentation/nfsroot.txt +++ b/Documentation/nfsroot.txt @@ -4,15 +4,16 @@ Mounting the root filesystem via NFS (nfsroot) Written 1996 by Gero Kuhlmann <gero@gkminix.han.de> Updated 1997 by Martin Mares <mj@atrey.karlin.mff.cuni.cz> Updated 2006 by Nico Schottelius <nico-kernel-nfsroot@schottelius.org> +Updated 2006 by Horms <horms@verge.net.au> -If you want to use a diskless system, as an X-terminal or printer -server for example, you have to put your root filesystem onto a -non-disk device. This can either be a ramdisk (see initrd.txt in -this directory for further information) or a filesystem mounted -via NFS. The following text describes on how to use NFS for the -root filesystem. For the rest of this text 'client' means the +In order to use a diskless system, such as an X-terminal or printer server +for example, it is necessary for the root filesystem to be present on a +non-disk device. This may be an initramfs (see Documentation/filesystems/ +ramfs-rootfs-initramfs.txt), a ramdisk (see Documenation/initrd.txt) or a +filesystem mounted via NFS. The following text describes on how to use NFS +for the root filesystem. For the rest of this text 'client' means the diskless system, and 'server' means the NFS server. @@ -21,11 +22,13 @@ diskless system, and 'server' means the NFS server. 1.) Enabling nfsroot capabilities ----------------------------- -In order to use nfsroot you have to select support for NFS during -kernel configuration. Note that NFS cannot be loaded as a module -in this case. The configuration script will then ask you whether -you want to use nfsroot, and if yes what kind of auto configuration -system you want to use. Selecting both BOOTP and RARP is safe. +In order to use nfsroot, NFS client support needs to be selected as +built-in during configuration. Once this has been selected, the nfsroot +option will become available, which should also be selected. + +In the networking options, kernel level autoconfiguration can be selected, +along with the types of autoconfiguration to support. Selecting all of +DHCP, BOOTP and RARP is safe. @@ -33,11 +36,10 @@ system you want to use. Selecting both BOOTP and RARP is safe. 2.) Kernel command line ------------------- -When the kernel has been loaded by a boot loader (either by loadlin, -LILO or a network boot program) it has to be told what root fs device -to use, and where to find the server and the name of the directory -on the server to mount as root. This can be established by a couple -of kernel command line parameters: +When the kernel has been loaded by a boot loader (see below) it needs to be +told what root fs device to use. And in the case of nfsroot, where to find +both the server and the name of the directory on the server to mount as root. +This can be established using the following kernel command line parameters: root=/dev/nfs @@ -49,23 +51,21 @@ root=/dev/nfs nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>] - If the `nfsroot' parameter is NOT given on the command line, the default - "/tftpboot/%s" will be used. + If the `nfsroot' parameter is NOT given on the command line, + the default "/tftpboot/%s" will be used. - <server-ip> Specifies the IP address of the NFS server. If this field - is not given, the default address as determined by the - `ip' variable (see below) is used. One use of this - parameter is for example to allow using different servers - for RARP and NFS. Usually you can leave this blank. + <server-ip> Specifies the IP address of the NFS server. + The default address is determined by the `ip' parameter + (see below). This parameter allows the use of different + servers for IP autoconfiguration and NFS. - <root-dir> Name of the directory on the server to mount as root. If - there is a "%s" token in the string, the token will be - replaced by the ASCII-representation of the client's IP - address. + <root-dir> Name of the directory on the server to mount as root. + If there is a "%s" token in the string, it will be + replaced by the ASCII-representation of the client's + IP address. <nfs-options> Standard NFS options. All options are separated by commas. - If the options field is not given, the following defaults - will be used: + The following defaults are used: port = as given by server portmap daemon rsize = 1024 wsize = 1024 @@ -81,129 +81,174 @@ nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>] ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf> This parameter tells the kernel how to configure IP addresses of devices - and also how to set up the IP routing table. It was originally called `nfsaddrs', - but now the boot-time IP configuration works independently of NFS, so it - was renamed to `ip' and the old name remained as an alias for compatibility - reasons. + and also how to set up the IP routing table. It was originally called + `nfsaddrs', but now the boot-time IP configuration works independently of + NFS, so it was renamed to `ip' and the old name remained as an alias for + compatibility reasons. If this parameter is missing from the kernel command line, all fields are assumed to be empty, and the defaults mentioned below apply. In general - this means that the kernel tries to configure everything using both - RARP and BOOTP (depending on what has been enabled during kernel confi- - guration, and if both what protocol answer got in first). + this means that the kernel tries to configure everything using + autoconfiguration. + + The <autoconf> parameter can appear alone as the value to the `ip' + parameter (without all the ':' characters before) in which case auto- + configuration is used. + + <client-ip> IP address of the client. - <client-ip> IP address of the client. If empty, the address will either - be determined by RARP or BOOTP. What protocol is used de- - pends on what has been enabled during kernel configuration - and on the <autoconf> parameter. If this parameter is not - empty, neither RARP nor BOOTP will be used. + Default: Determined using autoconfiguration. <server-ip> IP address of the NFS server. If RARP is used to determine the client address and this parameter is NOT empty only - replies from the specified server are accepted. To use - different RARP and NFS server, specify your RARP server - here (or leave it blank), and specify your NFS server in - the `nfsroot' parameter (see above). If this entry is blank - the address of the server is used which answered the RARP - or BOOTP request. - - <gw-ip> IP address of a gateway if the server is on a different - subnet. If this entry is empty no gateway is used and the - server is assumed to be on the local network, unless a - value has been received by BOOTP. - - <netmask> Netmask for local network interface. If this is empty, + replies from the specified server are accepted. + + Only required for for NFS root. That is autoconfiguration + will not be triggered if it is missing and NFS root is not + in operation. + + Default: Determined using autoconfiguration. + The address of the autoconfiguration server is used. + + <gw-ip> IP address of a gateway if the server is on a different subnet. + + Default: Determined using autoconfiguration. + + <netmask> Netmask for local network interface. If unspecified the netmask is derived from the client IP address assuming - classful addressing, unless overridden in BOOTP reply. + classful addressing. - <hostname> Name of the client. If empty, the client IP address is - used in ASCII notation, or the value received by BOOTP. + Default: Determined using autoconfiguration. - <device> Name of network device to use. If this is empty, all - devices are used for RARP and BOOTP requests, and the - first one we receive a reply on is configured. If you have - only one device, you can safely leave this blank. + <hostname> Name of the client. May be supplied by autoconfiguration, + but its absence will not trigger autoconfiguration. - <autoconf> Method to use for autoconfiguration. If this is either - 'rarp' or 'bootp', the specified protocol is used. - If the value is 'both' or empty, both protocols are used - so far as they have been enabled during kernel configura- - tion. 'off' means no autoconfiguration. + Default: Client IP address is used in ASCII notation. - The <autoconf> parameter can appear alone as the value to the `ip' - parameter (without all the ':' characters before) in which case auto- - configuration is used. + <device> Name of network device to use. + + Default: If the host only has one device, it is used. + Otherwise the device is determined using + autoconfiguration. This is done by sending + autoconfiguration requests out of all devices, + and using the device that received the first reply. + <autoconf> Method to use for autoconfiguration. In the case of options + which specify multiple autoconfiguration protocols, + requests are sent using all protocols, and the first one + to reply is used. + Only autoconfiguration protocols that have been compiled + into the kernel will be used, regardless of the value of + this option. + off or none: don't use autoconfiguration (default) + on or any: use any protocol available in the kernel + dhcp: use DHCP + bootp: use BOOTP + rarp: use RARP + both: use both BOOTP and RARP but not DHCP + (old option kept for backwards compatibility) -3.) Kernel loader - ------------- + Default: any -To get the kernel into memory different approaches can be used. They -depend on what facilities are available: -3.1) Writing the kernel onto a floppy using dd: - As always you can just write the kernel onto a floppy using dd, - but then it's not possible to use kernel command lines at all. - To substitute the 'root=' parameter, create a dummy device on any - linux system with major number 0 and minor number 255 using mknod: - mknod /dev/boot255 c 0 255 +3.) Boot Loader + ---------- - Then copy the kernel zImage file onto a floppy using dd: +To get the kernel into memory different approaches can be used. +They depend on various facilities being available: - dd if=/usr/src/linux/arch/i386/boot/zImage of=/dev/fd0 - And finally use rdev to set the root device: +3.1) Booting from a floppy using syslinux - rdev /dev/fd0 /dev/boot255 + When building kernels, an easy way to create a boot floppy that uses + syslinux is to use the zdisk or bzdisk make targets which use + and bzimage images respectively. Both targets accept the + FDARGS parameter which can be used to set the kernel command line. - You can then remove the dummy device /dev/boot255 again. There - is no real device available for it. - The other two kernel command line parameters cannot be substi- - tuted with rdev. Therefore, using this method the kernel will - by default use RARP and/or BOOTP, and if it gets an answer via - RARP will mount the directory /tftpboot/<client-ip>/ as its - root. If it got a BOOTP answer the directory name in that answer - is used. + e.g. + make bzdisk FDARGS="root=/dev/nfs" + + Note that the user running this command will need to have + access to the floppy drive device, /dev/fd0 + + For more information on syslinux, including how to create bootdisks + for prebuilt kernels, see http://syslinux.zytor.com/ + + N.B: Previously it was possible to write a kernel directly to + a floppy using dd, configure the boot device using rdev, and + boot using the resulting floppy. Linux no longer supports this + method of booting. + +3.2) Booting from a cdrom using isolinux + + When building kernels, an easy way to create a bootable cdrom that + uses isolinux is to use the isoimage target which uses a bzimage + image. Like zdisk and bzdisk, this target accepts the FDARGS + parameter which can be used to set the kernel command line. + + e.g. + make isoimage FDARGS="root=/dev/nfs" + + The resulting iso image will be arch/<ARCH>/boot/image.iso + This can be written to a cdrom using a variety of tools including + cdrecord. + + e.g. + cdrecord dev=ATAPI:1,0,0 arch/i386/boot/image.iso + + For more information on isolinux, including how to create bootdisks + for prebuilt kernels, see http://syslinux.zytor.com/ 3.2) Using LILO - When using LILO you can specify all necessary command line - parameters with the 'append=' command in the LILO configuration - file. However, to use the 'root=' command you also need to - set up a dummy device as described in 3.1 above. For how to use - LILO and its 'append=' command please refer to the LILO - documentation. + When using LILO all the necessary command line parameters may be + specified using the 'append=' directive in the LILO configuration + file. + + However, to use the 'root=' directive you also need to create + a dummy root device, which may be removed after LILO is run. + + mknod /dev/boot255 c 0 255 + + For information on configuring LILO, please refer to its documentation. 3.3) Using GRUB - When you use GRUB, you simply append the parameters after the kernel - specification: "kernel <kernel> <parameters>" (without the quotes). + When using GRUB, kernel parameter are simply appended after the kernel + specification: kernel <kernel> <parameters> 3.4) Using loadlin - When you want to boot Linux from a DOS command prompt without - having a local hard disk to mount as root, you can use loadlin. - I was told that it works, but haven't used it myself yet. In - general you should be able to create a kernel command line simi- - lar to how LILO is doing it. Please refer to the loadlin docu- - mentation for further information. + loadlin may be used to boot Linux from a DOS command prompt without + requiring a local hard disk to mount as root. This has not been + thoroughly tested by the authors of this document, but in general + it should be possible configure the kernel command line similarly + to the configuration of LILO. + + Please refer to the loadlin documentation for further information. 3.5) Using a boot ROM - This is probably the most elegant way of booting a diskless - client. With a boot ROM the kernel gets loaded using the TFTP - protocol. As far as I know, no commercial boot ROMs yet - support booting Linux over the network, but there are two - free implementations of a boot ROM available on sunsite.unc.edu - and its mirrors. They are called 'netboot-nfs' and 'etherboot'. - Both contain everything you need to boot a diskless Linux client. + This is probably the most elegant way of booting a diskless client. + With a boot ROM the kernel is loaded using the TFTP protocol. The + authors of this document are not aware of any no commercial boot + ROMs that support booting Linux over the network. However, there + are two free implementations of a boot ROM, netboot-nfs and + etherboot, both of which are available on sunsite.unc.edu, and both + of which contain everything you need to boot a diskless Linux client. 3.6) Using pxelinux - Using pxelinux you specify the kernel you built with + Pxelinux may be used to boot linux using the PXE boot loader + which is present on many modern network cards. + + When using pxelinux, the kernel image is specified using "kernel <relative-path-below /tftpboot>". The nfsroot parameters are passed to the kernel by adding them to the "append" line. - You may perhaps also want to fine tune the console output, - see Documentation/serial-console.txt for serial console help. + It is common to use serial console in conjunction with pxeliunx, + see Documentation/serial-console.txt for more information. + + For more information on isolinux, including how to create bootdisks + for prebuilt kernels, see http://syslinux.zytor.com/ diff --git a/Documentation/powerpc/booting-without-of.txt b/Documentation/powerpc/booting-without-of.txt index 3c62e66e1fcc..5c0ba235f5a5 100644 --- a/Documentation/powerpc/booting-without-of.txt +++ b/Documentation/powerpc/booting-without-of.txt @@ -1136,10 +1136,10 @@ Sense and level information should be encoded as follows: Devices connected to openPIC-compatible controllers should encode sense and polarity as follows: - 0 = high to low edge sensitive type enabled + 0 = low to high edge sensitive type enabled 1 = active low level sensitive type enabled - 2 = low to high edge sensitive type enabled - 3 = active high level sensitive type enabled + 2 = active high level sensitive type enabled + 3 = high to low edge sensitive type enabled ISA PIC interrupt controllers should adhere to the ISA PIC encodings listed below: @@ -1196,7 +1196,7 @@ platforms are moved over to use the flattened-device-tree model. - model : Model of the device. Can be "TSEC", "eTSEC", or "FEC" - compatible : Should be "gianfar" - reg : Offset and length of the register set for the device - - address : List of bytes representing the ethernet address of + - mac-address : List of bytes representing the ethernet address of this controller - interrupts : <a b> where a is the interrupt number and b is a field that represents an encoding of the sense and level @@ -1216,7 +1216,7 @@ platforms are moved over to use the flattened-device-tree model. model = "TSEC"; compatible = "gianfar"; reg = <24000 1000>; - address = [ 00 E0 0C 00 73 00 ]; + mac-address = [ 00 E0 0C 00 73 00 ]; interrupts = <d 3 e 3 12 3>; interrupt-parent = <40000>; phy-handle = <2452000> @@ -1498,7 +1498,7 @@ not necessary as they are usually the same as the root node. model = "TSEC"; compatible = "gianfar"; reg = <24000 1000>; - address = [ 00 E0 0C 00 73 00 ]; + mac-address = [ 00 E0 0C 00 73 00 ]; interrupts = <d 3 e 3 12 3>; interrupt-parent = <40000>; phy-handle = <2452000>; @@ -1511,7 +1511,7 @@ not necessary as they are usually the same as the root node. model = "TSEC"; compatible = "gianfar"; reg = <25000 1000>; - address = [ 00 E0 0C 00 73 01 ]; + mac-address = [ 00 E0 0C 00 73 01 ]; interrupts = <13 3 14 3 18 3>; interrupt-parent = <40000>; phy-handle = <2452001>; @@ -1524,7 +1524,7 @@ not necessary as they are usually the same as the root node. model = "FEC"; compatible = "gianfar"; reg = <26000 1000>; - address = [ 00 E0 0C 00 73 02 ]; + mac-address = [ 00 E0 0C 00 73 02 ]; interrupts = <19 3>; interrupt-parent = <40000>; phy-handle = <2452002>; diff --git a/Documentation/ramdisk.txt b/Documentation/ramdisk.txt index 7c25584e082c..52f75b7d51c2 100644 --- a/Documentation/ramdisk.txt +++ b/Documentation/ramdisk.txt @@ -6,7 +6,7 @@ Contents: 1) Overview 2) Kernel Command Line Parameters 3) Using "rdev -r" - 4) An Example of Creating a Compressed RAM Disk + 4) An Example of Creating a Compressed RAM Disk 1) Overview @@ -34,7 +34,7 @@ make it clearer. The original "ramdisk=<ram_size>" has been kept around for compatibility reasons, but it may be removed in the future. The new RAM disk also has the ability to load compressed RAM disk images, -allowing one to squeeze more programs onto an average installation or +allowing one to squeeze more programs onto an average installation or rescue floppy disk. @@ -51,7 +51,7 @@ default is 4096 (4 MB) (8192 (8 MB) on S390). =================== This parameter tells the RAM disk driver how many bytes to use per block. The -default is 512. +default is 1024 (BLOCK_SIZE). 3) Using "rdev -r" @@ -70,7 +70,7 @@ These numbers are no magical secrets, as seen below: ./arch/i386/kernel/setup.c:#define RAMDISK_PROMPT_FLAG 0x8000 ./arch/i386/kernel/setup.c:#define RAMDISK_LOAD_FLAG 0x4000 -Consider a typical two floppy disk setup, where you will have the +Consider a typical two floppy disk setup, where you will have the kernel on disk one, and have already put a RAM disk image onto disk #2. Hence you want to set bits 0 to 13 as 0, meaning that your RAM disk @@ -97,12 +97,12 @@ Since the default start = 0 and the default prompt = 1, you could use: append = "load_ramdisk=1" -4) An Example of Creating a Compressed RAM Disk +4) An Example of Creating a Compressed RAM Disk ---------------------------------------------- To create a RAM disk image, you will need a spare block device to construct it on. This can be the RAM disk device itself, or an -unused disk partition (such as an unmounted swap partition). For this +unused disk partition (such as an unmounted swap partition). For this example, we will use the RAM disk device, "/dev/ram0". Note: This technique should not be done on a machine with less than 8 MB diff --git a/Documentation/scsi/ChangeLog.megaraid b/Documentation/scsi/ChangeLog.megaraid index c173806c91fa..a056bbe67c7e 100644 --- a/Documentation/scsi/ChangeLog.megaraid +++ b/Documentation/scsi/ChangeLog.megaraid @@ -1,3 +1,126 @@ +Release Date : Fri May 19 09:31:45 EST 2006 - Seokmann Ju <sju@lsil.com> +Current Version : 2.20.4.9 (scsi module), 2.20.2.6 (cmm module) +Older Version : 2.20.4.8 (scsi module), 2.20.2.6 (cmm module) + +1. Fixed a bug in megaraid_init_mbox(). + Customer reported "garbage in file on x86_64 platform". + Root Cause: the driver registered controllers as 64-bit DMA capable + for those which are not support it. + Fix: Made change in the function inserting identification machanism + identifying 64-bit DMA capable controllers. + + > -----Original Message----- + > From: Vasily Averin [mailto:vvs@sw.ru] + > Sent: Thursday, May 04, 2006 2:49 PM + > To: linux-scsi@vger.kernel.org; Kolli, Neela; Mukker, Atul; + > Ju, Seokmann; Bagalkote, Sreenivas; + > James.Bottomley@SteelEye.com; devel@openvz.org + > Subject: megaraid_mbox: garbage in file + > + > Hello all, + > + > I've investigated customers claim on the unstable work of + > their node and found a + > strange effect: reading from some files leads to the + > "attempt to access beyond end of device" messages. + > + > I've checked filesystem, memory on the node, motherboard BIOS + > version, but it + > does not help and issue still has been reproduced by simple + > file reading. + > + > Reproducer is simple: + > + > echo 0xffffffff >/proc/sys/dev/scsi/logging_level ; + > cat /vz/private/101/root/etc/ld.so.cache >/tmp/ttt ; + > echo 0 >/proc/sys/dev/scsi/logging + > + > It leads to the following messages in dmesg + > + > sd_init_command: disk=sda, block=871769260, count=26 + > sda : block=871769260 + > sda : reading 26/26 512 byte blocks. + > scsi_add_timer: scmd: f79ed980, time: 7500, (c02b1420) + > sd 0:1:0:0: send 0xf79ed980 sd 0:1:0:0: + > command: Read (10): 28 00 33 f6 24 ac 00 00 1a 00 + > buffer = 0xf7cfb540, bufflen = 13312, done = 0xc0366b40, + > queuecommand 0xc0344010 + > leaving scsi_dispatch_cmnd() + > scsi_delete_timer: scmd: f79ed980, rtn: 1 + > sd 0:1:0:0: done 0xf79ed980 SUCCESS 0 sd 0:1:0:0: + > command: Read (10): 28 00 33 f6 24 ac 00 00 1a 00 + > scsi host busy 1 failed 0 + > sd 0:1:0:0: Notifying upper driver of completion (result 0) + > sd_rw_intr: sda: res=0x0 + > 26 sectors total, 13312 bytes done. + > use_sg is 4 + > attempt to access beyond end of device + > sda6: rw=0, want=1044134458, limit=951401367 + > Buffer I/O error on device sda6, logical block 522067228 + > attempt to access beyond end of device + +2. When INQUIRY with EVPD bit set issued to the MegaRAID controller, + system memory gets corrupted. + Root Cause: MegaRAID F/W handle the INQUIRY with EVPD bit set + incorrectly. + Fix: MegaRAID F/W has fixed the problem and being process of release, + soon. Meanwhile, driver will filter out the request. + +3. One of member in the data structure of the driver leads unaligne + issue on 64-bit platform. + Customer reporeted "kernel unaligned access addrss" issue when + application communicates with MegaRAID HBA driver. + Root Cause: in uioc_t structure, one of member had misaligned and it + led system to display the error message. + Fix: A patch submitted to community from following folk. + + > -----Original Message----- + > From: linux-scsi-owner@vger.kernel.org + > [mailto:linux-scsi-owner@vger.kernel.org] On Behalf Of Sakurai Hiroomi + > Sent: Wednesday, July 12, 2006 4:20 AM + > To: linux-scsi@vger.kernel.org; linux-kernel@vger.kernel.org + > Subject: Re: Help: strange messages from kernel on IA64 platform + > + > Hi, + > + > I saw same message. + > + > When GAM(Global Array Manager) is started, The following + > message output. + > kernel: kernel unaligned access to 0xe0000001fe1080d4, + > ip=0xa000000200053371 + > + > The uioc structure used by ioctl is defined by packed, + > the allignment of each member are disturbed. + > In a 64 bit structure, the allignment of member doesn't fit 64 bit + > boundary. this causes this messages. + > In a 32 bit structure, we don't see the message because the allinment + > of member fit 32 bit boundary even if packed is specified. + > + > patch + > I Add 32 bit dummy member to fit 64 bit boundary. I tested. + > We confirmed this patch fix the problem by IA64 server. + > + > ************************************************************** + > **************** + > --- linux-2.6.9/drivers/scsi/megaraid/megaraid_ioctl.h.orig + > 2006-04-03 17:13:03.000000000 +0900 + > +++ linux-2.6.9/drivers/scsi/megaraid/megaraid_ioctl.h + > 2006-04-03 17:14:09.000000000 +0900 + > @@ -132,6 +132,10 @@ + > /* Driver Data: */ + > void __user * user_data; + > uint32_t user_data_len; + > + + > + /* 64bit alignment */ + > + uint32_t pad_0xBC; + > + + > mraid_passthru_t __user *user_pthru; + > + > mraid_passthru_t *pthru32; + > ************************************************************** + > **************** + Release Date : Mon Apr 11 12:27:22 EST 2006 - Seokmann Ju <sju@lsil.com> Current Version : 2.20.4.8 (scsi module), 2.20.2.6 (cmm module) Older Version : 2.20.4.7 (scsi module), 2.20.2.6 (cmm module) diff --git a/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl b/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl index 69866d5997a4..b8dc51ca776c 100644 --- a/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl +++ b/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl @@ -1172,7 +1172,7 @@ } /* PCI IDs */ - static struct pci_device_id snd_mychip_ids[] __devinitdata = { + static struct pci_device_id snd_mychip_ids[] = { { PCI_VENDOR_ID_FOO, PCI_DEVICE_ID_BAR, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0, }, .... @@ -1565,7 +1565,7 @@ <informalexample> <programlisting> <![CDATA[ - static struct pci_device_id snd_mychip_ids[] __devinitdata = { + static struct pci_device_id snd_mychip_ids[] = { { PCI_VENDOR_ID_FOO, PCI_DEVICE_ID_BAR, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0, }, .... diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt index 0b62c62142cf..5c3a51905969 100644 --- a/Documentation/sysctl/fs.txt +++ b/Documentation/sysctl/fs.txt @@ -25,6 +25,7 @@ Currently, these files are in /proc/sys/fs: - inode-state - overflowuid - overflowgid +- suid_dumpable - super-max - super-nr @@ -131,6 +132,25 @@ The default is 65534. ============================================================== +suid_dumpable: + +This value can be used to query and set the core dump mode for setuid +or otherwise protected/tainted binaries. The modes are + +0 - (default) - traditional behaviour. Any process which has changed + privilege levels or is execute only will not be dumped +1 - (debug) - all processes dump core when possible. The core dump is + owned by the current user and no security is applied. This is + intended for system debugging situations only. Ptrace is unchecked. +2 - (suidsafe) - any binary which normally would not be dumped is dumped + readable by root only. This allows the end user to remove + such a dump but not access it directly. For security reasons + core dumps in this mode will not overwrite one another or + other files. This mode is appropriate when adminstrators are + attempting to debug problems in a normal environment. + +============================================================== + super-max & super-nr: These numbers control the maximum number of superblocks, and diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index b0c7ab93dcb9..89bf8c20a586 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt @@ -50,7 +50,6 @@ show up in /proc/sys/kernel: - shmmax [ sysv ipc ] - shmmni - stop-a [ SPARC only ] -- suid_dumpable - sysrq ==> Documentation/sysrq.txt - tainted - threads-max @@ -211,9 +210,8 @@ Controls the kernel's behaviour when an oops or BUG is encountered. 0: try to continue operation -1: delay a few seconds (to give klogd time to record the oops output) and - then panic. If the `panic' sysctl is also non-zero then the machine will - be rebooted. +1: panic immediatly. If the `panic' sysctl is also non-zero then the + machine will be rebooted. ============================================================== @@ -311,25 +309,6 @@ kernel. This value defaults to SHMMAX. ============================================================== -suid_dumpable: - -This value can be used to query and set the core dump mode for setuid -or otherwise protected/tainted binaries. The modes are - -0 - (default) - traditional behaviour. Any process which has changed - privilege levels or is execute only will not be dumped -1 - (debug) - all processes dump core when possible. The core dump is - owned by the current user and no security is applied. This is - intended for system debugging situations only. Ptrace is unchecked. -2 - (suidsafe) - any binary which normally would not be dumped is dumped - readable by root only. This allows the end user to remove - such a dump but not access it directly. For security reasons - core dumps in this mode will not overwrite one another or - other files. This mode is appropriate when adminstrators are - attempting to debug problems in a normal environment. - -============================================================== - tainted: Non-zero if the kernel has been tainted. Numeric values, which diff --git a/Documentation/usb/proc_usb_info.txt b/Documentation/usb/proc_usb_info.txt index f86550fe38ee..22c5331260ca 100644 --- a/Documentation/usb/proc_usb_info.txt +++ b/Documentation/usb/proc_usb_info.txt @@ -59,7 +59,7 @@ bind to an interface (or perhaps several) using an ioctl call. You would issue more ioctls to the device to communicate to it using control, bulk, or other kinds of USB transfers. The IOCTLs are listed in the <linux/usbdevice_fs.h> file, and at this writing the -source code (linux/drivers/usb/devio.c) is the primary reference +source code (linux/drivers/usb/core/devio.c) is the primary reference for how to access devices through those files. Note that since by default these BBB/DDD files are writable only by diff --git a/Documentation/usb/usb-help.txt b/Documentation/usb/usb-help.txt index b7c324973695..a7408593829f 100644 --- a/Documentation/usb/usb-help.txt +++ b/Documentation/usb/usb-help.txt @@ -5,8 +5,7 @@ For USB help other than the readme files that are located in Documentation/usb/*, see the following: Linux-USB project: http://www.linux-usb.org - mirrors at http://www.suse.cz/development/linux-usb/ - and http://usb.in.tum.de/linux-usb/ + mirrors at http://usb.in.tum.de/linux-usb/ and http://it.linux-usb.org Linux USB Guide: http://linux-usb.sourceforge.net Linux-USB device overview (working devices and drivers): diff --git a/Documentation/usb/usb-serial.txt b/Documentation/usb/usb-serial.txt index f001cd93b79b..02b0f7beb6d1 100644 --- a/Documentation/usb/usb-serial.txt +++ b/Documentation/usb/usb-serial.txt @@ -399,10 +399,10 @@ REINER SCT cyberJack pinpad/e-com USB chipcard reader Prolific PL2303 Driver - This driver support any device that has the PL2303 chip from Prolific + This driver supports any device that has the PL2303 chip from Prolific in it. This includes a number of single port USB to serial converters and USB GPS devices. Devices from Aten (the UC-232) and - IO-Data work with this driver. + IO-Data work with this driver, as does the DCU-11 mobile-phone cable. For any questions or problems with this driver, please contact Greg Kroah-Hartman at greg@kroah.com diff --git a/Documentation/x86_64/boot-options.txt b/Documentation/x86_64/boot-options.txt index 6887d44d2661..6da24e7a56cb 100644 --- a/Documentation/x86_64/boot-options.txt +++ b/Documentation/x86_64/boot-options.txt @@ -238,6 +238,13 @@ Debugging pagefaulttrace Dump all page faults. Only useful for extreme debugging and will create a lot of output. + call_trace=[old|both|newfallback|new] + old: use old inexact backtracer + new: use new exact dwarf2 unwinder + both: print entries from both + newfallback: use new unwinder but fall back to old if it gets + stuck (default) + Misc noreplacement Don't replace instructions with more appropriate ones |