summaryrefslogtreecommitdiffstats
path: root/include
Commit message (Collapse)AuthorAgeFilesLines
* [PATCH] x86_64: Calgary IOMMU - Multi-Node NULL pointer dereference fixJon Mason2006-07-291-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Calgary hits a NULL pointer dereference when booting in a multi-chassis NUMA system. See Redhat bugzilla number 198498, found by Konrad Rzeszutek (konradr@redhat.com). There are many issues that had to be resolved to fix this problem. Firstly when I originally wrote the code to handle NUMA systems, I had a large misunderstanding that was not corrected until now. That was that I thought the "number of nodes online" referred to number of physical systems connected. So that if NUMA was disabled, there would only be 1 node and it would only show that node's PCI bus. In reality if NUMA is disabled, the system displays all of the connected chassis as one node but is only ignorant of the delays in accessing main memory. Therefore, references to num_online_nodes() and MAX_NUMNODES are incorrect and need to be set to the maximum number of nodes that can be accessed (which are 8). I created a variable, MAX_NUM_CHASSIS, and set it to 8 to fix this. Secondly, when walking the PCI in detect_calgary, the code only checked the first "slot" when looking to see if a device is present. This will work for most cases, but unfortunately it isn't always the case. In the NUMA MXE drawers, there are USB devices present on the 3rd slot (with slot 1 being empty). So, to work around this, all slots (up to 8) are scanned to see if there are any devices present. Lastly, the bus is being enumerated on large systems in a different way the we originally thought. This throws the ugly logic we had out the window. To more elegantly handle this, I reorganized the kva array to be sparse (which removed the need to have any bus number to kva slot logic in tce.c) and created a secondary space array to contain the bus number to phb mapping. With these changes Calgary boots on an x460 with 4 nodes with and without NUMA enabled. Signed-off-by: Jon Mason <jdmason@us.ibm.com> Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Merge branch 'master' into upstream-fixesJeff Garzik2006-07-2911-26/+24
|\
| * [PATCH] pi-futex: robust-futex exitIngo Molnar2006-07-281-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix robust PI-futexes to be properly unlocked on unexpected exit. For this to work the kernel has to know whether a futex is a PI or a non-PI one, because the semantics are different. Since the space in relevant glibc data structures is extremely scarce, the best solution is to encode the 'PI' information in bit 0 of the robust list pointer. Existing (non-PI) glibc robust futexes have this bit always zero, so the ABI is kept. New glibc with PI-robust-futexes will set this bit. Further fixes from Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
| * [PATCH] x86_64: Enlarge debug stack for nested kprobesbibo mao2006-07-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In x86_64 platform, INT1 and INT3 trap stack is IST stack called DEBUG_STACK, when INT1/INT3 trap happens, system will switch to DEBUG_STACK by hardware. Current DEBUG_STACK size is 4K, when int1/int3 trap happens, kernel will minus current DEBUG_STACK IST value by 4k. But if int3/int1 trap is nested, it will destroy other vector's IST stack. This patch modifies this, it sets DEBUG_STACK size as 8K and allows two level of nested int1/int3 trap. Kprobe DEBUG_STACK may be nested, because kprobe handler may be probed by other kprobes. Thanks jbeulich for pointing out error in the first patch. [AK: nested kprobes are pretty dubious. Hopefully one nest will be enough. This will cost 8K per CPU (4K more than before)] Signed-off-by: bibo, mao <bibo.mao@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
| * Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6Linus Torvalds2006-07-282-2/+2
| |\ | | | | | | | | | | | | | | | | | | | | | * master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6: [SUNLANCE]: fix compilation on sparc-UP [SPARC]: Defer clock_probe to fs_initcall() [SPARC64]: Fix typo in pgprot_noncached(). [SPARC64]: Fix quad-float multiply emulation.
| | * [SPARC64]: Fix typo in pgprot_noncached().David S. Miller2006-07-271-1/+1
| | | | | | | | | | | | | | | | | | The sun4v code sequence was or'ing in the sun4u pte bits by mistake. Signed-off-by: David S. Miller <davem@davemloft.net>
| | * [SPARC64]: Fix quad-float multiply emulation.David S. Miller2006-07-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Something is wrong with the 3-multiply (vs. 4-multiply) optimized version of _FP_MUL_MEAT_2_*(), so just use the slower version which actually computes correct values. Noticed by Rene Rebe Signed-off-by: David S. Miller <davem@davemloft.net>
| * | [PATCH] ide: option to disable cache flushes for buggy drivesJens Axboe2006-07-281-0/+1
| |/ | | | | | | | | | | | | | | Some drives claim they support cache flushing, but get seriously confused if you try. Add this option to be able to boot with barriers enabled by default. Signed-off-by: Jens Axboe <axboe@suse.de>
| * Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6Linus Torvalds2006-07-261-1/+1
| |\ | | | | | | | | | | | | | | | | | | * master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6: [SCSI] esp: Fix build. [SPARC]: Fix SA_STATIC_ALLOC value. [SPARC64]: Explicitly print return PC when the kernel fault PC is bogus.
| | * [SPARC]: Fix SA_STATIC_ALLOC value.David S. Miller2006-07-241-1/+1
| | | | | | | | | | | | | | | | | | | | | It alises IRQF_SHARED which causes all kinds of problems. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2006-07-261-0/+2
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: [IPV4/IPV6]: Setting 0 for unused port field in RAW IP recvmsg(). [IPV4] ipmr: ip multicast route bug fix. [TG3]: Update version and reldate [TG3]: Handle tg3_init_rings() failures [TG3]: Add tg3_restart_hw() [IPV4]: Clear the whole IPCB, this clears also IPCB(skb)->flags. [IPV6]: Clean skb cb on IPv6 input. [NETFILTER]: Demote xt_sctp to EXPERIMENTAL [NETFILTER]: bridge netfilter: add deferred output hooks to feature-removal-schedule [NETFILTER]: xt_pkttype: fix mismatches on locally generated packets [NETFILTER]: SNMP NAT: fix byteorder confusion [NETFILTER]: conntrack: fix SYSCTL=n compile [NETFILTER]: nf_queue: handle NF_STOP and unknown verdicts in nf_reinject [NETFILTER]: H.323 helper: fix possible NULL-ptr dereference
| | * | [NETFILTER]: bridge netfilter: add deferred output hooks to ↵Patrick McHardy2006-07-241-0/+2
| | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | feature-removal-schedule Add bridge netfilter deferred output hooks to feature-removal-schedule and disable them by default. Until their removal they will be activated by the physdev match when needed. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * / [PATCH] Reorganize the cpufreq cpu hotplug locking to not be totally bizareArjan van de Ven2006-07-261-3/+0
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch below moves the cpu hotplugging higher up in the cpufreq layering; this is needed to avoid recursive taking of the cpu hotplug lock and to otherwise detangle the mess. The new rules are: 1. you must do lock_cpu_hotplug() around the following functions: __cpufreq_driver_target __cpufreq_governor (for CPUFREQ_GOV_LIMITS operation only) __cpufreq_set_policy 2. governer methods (.governer) must NOT take the lock_cpu_hotplug() lock in any way; they are called with the lock taken already 3. if your governer spawns a thread that does things, like calling __cpufreq_driver_target, your thread must honor rule #1. 4. the policy lock and other cpufreq internal locks nest within the lock_cpu_hotplug() lock. I'm not entirely happy about how the __cpufreq_governor rule ended up (conditional locking rule depending on the argument) but basically all callers pass this as a constant so it's not too horrible. The patch also removes the cpufreq_governor() function since during the locking audit it turned out to be entirely unused (so no need to fix it) The patch works on my testbox, but it could use more testing (otoh... it can't be much worse than the current code) Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
| * [NET]: Correct dev_alloc_skb kerneldocChristoph Hellwig2006-07-241-2/+2
| | | | | | | | | | | | | | | | dev_alloc_skb is designated for RX descriptors, not TX. (Some drivers use it for the latter anyway, but that's a different story) Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [NET]: Remove CONFIG_HAVE_ARCH_DEV_ALLOC_SKBChristoph Hellwig2006-07-241-4/+0
| | | | | | | | | | | | | | | | | | | | skbuff.h has an #ifndef CONFIG_HAVE_ARCH_DEV_ALLOC_SKB to allow architectures to reimplement __dev_alloc_skb. It's not set on any architecture and now that we have an architecture-overrideable NET_SKB_PAD there is not point at all to have one either. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [PKT_SCHED]: Fix regression in PSCHED_TADD{,2}.Guillaume Chazarain2006-07-241-12/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In PSCHED_TADD and PSCHED_TADD2, if delta is less than tv.tv_usec (so, less than USEC_PER_SEC too) then tv_res will be smaller than tv. The affectation "(tv_res).tv_usec = __delta;" is wrong. The fix is to revert to the original code before 4ee303dfeac6451b402e3d8512723d3a0f861857 and change the 'if' in 'while'. [Shuya MAEDA: "while (__delta >= USEC_PER_SEC){ ... }" instead of "while (__delta > USEC_PER_SEC){ ... }"] Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
| * IB/mad: Validate MADs for spec complianceSean Hefty2006-07-241-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Validate MADs sent by userspace clients for spec compliance with C13-18.1.1 (prevent duplicate requests and responses sent on the same port). Without this, RMPP transactions get aborted because of duplicate packets. This patch is similar to that provided by Jack Morgenstein. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* | Merge branch 'master' into upstream-fixesJeff Garzik2006-07-2413-45/+35
|\|
| * cpu hotplug: simplify and hopefully fix lockingLinus Torvalds2006-07-231-6/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The CPU hotplug locking was quite messy, with a recursive lock to handle the fact that both the actual up/down sequence wanted to protect itself from being re-entered, but the callbacks that it called also tended to want to protect themselves from CPU events. This splits the lock into two (one to serialize the whole hotplug sequence, the other to protect against the CPU present bitmaps changing). The latter still allows recursive usage because some subsystems (ondemand policy for cpufreq at least) had already gotten too used to the lax locking, but the locking mistakes are hopefully now less fundamental, and we now warn about recursive lock usage when we see it, in the hope that it can be fixed. Signed-off-by: Linus Torvalds <torvalds@osdl.org>
| * Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2006-07-214-9/+19
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits) [TIPC]: Removing useless casts [IPV4]: Fix nexthop realm dumping for multipath routes [DUMMY]: Avoid an oops when dummy_init_one() failed [IFB] After ifb_init_one() failed, i is increased. Decrease [NET]: Fix reversed error test in netif_tx_trylock [MAINTAINERS]: Mark LAPB as Oprhan. [NET]: Conversions from kmalloc+memset to k(z|c)alloc. [NET]: sun happymeal, little pci cleanup [IrDA]: Use alloc_skb() in IrDA TX path [I/OAT]: Remove pci_module_init() from Intel I/OAT DMA engine [I/OAT]: net/core/user_dma.c should #include <net/netdma.h> [SCTP]: ADDIP: Don't use an address as source until it is ASCONF-ACKed [SCTP]: Set chunk->data_accepted only if we are going to accept it. [SCTP]: Verify all the paths to a peer via heartbeat before using them. [SCTP]: Unhash the endpoint in sctp_endpoint_free(). [SCTP]: Check for NULL arg to sctp_bucket_destroy(). [PKT_SCHED] netem: Fix slab corruption with netem (2nd try) [WAN]: Converted synclink drivers to use netif_carrier_*() [WAN]: Cosmetic changes to N2 and C101 drivers [WAN]: Added missing netif_dormant_off() to generic HDLC ...
| | * [NET]: Fix reversed error test in netif_tx_trylockHerbert Xu2006-07-211-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | A non-zero return value indicates success from spin_trylock, not error. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * [I/OAT]: net/core/user_dma.c should #include <net/netdma.h>Adrian Bunk2006-07-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Every file should #include the headers containing the prototypes for its global functions. Especially in cases like this one where gcc can tell us through a compile error that the prototype was wrong... Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * [SCTP]: ADDIP: Don't use an address as source until it is ASCONF-ACKedSridhar Samudrala2006-07-211-5/+2
| | | | | | | | | | | | | | | | | | | | | This implements Rules D1 and D4 of Sec 4.3 in the ADDIP draft. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * [SCTP]: Verify all the paths to a peer via heartbeat before using them.Sridhar Samudrala2006-07-212-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | This patch implements Path Initialization procedure as described in Sec 2.36 of RFC4460. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | [SPARC]: Kill prom_getname, unused and not implemented properly.David S. Miller2006-07-213-15/+0
| | | | | | | | | | | | | | | | | | | | | The m68k port's sun3 asm/oplib.h had a stray reference too, so I killed that off as well. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | [SPARC64]: Fix more of_device layer IRQ bugs, and correct PROMREG_MAX.David S. Miller2006-07-211-1/+1
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sabre and Psycho PCI controllers can have partial interrupt-map properties, meaning that on-board devices don't match up to any entries. Instead, they are fully specified from the beginning and we should pass them directly to the IRQ translator as-is. Also, fill in the necessary translator slots for the "graphics" and "expansion UPA" interrupts on Sabre, Psycho, and SYSIO SBUS. Increase PROMREG_MAX to 24, as seen on SUNW,ffb devices. Finally, prevent accidentally writing past the end of the of_device struct resource[] and irqs[] arrays. Spit out a log message when we ignore some entries because there are too many of them. Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6Linus Torvalds2006-07-212-10/+6
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6: (38 commits) [SCSI] More buffer->request_buffer changes [SCSI] mptfusion: bump version to 3.04.01 [SCSI] mptfusion: misc fix's [SCSI] mptfusion: firmware download boot fix's [SCSI] mptfusion: task abort fix's [SCSI] mptfusion: sas nexus loss support [SCSI] mptfusion: sas loginfo update [SCSI] mptfusion: mptctl panic when loading [SCSI] mptfusion: sas enclosures with smart drive [SCSI] NCR_D700: misc fixes (section and argument ordering) [SCSI] scsi_debug: must_check fixes [SCSI] scsi_transport_sas: kill the use of channel [SCSI] scsi_transport_sas: add expander backlink [SCSI] hide EH backup data outside the scsi_cmnd [SCSI] ibmvscsi: handle inactive SCSI target during probe [SCSI] ibmvscsi: allocate lpevents for ibmvscsi on iseries [SCSI] aic7[9x]xx: Remove last vestiges of reverse_scan [SCSI] aha152x: stop poking at saved scsi_cmnd members [SCSI] st.c: Improve sense output [SCSI] lpfc 8.1.7: Change version number to 8.1.7 ...
| | * [SCSI] scsi_transport_sas: add expander backlinkJames Bottomley2006-07-121-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | This patch adds the ability to add a backlink to a particular port. The idea is to represent properly ports on expanders that are used specifically for linking to the parent device in the topology. Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
| | * [SCSI] hide EH backup data outside the scsi_cmndChristoph Hellwig2006-07-091-9/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently struct scsi_cmnd has various fields that are used to backup original data after the corresponding fields have been overridden for EH commands. This means drivers can easily get at it and misuse it. Due to the old_ naming this doesn't happen for most of them, but two that have different names have been used wrong a lot (see previous patch). Another downside is that they unessecarily bloat the scsi_cmnd size. This patch moves them onstack in scsi_send_eh_cmnd to fix those two issues aswell as allowing future EH fixes like moving the EH command submissions to use SG lists like everything else. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
| | * [SCSI] scsi_transport_sas: add unindexed portsJames Bottomley2006-07-091-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some SAS HBAs don't want to go to the trouble of tracking port numbers, so they'd simply like to say "add this port and give it a number". This is especially beneficial from the hotplug point of view, since tracking ports and the available number space can be a real pain. The current implementation uses an incrementing number per expander to add the port on. However, since there can never be more ports than there are phys, a later implementation will try to be more intelligent about this. Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
| * | [S390] get_clock inline assembly.Andreas Krebbel2006-07-171-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | Add missing volatile to the get_clock / get_cycles inline assemblies to avoid that consecutive calls get optimized away. Signed-off-by: Andreas Krebbel <krebbel1@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | [S390] Fix gcc warning about unused return values.Heiko Carstens2006-07-171-2/+7
| | | | | | | | | | | | | | | Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* | | [PATCH] libata: improve EH action and EHI flag handlingTejun Heo2006-07-191-1/+3
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Update ata_eh_about_to_do() and ata_eh_done() to improve EH action and EHI flag handling. * There are two types of EHI flags - one which expires on successful EH and the other which expires on a successful reset. Make this distinction clear. * Unlike other EH actions, reset actions are represented by two EH action masks and a EHI modifier. Implement correct about_to_do/done semantics for resets. That is, prior to reset, related EH info is sucked in from ehi and cleared, and after reset is complete, related EH info in ehc is cleared. These changes improve consistency and remove unnecessary EH actions caused by stale EH action masks and EHI flags. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
* | Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2006-07-141-2/+3
|\ \ | | | | | | | | | | | | | | | | | | | | | * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: [VLAN]: __vlan_hwaccel_rx can use the faster ether_compare_addr [PKT_SCHED] HTB: initialize upper bound properly [IPV4]: Clear skb cb on IP input [NET]: Update frag_list in pskb_trim
| * | [VLAN]: __vlan_hwaccel_rx can use the faster ether_compare_addrStephen Hemminger2006-07-141-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The inline function compare_ether_addr is faster than memcmp. Also, don't need to drag in proc_fs.h, the only reference to proc_dir_entry is a pointer so the declaration is needed here. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Acked-by: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | [PATCH] remove set_wmb - arch removalSteven Rostedt2006-07-1423-31/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | set_wmb should not be used in the kernel because it just confuses the code more and has no benefit. Since it is not currently used in the kernel this patch removes it so that new code does not include it. All archs define set_wmb(var, value) to do { var = value; wmb(); } while(0) except ia64 and sparc which use a mb() instead. But this is still moot since it is not used anyway. Hasn't been tested on any archs but x86 and x86_64 (and only compiled tested) Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] per-task delay accounting taskstats interface: control exit data ↵Shailabh Nagar2006-07-142-25/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | through cpumasks On systems with a large number of cpus, with even a modest rate of tasks exiting per cpu, the volume of taskstats data sent on thread exit can overflow a userspace listener's buffers. One approach to avoiding overflow is to allow listeners to get data for a limited and specific set of cpus. By scaling the number of listeners and/or the cpus they monitor, userspace can handle the statistical data overload more gracefully. In this patch, each listener registers to listen to a specific set of cpus by specifying a cpumask. The interest is recorded per-cpu. When a task exits on a cpu, its taskstats data is unicast to each listener interested in that cpu. Thanks to Andrew Morton for pointing out the various scalability and general concerns of previous attempts and for suggesting this design. [akpm@osdl.org: build fix] Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com> Signed-off-by: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] per-task delay accounting: avoid send without listenersShailabh Nagar2006-07-141-1/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Don't send taskstats (per-pid or per-tgid) on thread exit when no one is listening for such data. Currently the taskstats interface allocates a structure, fills it in and calls netlink to send out per-pid and per-tgid stats regardless of whether a userspace listener for the data exists (netlink layer would check for that and avoid the multicast). As a result of this patch, the check for the no-listener case is performed early, avoiding the redundant allocation and filling up of the taskstats structures. Signed-off-by: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com> Cc: Jay Lan <jlan@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] delay accounting taskstats interface send tgid onceShailabh Nagar2006-07-142-16/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Send per-tgid data only once during exit of a thread group instead of once with each member thread exit. Currently, when a thread exits, besides its per-tid data, the per-tgid data of its thread group is also sent out, if its thread group is non-empty. The per-tgid data sent consists of the sum of per-tid stats for all *remaining* threads of the thread group. This patch modifies this sending in two ways: - the per-tgid data is sent only when the last thread of a thread group exits. This cuts down heavily on the overhead of sending/receiving per-tgid data, especially when other exploiters of the taskstats interface aren't interested in per-tgid stats - the semantics of the per-tgid data sent are changed. Instead of being the sum of per-tid data for remaining threads, the value now sent is the true total accumalated statistics for all threads that are/were part of the thread group. The patch also addresses a minor issue where failure of one accounting subsystem to fill in the taskstats structure was causing the send of taskstats to not be sent at all. The patch has been tested for stability and run cerberus for over 4 hours on an SMP. [akpm@osdl.org: bugfixes] Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com> Signed-off-by: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] per-task-delay-accounting: /proc export of aggregated block I/O delaysShailabh Nagar2006-07-141-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Export I/O delays seen by a task through /proc/<tgid>/stats for use in top etc. Note that delays for I/O done for swapping in pages (swapin I/O) is clubbed together with all other I/O here (this is not the case in the netlink interface where the swapin I/O is kept distinct) [akpm@osdl.org: printk warning fix] Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com> Signed-off-by: Balbir Singh <balbir@in.ibm.com> Cc: Jes Sorensen <jes@sgi.com> Cc: Peter Chubb <peterc@gelato.unsw.edu.au> Cc: Erich Focht <efocht@ess.nec.de> Cc: Levent Serinol <lserinol@gmail.com> Cc: Jay Lan <jlan@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] per-task-delay-accounting: delay accounting usage of taskstats interfaceShailabh Nagar2006-07-144-1/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Usage of taskstats interface by delay accounting. Signed-off-by: Shailabh Nagar <nagar@us.ibm.com> Signed-off-by: Balbir Singh <balbir@in.ibm.com> Cc: Jes Sorensen <jes@sgi.com> Cc: Peter Chubb <peterc@gelato.unsw.edu.au> Cc: Erich Focht <efocht@ess.nec.de> Cc: Levent Serinol <lserinol@gmail.com> Cc: Jay Lan <jlan@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] per-task-delay-accounting: taskstats interfaceShailabh Nagar2006-07-142-0/+141
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Create a "taskstats" interface based on generic netlink (NETLINK_GENERIC family), for getting statistics of tasks and thread groups during their lifetime and when they exit. The interface is intended for use by multiple accounting packages though it is being created in the context of delay accounting. This patch creates the interface without populating the fields of the data that is sent to the user in response to a command or upon the exit of a task. Each accounting package interested in using taskstats has to provide an additional patch to add its stats to the common structure. [akpm@osdl.org: cleanups, Kconfig fix] Signed-off-by: Shailabh Nagar <nagar@us.ibm.com> Signed-off-by: Balbir Singh <balbir@in.ibm.com> Cc: Jes Sorensen <jes@sgi.com> Cc: Peter Chubb <peterc@gelato.unsw.edu.au> Cc: Erich Focht <efocht@ess.nec.de> Cc: Levent Serinol <lserinol@gmail.com> Cc: Jay Lan <jlan@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] per-task-delay-accounting: utilities for genetlink usageBalbir Singh2006-07-141-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Two utilities for simplifying usage of NETLINK_GENERIC interface. Signed-off-by: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com> Cc: Jes Sorensen <jes@sgi.com> Cc: Peter Chubb <peterc@gelato.unsw.edu.au> Cc: Erich Focht <efocht@ess.nec.de> Cc: Levent Serinol <lserinol@gmail.com> Cc: Jay Lan <jlan@engr.sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] per-task-delay-accounting: cpu delay collection via schedstatsChandra Seetharaman2006-07-141-3/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make the task-related schedstats functions callable by delay accounting even if schedstats collection isn't turned on. This removes the dependency of delay accounting on schedstats. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com> Signed-off-by: Balbir Singh <balbir@in.ibm.com> Cc: Jes Sorensen <jes@sgi.com> Cc: Peter Chubb <peterc@gelato.unsw.edu.au> Cc: Erich Focht <efocht@ess.nec.de> Cc: Levent Serinol <lserinol@gmail.com> Cc: Jay Lan <jlan@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] per-task-delay-accounting: sync block I/O and swapin delay collectionShailabh Nagar2006-07-142-0/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unlike earlier iterations of the delay accounting patches, now delays are only collected for the actual I/O waits rather than try and cover the delays seen in I/O submission paths. Account separately for block I/O delays incurred as a result of swapin page faults whose frequency can be affected by the task/process' rss limit. Hence swapin delays can act as feedback for rss limit changes independent of I/O priority changes. Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com> Signed-off-by: Balbir Singh <balbir@in.ibm.com> Cc: Jes Sorensen <jes@sgi.com> Cc: Peter Chubb <peterc@gelato.unsw.edu.au> Cc: Erich Focht <efocht@ess.nec.de> Cc: Levent Serinol <lserinol@gmail.com> Cc: Jay Lan <jlan@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] per-task-delay-accounting: setupShailabh Nagar2006-07-143-0/+101
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Initialization code related to collection of per-task "delay" statistics which measure how long it had to wait for cpu, sync block io, swapping etc. The collection of statistics and the interface are in other patches. This patch sets up the data structures and allows the statistics collection to be disabled through a kernel boot parameter. Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com> Signed-off-by: Balbir Singh <balbir@in.ibm.com> Cc: Jes Sorensen <jes@sgi.com> Cc: Peter Chubb <peterc@gelato.unsw.edu.au> Cc: Erich Focht <efocht@ess.nec.de> Cc: Levent Serinol <lserinol@gmail.com> Cc: Jay Lan <jlan@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] list_is_last utilityShailabh Nagar2006-07-141-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | Add another list utility function to check for last element in a list. Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] mbxfb: Add framebuffer driver for the Intel 2700GMike Rapoport2006-07-141-0/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add frame buffer driver for the 2700G LCD controller present on CompuLab CM-X270 computer module. [adaplas] - Add more informative help text to Kconfig - Make DEBUG a Kconfig option as FB_MBX_DEBUG - Remove #include mbxdebug.c, this is frowned upon - Remove redundant casts - Arrange #include's alphabetically - Trivial whitespace Signed-off-by: Mike Rapoport <mike@compulab.co.il> Signed-off-by: Antonino Daplas <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] symlink nesting level changeAl Viro2006-07-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's way past time to bump it to 8. Everyone had been warned - for months now. RH kernels have had this for more than a year. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] gpio: drop vtable members .gpio_set_high .gpio_set_low gpio_set is ↵Jim Cromie2006-07-141-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | enough drops gpio_set_high, gpio_set_low from the nsc_gpio_ops vtable. While we can't drop them from scx200_gpio (or can we?), we dont need them for new users of the exported vtable; gpio_set(1), gpio_set(0) work fine. Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>