summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* nohz: prevent tick stop outside of the idle loopThomas Gleixner2008-07-1814-17/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jack Ren and Eric Miao tracked down the following long standing problem in the NOHZ code: scheduler switch to idle task enable interrupts Window starts here ----> interrupt happens (does not set NEED_RESCHED) irq_exit() stops the tick ----> interrupt happens (does set NEED_RESCHED) return from schedule() cpu_idle(): preempt_disable(); Window ends here The interrupts can happen at any point inside the race window. The first interrupt stops the tick, the second one causes the scheduler to rerun and switch away from idle again and we end up with the tick disabled. The fact that it needs two interrupts where the first one does not set NEED_RESCHED and the second one does made the bug obscure and extremly hard to reproduce and analyse. Kudos to Jack and Eric. Solution: Limit the NOHZ functionality to the idle loop to make sure that we can not run into such a situation ever again. cpu_idle() { preempt_disable(); while(1) { tick_nohz_stop_sched_tick(1); <- tell NOHZ code that we are in the idle loop while (!need_resched()) halt(); tick_nohz_restart_sched_tick(); <- disables NOHZ mode preempt_enable_no_resched(); schedule(); preempt_disable(); } } In hindsight we should have done this forever, but ... /me grabs a large brown paperbag. Debugged-by: Jack Ren <jack.ren@marvell.com>, Debugged-by: eric miao <eric.y.miao@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* nohz: don't stop idle tick if softirqs are pending.Heiko Carstens2008-07-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | In case a cpu goes idle but softirqs are pending only an error message is printed to the console. It may take a very long time until the pending softirqs will finally be executed. Worst case would be a hanging system. With this patch the timer tick just continues and the softirqs will be executed after the next interrupt. Still a delay but better than a hanging system. Currently we have at least two device drivers on s390 which under certain circumstances schedule a tasklet from process context. This is a reason why we can end up with pending softirqs when going idle. Fixing these drivers seems to be non-trivial. However there is no question that the drivers should be fixed. This patch shouldn't be considered as a bug fix. It just is intended to keep a system running even if device drivers are buggy. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jan Glauber <jan.glauber@de.ibm.com> Cc: Stefan Weinhuber <wein@de.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* nohz: reduce jiffies polling overheadIngo Molnar2008-05-301-0/+7
| | | | Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6Linus Torvalds2008-05-292-0/+18
|\ | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6: driver-core: prepare for 2.6.27 api change by adding dev_set_name
| * driver-core: prepare for 2.6.27 api change by adding dev_set_nameStephen Rothwell2008-05-292-0/+18
| | | | | | | | | | | | | | | | | | | | Create the dev_set_name function now so that various subsystems can start changing over to it before other changes in 2.6.27 will make it compulsory. Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6Linus Torvalds2008-05-2926-96/+812
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6: Revert "USB: EHCI: fix performance regression" USB: fsl_usb2_udc: fix recursive lock USB: usb-serial: option: Don't match Huawei driver CD images USB: pl2303: another product ID USB: add another scanner quirk USB: Add support for ROKR W5 in unusual_devs.h USB: Fix M600i unusual_devs entry USB: usb-storage: unusual_devs update for Cypress ATACB USB: EHCI: fix performance regression USB: EHCI: fix bug in Iso scheduling USB: EHCI: fix remote-wakeup regression USB: EHCI: suppress unwanted error messages USB: EHCI: fix up root-hub TT mess USB: add all configs to the "descriptors" attribute USB: fix possible deadlock involving sysfs attributes USB: Firmware loader driver for USB Apple iSight camera USB: FTDI_SIO : Add support for Matrix Orbital PID Range
| * | Revert "USB: EHCI: fix performance regression"Greg Kroah-Hartman2008-05-292-10/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit fa38dfcc56b5f6cce787f9aaa5d1830509213802. It wasn't really a regression and David and Alan are still working through the issues reported. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * | USB: fsl_usb2_udc: fix recursive lockLi Yang2008-05-291-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | UDC needs to release lock before calling out to gadget driver, since it may need to reenter. The change fixes kernel BUG observed on rt kernel. > kernel BUG at kernel/rtmutex.c:683! > stopped custom tracer. > Oops: Exception in kernel mode, sig: 5 [#1] > PREEMPT MPC834x ITX > NIP: c021629c LR: c0216270 CTR: 00000000 > REGS: df761d70 TRAP: 0700 Not tainted (2.6.23.9-rt13) > MSR: 00021032 <ME,IR,DR> CR: 28000022 XER: 00000000 > TASK = df632080[241] 'IRQ-38' THREAD: df760000 > GPR00: 00000001 df761e20 df632080 00000000 11111111 00000000 df761e6c > 00000000 > GPR08: df761e48 00000000 df761e50 00000000 80000000 ede5cdde 1fffd000 > 00800000 > GPR16: ffffffff 00000000 007fff00 00000040 00000000 007ffeb0 00000000 > 1fff8b08 > GPR24: 00000000 00000026 00000000 df79a320 c026b2e8 c02240bc 00009032 > df79a320 > NIP [c021629c] rt_spin_lock_slowlock+0x9c/0x200 > LR [c0216270] rt_spin_lock_slowlock+0x70/0x200 > Call Trace: > [df761e20] [c0216270] rt_spin_lock_slowlock+0x70/0x200 (unreliable) > [df761e90] [c0182828] fsl_ep_disable+0xcc/0x154 > [df761eb0] [c0184d30] eth_reset_config+0x88/0x1d0 > [df761ed0] [c0184ec0] eth_disconnect+0x48/0x64 > [df761ef0] [c01831a4] reset_queues+0x60/0x78 > [df761f00] [c0183b74] fsl_udc_irq+0x9b8/0xa58 > [df761f50] [c003ef30] handle_IRQ_event+0x64/0x100 > [df761f80] [c003f758] thread_simple_irq+0x6c/0xc8 > [df761fa0] [c003f888] do_irqd+0xd4/0x2e4 > [df761fd0] [c0032284] kthread+0x50/0x8c > [df761ff0] [c000f9b4] kernel_thread+0x44/0x60 Signed-off-by: Li Yang <leoli@freescale.com> Cc: Eugene T. Bordenkircher <Eugene_Bordenkircher@selinc.com> Acked-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * | USB: usb-serial: option: Don't match Huawei driver CD imagesMichael Karcher2008-05-291-17/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the interface info matching to all Huawei cards, as they all also contain a Mass Storage Device interface (usually containing Windows drivers) which should not get bound by this driver. See also drivers/usb/storage/unusual_devs.h Signed-off-by: Michael Karcher <kernel@mkarcher.dialup.fu-berlin.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * | USB: pl2303: another product IDSteve Murphy2008-05-292-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I've just got a USB GPRS/EDGE modem branded Manufacturer Micromax Model MMX610U (see http://www.airtel.in/level2_t3data.aspx?path=1/106/179) working by adding another product ID to pl2303. Modem info reports same module as Max Arnold's i.e.SIMCOM SIM600 but with product ID 0x0612 (cf Ox0611). From: Steve Murphy <steve@gnusis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * | USB: add another scanner quirkRené Rebe2008-05-291-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Like the HP53{00,70} scanner other devices of the OEM Avision require the USB_QUIRK_STRING_FETCH_255 to correct set a configuration with "recent" Linux kernels. Signed-off-by: René Rebe <rene@exactcode.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * | USB: Add support for ROKR W5 in unusual_devs.hJavier Smaldone2008-05-291-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for rev 2 of an existing unusual_devs entry enabling ROKR W5s to work. Greg, please apply. From: Javier Smaldone <javier@smaldone.com.ar> Signed-off-by: Phil Dibowitz <phil@ipom.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * | USB: Fix M600i unusual_devs entryPhil Dibowitz2008-05-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It turns out that the unusual_devs entry for the Motorola M600i needs another flag. This patch adds it. Thanks to Atte André Jensen <atte@ballbreaker.dk>. Signed-off-by: Phil Dibowitz <phil@ipom.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * | USB: usb-storage: unusual_devs update for Cypress ATACBAlan Stern2008-05-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch (as1101) updates the unusual_devs entry for the Cypress ATACB pass-through. The protocol field is changed from US_PR_BULK to US_PR_DEVICE, since the Cypress devices already set bInterfaceProtocol to Bulk-only. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * | USB: EHCI: fix performance regressionAlan Stern2008-05-292-8/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch (as1099) fixes a performance regression in ehci-hcd. The fundamental problem is that queue headers get removed from the schedule too quickly, since the code checks for a counter advancing rather than making an actual time-based check. The latency involved in removing the queue header and then relinking it can severely degrade certain kinds of workloads. The patch replaces a simple counter with a timestamp derived from the controller's uframe value. In addition, the delay for unlinking an idle queue header is increased from 5 ms to 10 ms; since some controllers (nVidia) have a latency of up to 1 ms for unlinking, this reduces the relative impact from 20% to 10%. Finally, a logical error left over from the IAA watchdog-timer conversion is corrected. Now the driver will always either unlink an idle queue header or set up a timer to unlink it later. The old code would sometimes fail to do either. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Cc: David Brownell <david-b@pacbell.net> Cc: Leonid <leonidv11@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * | USB: EHCI: fix bug in Iso schedulingAlan Stern2008-05-291-23/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch (as1098) changes the way ehci-hcd schedules its periodic Iso transfers. That the current scheduling code is wrong is clear on the face of it: Sometimes it returns -EL2NSYNC (meaning that an URB couldn't be scheduled because it was submitted too late), but it does this even when the URB_ISO_ASAP flag is set (meaning the URB should be scheduled as soon as possible). The new code properly implements as-soon-as-possible scheduling, assigning the next unexpired slot as the URB's starting point. It also is more careful about checking for Iso URB completion: It doesn't bother to check for activity during frames that are already over, and it allows for the possibility that some of the URB's packets may have raced the hardware when they were submitted and so never got used (the packet status is set to -EXDEV). This fixes problems several people have experienced with USB video applications. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * | USB: EHCI: fix remote-wakeup regressionAlan Stern2008-05-292-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch (as1097) fixes a bug in the remote-wakeup handling in ehci-hcd. The driver currently does not keep track of whether the change-suspend feature is enabled for each port; the feature is automatically reset the first time it is read. But recent changes to the hub driver require that the feature be read at least twice in order to work properly. A bit-vector is added for storing the change-suspend feature values. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * | USB: EHCI: suppress unwanted error messagesAlan Stern2008-05-2911-1/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch (as1096) fixes an annoying problem: When a full-speed or low-speed device is plugged into an EHCI controller, it fails to enumerate at high speed and then is handed over to the companion controller. But usbcore logs a misleading and unwanted error message when the high-speed enumeration fails. The patch adds a new HCD method, port_handed_over, which asks whether a port has been handed over to a companion controller. If it has, the error message is suppressed. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> CC: David Brownell <david-b@pacbell.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * | USB: EHCI: fix up root-hub TT messAlan Stern2008-05-296-15/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch (as1095) cleans up the HCD glue and several of the EHCI bus-glue files. The ehci->is_tdi_rh_tt flag is redundant, since it means the same thing as the hcd->has_tt flag, so it is removed and the other flag used in its place. Some of the bus-glue files didn't get the relinquish_port method added to their hc_driver structures. Although that routine currently doesn't do anything for controllers with an integrated TT, in the future it might. So the patch adds it where it is missing. Lastly, some of the bus-glue files have erroneous entries for their hc_driver's suspend and resume methods. These method pointers are specific to PCI and shouldn't be used otherwise. (The patch also includes an invisible whitespace fix.) Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: David Brownell <dbrownell@users.sourceforge.net>
| * | USB: add all configs to the "descriptors" attributeAlan Stern2008-05-291-23/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch (as1094) changes the output of the "descriptors" binary attribute. Now it will contain the device descriptor followed by all the configuration descriptors, not just the descriptor for the current config. Userspace libraries want to have access to the kernel's cached descriptor information, so they can learn about device characteristics without having to wake up suspended devices. So far the only user of this attribute is the new libusb-1.0 library; thus changing its contents shouldn't cause any problems. This should be considered for 2.6.26, if for no other reason than to minimize the range of releases in which the attribute contains only the current config descriptor. Also, it doesn't hurt that the patch removes the device locking -- which was formerly needed in order to know for certain which config was indeed current. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * | USB: fix possible deadlock involving sysfs attributesAlan Stern2008-05-292-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a potential deadlock when the usb_generic driver is unbound from a device. The problem is that generic_disconnect() is called with the device lock held, and it removes a bunch of device attributes from sysfs. If a user task happens to be running an attribute method at the time, the removal will block until the method returns. But at least one of the attribute methods (the store routine for power/level) needs to acquire the device lock! This patch (as1093) eliminates the deadlock by moving the calls to create and remove the sysfs attributes from the usb_generic driver into usb_new_device() and usb_disconnect(), where they can be invoked without holding the device lock. Besides, the other sysfs attributes are created when the device is registered and removed when the device is unregistered. So it seems only fitting for the extra attributes to be created and removed at the same time. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * | USB: Firmware loader driver for USB Apple iSight cameraMatthew Garrett2008-05-293-0/+143
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Uninitialised Apple iSight drivers present with a distinctive USB ID. Once firmware has been uploaded, they disconnect and reconnect with a new ID. At this point they can be driven by the uvcvideo driver. As this is unique to the Apple cameras and not functionality shared by any other UVC devices, it makes sense to provide the firmware loading functionality in a separate driver. This driver will read an isight.fw file extracted from the Apple driver using the tools at http://bersace03.free.fr/ift/ and upload it to the camera. It will also handle the case where the device loses its firmware during hibernation and must have it reloaded. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * | USB: FTDI_SIO : Add support for Matrix Orbital PID RangeRay Molenkamp2008-05-292-6/+525
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for the range of PIDs that have been allocated for FTDI based devices at Matrix Orbital. A small number of units have been shipped early 2008 with a faulty USB Descriptor. Products that may have this issue have been marked with the existing quirk to work around the problem. Signed-off-by: R. Molenkamp <rmolenkamp@matrixorbital.ca> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* | Merge branch 'fixes' of ↵Linus Torvalds2008-05-291-2/+2
|\ \ | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq: [CPUFREQ] fix double unlock of cpu_policy_rwsem in drivers/cpufreq/cpufreq.c
| * | [CPUFREQ] fix double unlock of cpu_policy_rwsem in drivers/cpufreq/cpufreq.cLothar Waßmann2008-05-291-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In drivers/cpufreq/cpufreq.c the function cpufreq_add_dev() takes the error exit 'err_out_unregister' from different places once with the 'cpu_policy_rwsem' lock held, once with the lock released: | if (ret) | goto err_out_unregister; | } | | policy->governor = NULL; /* to assure that the starting sequence is | * run in cpufreq_set_policy */ | | /* set default policy */ | ret = __cpufreq_set_policy(policy, &new_policy); | policy->user_policy.policy = policy->policy; | policy->user_policy.governor = policy->governor; | | unlock_policy_rwsem_write(cpu); | | if (ret) { | dprintk("setting policy failed\n"); | goto err_out_unregister; | } This leads to the following error message in case of a failing __cpufreq_set_policy() call: ===================================== [ BUG: bad unlock balance detected! ] ------------------------------------- swapper/1 is trying to release lock (&per_cpu(cpu_policy_rwsem, cpu)) at: [<c01b4564>] unlock_policy_rwsem_write+0x30/0x40 but there are no more locks to release! other info that might help us debug this: 1 lock held by swapper/1: #0: (sysdev_drivers_lock){--..}, at: [<c018fd18>] sysdev_driver_register+0x74/0x130 stack backtrace: [<c002f588>] (dump_stack+0x0/0x14) from [<c00692fc>] (print_unlock_inbalance_bug+0xc8/0x104) [<c0069234>] (print_unlock_inbalance_bug+0x0/0x104) from [<c006b7ac>] (lock_release_non_nested+0xc4/0x19c) r6:00000028 r5:c3c1ab80 r4:c01b4564 [<c006b6e8>] (lock_release_non_nested+0x0/0x19c) from [<c006b9e0>] (lock_release+0x15c/0x18c) r8:60000013 r7:00000001 r6:c01b4564 r5:c0541bb4 r4:c3c1ab80 [<c006b884>] (lock_release+0x0/0x18c) from [<c0061ba0>] (up_write+0x24/0x30) r8:c0541b80 r7:00000000 r6:ffffffea r5:c3c34828 r4:c0541b8c [<c0061b7c>] (up_write+0x0/0x30) from [<c01b4564>] (unlock_policy_rwsem_write+0x30/0x40) r4:c3c34884 [<c01b4534>] (unlock_policy_rwsem_write+0x0/0x40) from [<c01b4c40>] (cpufreq_add_dev+0x324/0x398) [<c01b491c>] (cpufreq_add_dev+0x0/0x398) from [<c018fd64>] (sysdev_driver_register+0xc0/0x130) [<c018fca4>] (sysdev_driver_register+0x0/0x130) from [<c01b3574>] (cpufreq_register_driver+0xbc/0x174) Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de> Signed-off-by: Dave Jones <davej@redhat.com>
* | | Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds2008-05-298-584/+150
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: re-tune NUMA topologies sched: stop wake_affine from causing serious imbalance sched: fix sched_clock_cpu() revert ("sched: fair-group: SMP-nice for group scheduling") sched: cleanup show_schedstat(): fix memleak sched: unite unlikely pairs in rt_policy() and schedule_debug() revert ("sched: fair: weight calculations")
| * \ \ Merge commit 'linus/master' into sched-fixes-for-linusIngo Molnar2008-05-29877-7306/+12739
| |\ \ \
| * | | | sched: re-tune NUMA topologiesIngo Molnar2008-05-291-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | improve the sysbench ramp-up phase and its peak throughput on a 16way NUMA box, by turning on WAKE_AFFINE: tip/sched tip/sched+wake-affine ------------------------------------------------- 1: 700 830 +15.65% 2: 1465 1391 -5.28% 4: 3017 3105 +2.81% 8: 5100 6021 +15.30% 16: 10725 10745 +0.19% 32: 10135 10150 +0.16% 64: 9338 9240 -1.06% 128: 8599 8252 -4.21% 256: 8475 8144 -4.07% ------------------------------------------------- SUM: 57558 57882 +0.56% this change also improves lat_ctx from 6.69 usecs to 1.11 usec: $ ./lat_ctx -s 0 2 "size=0k ovr=1.19 2 1.11 $ ./lat_ctx -s 0 2 "size=0k ovr=1.22 2 6.69 in sysbench it's an overall win with some weakness at the lots-of-clients side. That happens because we now under-balance this workload a bit. To counter that effect, turn on NEWIDLE: wake-idle wake-idle+newidle ------------------------------------------------- 1: 830 834 +0.43% 2: 1391 1401 +0.65% 4: 3105 3091 -0.43% 8: 6021 6046 +0.42% 16: 10745 10736 -0.08% 32: 10150 10206 +0.55% 64: 9240 9533 +3.08% 128: 8252 8355 +1.24% 256: 8144 8384 +2.87% ------------------------------------------------- SUM: 57882 58591 +1.21% as a bonus this not only improves the many-clients case but also improves the (more important) rampup phase. sysbench is a workload that quickly breaks down if the scheduler over-balances, so since it showed an improvement under NEWIDLE this change is definitely good.
| * | | | sched: stop wake_affine from causing serious imbalanceMike Galbraith2008-05-291-11/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prevent short-running wakers of short-running threads from overloading a single cpu via wakeup affinity, and wire up disconnected debug option. Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | sched: fix sched_clock_cpu()Peter Zijlstra2008-05-291-4/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make sched_clock_cpu() return 0 before it has been initialized and avoid corrupting its state due to doing so. This fixes the weird printk timestamp jump reported. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
| * | | | revert ("sched: fair-group: SMP-nice for group scheduling")Ingo Molnar2008-05-295-489/+75
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Yanmin Zhang reported: Comparing with 2.6.25, volanoMark has big regression with kernel 2.6.26-rc1. It's about 50% on my 8-core stoakley, 16-core tigerton, and Itanium Montecito. With bisect, I located the following patch: | 18d95a2832c1392a2d63227a7a6d433cb9f2037e is first bad commit | commit 18d95a2832c1392a2d63227a7a6d433cb9f2037e | Author: Peter Zijlstra <a.p.zijlstra@chello.nl> | Date: Sat Apr 19 19:45:00 2008 +0200 | | sched: fair-group: SMP-nice for group scheduling Revert it so that we get v2.6.25 behavior. Bisected-by: Yanmin Zhang <yanmin_zhang@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | sched: cleanupIngo Molnar2008-05-291-2/+2
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | show_schedstat(): fix memleakAdrian Bunk2008-05-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Coverity checker spotted a memleak introduced by commit 39106dcf85285e78f3b290022122c76f851379b8 (cpumask: use new cpus_scnprintf function). It seems the kfree() got lost between v2 and v3 of this patch... Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Mike Travis <travis@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | sched: unite unlikely pairs in rt_policy() and schedule_debug()Roel Kluin2008-05-291-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Removes obfuscation and may improve assembly. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | revert ("sched: fair: weight calculations")Ingo Molnar2008-05-292-75/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Yanmin Zhang reported: Comparing with kernel 2.6.25, sysbench+mysql(oltp, readonly) has many regressions with 2.6.26-rc1: 1) 8-core stoakley: 28%; 2) 16-core tigerton: 20%; 3) Itanium Montvale: 50%. Bisect located this patch: | 8f1bc385cfbab474db6c27b5af1e439614f3025c is first bad commit | commit 8f1bc385cfbab474db6c27b5af1e439614f3025c | Author: Peter Zijlstra <a.p.zijlstra@chello.nl> | Date: Sat Apr 19 19:45:00 2008 +0200 | | sched: fair: weight calculations Revert it to the 2.6.25 state. Bisected-by: Yanmin Zhang <yanmin_zhang@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | | | Merge branch 'release' of ↵Linus Torvalds2008-05-288-47/+128
|\ \ \ \ \ | |_|_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6: [IA64] Workaround for RSE issue
| * | | | [IA64] Workaround for RSE issueTony Luck2008-05-278-47/+128
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: An application violating the architectural rules regarding operation dependencies and having specific Register Stack Engine (RSE) state at the time of the violation, may result in an illegal operation fault and invalid RSE state. Such faults may initiate a cascade of repeated illegal operation faults within OS interruption handlers. The specific behavior is OS dependent. Implication: An application causing an illegal operation fault with specific RSE state may result in a series of illegal operation faults and an eventual OS stack overflow condition. Workaround: OS interruption handlers that switch to kernel backing store implement a check for invalid RSE state to avoid the series of illegal operation faults. The core of the workaround is the RSE_WORKAROUND code sequence inserted into each invocation of the SAVE_MIN_WITH_COVER and SAVE_MIN_WITH_COVER_R19 macros. This sequence includes hard-coded constants that depend on the number of stacked physical registers being 96. The rest of this patch consists of code to disable this workaround should this not be the case (with the presumption that if a future Itanium processor increases the number of registers, it would also remove the need for this patch). Move the start of the RBS up to a mod32 boundary to avoid some corner cases. The dispatch_illegal_op_fault code outgrew the spot it was squatting in when built with this patch and CONFIG_VIRT_CPU_ACCOUNTING=y Move it out to the end of the ivt. Signed-off-by: Tony Luck <tony.luck@intel.com>
* | | | Fix FRV minimum slab/kmalloc alignmentDavid Howells2008-05-281-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | > +#define ARCH_KMALLOC_MINALIGN (sizeof(long) * 2) > +#define ARCH_SLAB_MINALIGN (sizeof(long) * 2) This doesn't work if SLAB is selected and slab debugging is enabled as these are passed to the preprocessor, and the preprocessor doesn't understand sizeof. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds2008-05-287-33/+110
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://git.kernel.dk/linux-2.6-block: cfq-iosched: fix RCU problem in cfq_cic_lookup() block: make blktrace use per-cpu buffers for message notes Added in elevator switch message to blktrace stream Added in MESSAGE notes for blktraces block: reorder cfq_queue to save space on 64bit builds block: Move the second call to get_request to the end of the loop splice: handle try_to_release_page() failure splice: fix sendfile() issue with relay
| * | | | cfq-iosched: fix RCU problem in cfq_cic_lookup()Jens Axboe2008-05-281-2/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cfq_cic_lookup() needs to properly protect ioc->ioc_data before dereferencing it and also exclude updaters of ioc->ioc_data as well. Also add a number of comments documenting why the existing RCU usage is OK. Thanks a lot to "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> for review and comments! Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | | | block: make blktrace use per-cpu buffers for message notesJens Axboe2008-05-282-4/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently it uses a single static char array, but that risks being corrupted when multiple users issue message notes at the same time. Make the buffers dynamically allocated when the trace is setup and make them per-cpu instead. The default max message size of 1k is also very large, the interface is mainly for small text notes. So shrink it to 128 bytes. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | | | Added in elevator switch message to blktrace streamAlan D. Brunelle2008-05-281-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Alan D. Brunelle <alan.brunelle@hp.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | | | Added in MESSAGE notes for blktracesAlan D. Brunelle2008-05-282-0/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allows messages to be inserted into blktrace streams. Signed-off-by: Alan D. Brunelle <alan.brunelle@hp.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | | | block: reorder cfq_queue to save space on 64bit buildsRichard Kennedy2008-05-281-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | saves 8 bytes of padding & increases objects/slab from 30 to 32 on my AMD64 config Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | | | block: Move the second call to get_request to the end of the loopZhang, Yanmin2008-05-281-20/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In function get_request_wait, the second call to get_request could be moved to the end of the while loop, because if the first call to get_request fails, the second call will fail without sleep. Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | | | splice: handle try_to_release_page() failureJens Axboe2008-05-281-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | splice currently assumes that try_to_release_page() always suceeds, but it can return failure. If it does, we cannot steal the page. Acked-by: Mingming Cao <cmm@us.ibm.com Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | | | splice: fix sendfile() issue with relayTom Zanussi2008-05-282-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Splice isn't always incrementing the ppos correctly, which broke relay splice. Signed-off-by: Tom Zanussi <zanussi@comcast.net> Tested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* | | | | FRV: Specify the minimum slab/kmalloc alignmentDavid Howells2008-05-281-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Specify the minimum slab/kmalloc alignment to be 8 bytes. This fixes a crash when SLOB is selected as the memory allocator. The FRV arch needs this so that it can use the load- and store-double instructions without faulting. By default SLOB sets the minimum to be 4 bytes. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | MN10300: Fix typo in header guardVegard Nossum2008-05-281-1/+1
|/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a typo in the header guard of asm/ipc.h. Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge branch 'for-linus' of ↵Linus Torvalds2008-05-277-54/+159
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: pciehp: add message about pciehp_slot_with_bus option pci hotplug core: add check of duplicate slot name pciehp: move msleep after power off pciehp: poll cmd completion if hotplug interrupt is disabled pciehp: fix slow probing pciehp: fix NULL dereference in interrupt handler shpchp: add message about shpchp_slot_with_bus option PCI: don't enable ASPM on devices with mixed PCIe/PCI functions