summaryrefslogtreecommitdiffstats
path: root/drivers/idle
Commit message (Collapse)AuthorAgeFilesLines
* ACPI processor hotplug: Delay acpi_processor_start() call for hotplugged coresThomas Renninger2012-01-191-1/+1
| | | | | | | | | | Delay the setting up of features (cpuidle, throttling by calling acpi_processor_start()) to the time when the hotplugged core got onlined the first time and got fully initialized. Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Len Brown <len.brown@intel.com>
*---. Merge branches 'einj', 'intel_idle', 'misc', 'srat' and 'turbostat-ivb' into ↵Len Brown2012-01-181-47/+49
|\ \ \ | | | | | | | | | | | | release
| | * | intel_idle: Split up and provide per CPU initialization funcThomas Renninger2012-01-171-40/+42
| |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Function split up, should have no functional change. Provides entry point for physically hotplugged CPUs to initialize and activate cpuidle. Signed-off-by: Thomas Renninger <trenn@suse.de> CC: Deepthi Dharwar <deepthi@linux.vnet.ibm.com> CC: Shaohua Li <shaohua.li@intel.com> CC: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com>
| * | intel idle: Make idle driver more robustThomas Renninger2012-01-171-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kvm -cpu host passes the original cpuid info to the guest. Latest kvm version seem to return true for mwait_leaf cpuid function on recent Intel CPUs. But it does not return mwait C-states (mwait_substates), instead zero is returned. While real CPUs seem to always return non-zero values, the intel idle driver should not get active in kvm (mwait_substates == 0) case and bail out. Otherwise a Null pointer exception will happen later when the cpuidle subsystem tries to get active: [0.984807] BUG: unable to handle kernel NULL pointer dereference at (null) [0.984807] IP: [<(null)>] (null) ... [0.984807][<ffffffff8143cf34>] ? cpuidle_idle_call+0xb4/0x340 [0.984807][<ffffffff8159e7bc>] ? __atomic_notifier_call_chain+0x4c/0x70 [0.984807][<ffffffff81001198>] ? cpu_idle+0x78/0xd0 Reference: https://bugzilla.novell.com/show_bug.cgi?id=726296 Cc: stable@vger.kernel.org Signed-off-by: Thomas Renninger <trenn@suse.de> CC: Bruno Friedmann <bruno@ioda-net.ch> Signed-off-by: Len Brown <len.brown@intel.com>
| * | intel_idle: Fix a cast to pointer from integer of different size warning in ↵David Howells2012-01-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | intel_idle Fix the following warning: drivers/idle/intel_idle.c: In function 'intel_idle_cpuidle_devices_init': drivers/idle/intel_idle.c:518:5: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] By making get_driver_data() return a long instead of an int. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Len Brown <len.brown@intel.com>
| * | intel_idle: remove redundant local_irq_disable() callYanmin Zhang2012-01-171-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | irq disabling happens earlier in process_32.c:cpu_idle. Basically, cpuidle_state->enter is called, cpu irq is disabled. cpuidle_state->enter would turn on irq when exiting. intel_idle doesn't follow this assumption. Although it doesn't cause real issue, it misleads developers. Remove the call to local_irq_disable() at entry. [akpm@linux-foundation.org: add comment] Signed-off-by: Mingming Zhang <mingmingx.zhang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com>
| * | intel_idle: fix API misuseShaohua Li2012-01-171-3/+3
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | smp_call_function() only lets all other CPUs execute a specific function, while we expect all CPUs do in intel_idle. Without the fix, we could have one cpu which has auto_demotion enabled or has no broadcast timer setup. Usually we don't see impact because auto demotion just harms power and the intel_idle init is called in CPU 0, where boradcast timer delivers interrupt, but this still could be a problem. Cc: stable@vger.kernel.org Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com>
* | Merge branch 'release' of ↵Linus Torvalds2011-11-071-31/+99
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: cpuidle: Single/Global registration of idle states cpuidle: Split cpuidle_state structure and move per-cpu statistics fields cpuidle: Remove CPUIDLE_FLAG_IGNORE and dev->prepare() cpuidle: Move dev->last_residency update to driver enter routine; remove dev->last_state ACPI: Fix CONFIG_ACPI_DOCK=n compiler warning ACPI: Export FADT pm_profile integer value to userspace thermal: Prevent polling from happening during system suspend ACPI: Drop ACPI_NO_HARDWARE_INIT ACPI atomicio: Convert width in bits to bytes in __acpi_ioremap_fast() PNPACPI: Simplify disabled resource registration ACPI: Fix possible recursive locking in hwregs.c ACPI: use kstrdup() mrst pmu: update comment tools/power turbostat: less verbose debugging
| * cpuidle: Single/Global registration of idle statesDeepthi Dharwar2011-11-061-19/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes the cpuidle_states structure global (single copy) instead of per-cpu. The statistics needed on per-cpu basis by the governor are kept per-cpu. This simplifies the cpuidle subsystem as state registration is done by single cpu only. Having single copy of cpuidle_states saves memory. Rare case of asymmetric C-states can be handled within the cpuidle driver and architectures such as POWER do not have asymmetric C-states. Having single/global registration of all the idle states, dynamic C-state transitions on x86 are handled by the boot cpu. Here, the boot cpu would disable all the devices, re-populate the states and later enable all the devices, irrespective of the cpu that would receive the notification first. Reference: https://lkml.org/lkml/2011/4/25/83 Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com> Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com> Tested-by: Jean Pihet <j-pihet@ti.com> Reviewed-by: Kevin Hilman <khilman@ti.com> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Len Brown <len.brown@intel.com>
| * cpuidle: Split cpuidle_state structure and move per-cpu statistics fieldsDeepthi Dharwar2011-11-061-12/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the first step towards global registration of cpuidle states. The statistics used primarily by the governor are per-cpu and have to be split from rest of the fields inside cpuidle_state, which would be made global i.e. single copy. The driver_data field is also per-cpu and moved. Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com> Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com> Tested-by: Jean Pihet <j-pihet@ti.com> Reviewed-by: Kevin Hilman <khilman@ti.com> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Len Brown <len.brown@intel.com>
| * cpuidle: Move dev->last_residency update to driver enter routine; remove ↵Deepthi Dharwar2011-11-061-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dev->last_state Cpuidle governor only suggests the state to enter using the governor->select() interface, but allows the low level driver to override the recommended state. The actual entered state may be different because of software or hardware demotion. Software demotion is done by the back-end cpuidle driver and can be accounted correctly. Current cpuidle code uses last_state field to capture the actual state entered and based on that updates the statistics for the state entered. Ideally the driver enter routine should update the counters, and it should return the state actually entered rather than the time spent there. The generic cpuidle code should simply handle where the counters live in the sysfs namespace, not updating the counters. Reference: https://lkml.org/lkml/2011/3/25/52 Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com> Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com> Tested-by: Jean Pihet <j-pihet@ti.com> Reviewed-by: Kevin Hilman <khilman@ti.com> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Len Brown <len.brown@intel.com>
* | x86: fix up files really needing to include module.hPaul Gortmaker2011-10-311-0/+1
|/ | | | | | | These files aren't just exporting symbols -- they are also defining a MODULE_LICENSE etc. so give them the full module.h file. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
* intel_idle: Rename cpuidle statesThomas Renninger2011-02-281-11/+11
| | | | | | | | | | | Userspace apps might have to cut off parts off the idle state name for display reasons. Switch NHM-C1 to C1-NHM (and others) so that a cut off name is unique and makes sense to the user. Signed-off-by: Thomas Renninger <trenn@suse.de> CC: lenb@kernel.org Signed-off-by: Len Brown <len.brown@intel.com>
* intel_idle: disable Atom/Lincroft HW C-state auto-demotionLen Brown2011-02-171-0/+4
| | | | | | | | | | | | Just as we had to disable auto-demotion for NHM/WSM, we need to do the same for Atom (Lincroft version). In particular, auto-demotion will prevent Lincroft from entering the S0i3 idle power saving state. https://bugzilla.kernel.org/show_bug.cgi?id=25252 Signed-off-by: Len Brown <len.brown@intel.com>
* intel_idle: disable NHM/WSM HW C-state auto-demotionLen Brown2011-02-171-0/+20
| | | | | | | | | | | | | | Hardware C-state auto-demotion is a mechanism where the HW overrides the OS C-state request, instead demoting to a shallower state, which is less expensive, but saves less power. Modern Linux should generally get exactly the states it requests. In particular, when a CPU is taken off-line, it must not be demoted, else it can prevent the entire package from reaching deep C-states. https://bugzilla.kernel.org/show_bug.cgi?id=25252 Signed-off-by: Len Brown <len.brown@intel.com>
* fix a shutdown regression in intel_idleShaohua Li2011-01-251-6/+2
| | | | | | | | | | | | | | | | Fix a shutdown regression caused by 2a2d31c8dc6f ("intel_idle: open broadcast clock event"). The clockevent framework can automatically shutdown broadcast timers for hotremove CPUs. And we get a shutdown regression when we shutdown broadcast timer for hot remove CPU, so just delete some code. Also fix some section mismatch. Reported-by: Ari Savolainen <ari.m.savolainen@gmail.com> Signed-off-by: Shaohua Li <shaohua.li@intel.com> Tested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'cpuidle-perf-events' into idle-testLen Brown2011-01-121-2/+0
|\
| * cpuidle/x86/perf: fix power:cpu_idle double end events and throw cpu_idle ↵Thomas Renninger2011-01-121-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | events from the cpuidle layer Currently intel_idle and acpi_idle driver show double cpu_idle "exit idle" events -> this patch fixes it and makes cpu_idle events throwing less complex. It also introduces cpu_idle events for all architectures which use the cpuidle subsystem, namely: - arch/arm/mach-at91/cpuidle.c - arch/arm/mach-davinci/cpuidle.c - arch/arm/mach-kirkwood/cpuidle.c - arch/arm/mach-omap2/cpuidle34xx.c - arch/drivers/acpi/processor_idle.c (for all cases, not only mwait) - arch/x86/kernel/process.c (did throw events before, but was a mess) - drivers/idle/intel_idle.c (did throw events before) Convention should be: Fire cpu_idle events inside the current pm_idle function (not somewhere down the the callee tree) to keep things easy. Current possible pm_idle functions in X86: c1e_idle, poll_idle, cpuidle_idle_call, mwait_idle, default_idle -> this is really easy is now. This affects userspace: The type field of the cpu_idle power event can now direclty get mapped to: /sys/devices/system/cpu/cpuX/cpuidle/stateX/{name,desc,usage,time,...} instead of throwing very CPU/mwait specific values. This change is not visible for the intel_idle driver. For the acpi_idle driver it should only be visible if the vendor misses out C-states in his BIOS. Another (perf timechart) patch reads out cpuidle info of cpu_idle events from: /sys/.../cpuidle/stateX/*, then the cpuidle events are mapped to the correct C-/cpuidle state again, even if e.g. vendors miss out C-states in their BIOS and for example only export C1 and C3. -> everything is fine. Signed-off-by: Thomas Renninger <trenn@suse.de> CC: Robert Schoene <robert.schoene@tu-dresden.de> CC: Jean Pihet <j-pihet@ti.com> CC: Arjan van de Ven <arjan@linux.intel.com> CC: Ingo Molnar <mingo@elte.hu> CC: Frederic Weisbecker <fweisbec@gmail.com> CC: linux-pm@lists.linux-foundation.org CC: linux-acpi@vger.kernel.org CC: linux-kernel@vger.kernel.org CC: linux-perf-users@vger.kernel.org CC: linux-omap@vger.kernel.org Signed-off-by: Len Brown <len.brown@intel.com>
* | Merge branch 'linus' into idle-testLen Brown2011-01-121-2/+1
|\|
| * perf: Clean up power events by introducing new, more generic onesThomas Renninger2011-01-041-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add these new power trace events: power:cpu_idle power:cpu_frequency power:machine_suspend The old C-state/idle accounting events: power:power_start power:power_end Have now a replacement (but we are still keeping the old tracepoints for compatibility): power:cpu_idle and power:power_frequency is replaced with: power:cpu_frequency power:machine_suspend is newly introduced. Jean Pihet has a patch integrated into the generic layer (kernel/power/suspend.c) which will make use of it. the type= field got removed from both, it was never used and the type is differed by the event type itself. perf timechart userspace tool gets adjusted in a separate patch. Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Jean Pihet <jean.pihet@newoldbits.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: rjw@sisk.pl LKML-Reference: <1294073445-14812-3-git-send-email-trenn@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> LKML-Reference: <1290072314-31155-2-git-send-email-trenn@suse.de>
| * perf: Do not export power_frequency, but power_start eventThomas Renninger2011-01-041-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | power_frequency moved to drivers/cpufreq/cpufreq.c which has to be compiled in, no need to export it. intel_idle can a be module though... Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Jean Pihet <jean.pihet@newoldbits.com> Cc: Jean Pihet <j-pihet@ti.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: rjw@sisk.pl LKML-Reference: <1294073445-14812-2-git-send-email-trenn@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> LKML-Reference: <1290072314-31155-2-git-send-email-trenn@suse.de>
* | intel_idle: open broadcast clock eventShaohua Li2011-01-121-1/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | Intel_idle driver uses CLOCK_EVT_NOTIFY_BROADCAST_ENTER CLOCK_EVT_NOTIFY_BROADCAST_EXIT for broadcast clock events. The _ENTER/_EXIT doesn't really open broadcast clock events, please see processor_idle.c for an example. In some situation, this will cause boot hang, because some CPUs enters idle but local APIC timer stalls. Reported-and-tested-by: Yan Zheng <zheng.z.yan@intel.com> Signed-off-by: Shaohua Li <shaohua.li@intel.com> cc: stable@kernel.org Signed-off-by: Len Brown <len.brown@intel.com>
* | cpuidle: CPUIDLE_FLAG_TLB_FLUSHED is specific to intel_idleLen Brown2011-01-121-0/+8
| | | | | | | | Signed-off-by: Len Brown <len.brown@intel.com>
* | ACPI, intel_idle: Cleanup idle= internal variablesThomas Renninger2011-01-121-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Having four variables for the same thing: idle_halt, idle_nomwait, force_mwait and boot_option_idle_overrides is rather confusing and unnecessary complex. if idle= boot param is passed, only set up one variable: boot_option_idle_overrides Introduces following functional changes/fixes: - intel_idle driver does not register if any idle=xy boot param is passed. - processor_idle.c will also not register a cpuidle driver and get active if idle=halt is passed. Before a cpuidle driver with one (C1, halt) state got registered Now the default_idle function will be used which finally uses the same idle call to enter sleep state (safe_halt()), but without registering a whole cpuidle driver. That means idle= param will always avoid cpuidle drivers to register with one exception (same behavior as before): idle=nomwait may still register acpi_idle cpuidle driver, but C1 will not use mwait, but hlt. This can be a workaround for IO based deeper sleep states where C1 mwait causes problems. Signed-off-by: Thomas Renninger <trenn@suse.de> cc: x86@kernel.org Signed-off-by: Len Brown <len.brown@intel.com>
* | intel_idle: update Sandy Bridge core C-state residency targetsLen Brown2011-01-121-4/+4
|/ | | | Signed-off-by: Len Brown <len.brown@intel.com>
* intel_idle: recognize ARAT on WSM-EXLen Brown2010-12-021-9/+3
| | | | | | | We erroneously ignored the Always Running APIC Timer on WSM-EX. Move the check for ARAT down so that it can apply to any/all models. Signed-off-by: Len Brown <len.brown@intel.com>
* Merge branch 'idle-release' of ↵Linus Torvalds2010-10-261-15/+46
|\ | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6 * 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6: intel_idle: do not use the LAPIC timer for ATOM C2 intel_idle: add initial Sandy Bridge support acpi_idle: delete bogus data from cpuidle_state.power_usage intel_idle: delete bogus data from cpuidle_state.power_usage intel_idle: simplify test for leave_mm()
| * intel_idle: do not use the LAPIC timer for ATOM C2Len Brown2010-10-261-1/+1
| | | | | | | | | | | | | | | | | | If we use the LAPIC timer during ATOM C2 on some nvidia chisets, the system stalls. https://bugzilla.kernel.org/show_bug.cgi?id=21032 Signed-off-by: Len Brown <len.brown@intel.com>
| * Merge branch 'intel_idle+snb' into idle-releaseLen Brown2010-10-231-1/+42
| |\ | | | | | | | | | Signed-off-by: Len Brown <len.brown@intel.com>
| | * intel_idle: add initial Sandy Bridge supportLen Brown2010-10-231-1/+43
| | | | | | | | | | | | Signed-off-by: Len Brown <len.brown@intel.com>
| * | intel_idle: delete bogus data from cpuidle_state.power_usageLen Brown2010-10-151-7/+0
| | | | | | | | | | | | | | | | | | | | | | | | The mW data in this field is a total fabrication and serves no purpose other than to mislead those who might see it in sysfs. Delete it. Signed-off-by: Len Brown <len.brown@intel.com>
| * | intel_idle: simplify test for leave_mm()Len Brown2010-10-151-6/+3
| | | | | | | | | | | | | | | | | | | | | | | | A run-time test to invoke leave_mm() for the deepest supported C-state is redundant, since the appropriate C-states already have flags with CPUIDLE_FLAG_TLB_FLUSHED set. Signed-off-by: Len Brown <len.brown@intel.com>
* | | Merge branch 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bklLinus Torvalds2010-10-221-0/+1
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl: vfs: make no_llseek the default vfs: don't use BKL in default_llseek llseek: automatically add .llseek fop libfs: use generic_file_llseek for simple_attr mac80211: disallow seeks in minstrel debug code lirc: make chardev nonseekable viotape: use noop_llseek raw: use explicit llseek file operations ibmasmfs: use generic_file_llseek spufs: use llseek in all file operations arm/omap: use generic_file_llseek in iommu_debug lkdtm: use generic_file_llseek in debugfs net/wireless: use generic_file_llseek in debugfs drm: use noop_llseek
| * | | llseek: automatically add .llseek fopArnd Bergmann2010-10-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All file_operations should get a .llseek operation so we can make nonseekable_open the default for future file operations without a .llseek pointer. The three cases that we can automatically detect are no_llseek, seq_lseek and default_llseek. For cases where we can we can automatically prove that the file offset is always ignored, we use noop_llseek, which maintains the current behavior of not returning an error from a seek. New drivers should normally not use noop_llseek but instead use no_llseek and call nonseekable_open at open time. Existing drivers can be converted to do the same when the maintainer knows for certain that no user code relies on calling seek on the device file. The generated code is often incorrectly indented and right now contains comments that clarify for each added line why a specific variant was chosen. In the version that gets submitted upstream, the comments will be gone and I will manually fix the indentation, because there does not seem to be a way to do that using coccinelle. Some amount of new code is currently sitting in linux-next that should get the same modifications, which I will do at the end of the merge window. Many thanks to Julia Lawall for helping me learn to write a semantic patch that does all this. ===== begin semantic patch ===== // This adds an llseek= method to all file operations, // as a preparation for making no_llseek the default. // // The rules are // - use no_llseek explicitly if we do nonseekable_open // - use seq_lseek for sequential files // - use default_llseek if we know we access f_pos // - use noop_llseek if we know we don't access f_pos, // but we still want to allow users to call lseek // @ open1 exists @ identifier nested_open; @@ nested_open(...) { <+... nonseekable_open(...) ...+> } @ open exists@ identifier open_f; identifier i, f; identifier open1.nested_open; @@ int open_f(struct inode *i, struct file *f) { <+... ( nonseekable_open(...) | nested_open(...) ) ...+> } @ read disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off) { <+... ( *off = E | *off += E | func(..., off, ...) | E = *off ) ...+> } @ read_no_fpos disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off) { ... when != off } @ write @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off) { <+... ( *off = E | *off += E | func(..., off, ...) | E = *off ) ...+> } @ write_no_fpos @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off) { ... when != off } @ fops0 @ identifier fops; @@ struct file_operations fops = { ... }; @ has_llseek depends on fops0 @ identifier fops0.fops; identifier llseek_f; @@ struct file_operations fops = { ... .llseek = llseek_f, ... }; @ has_read depends on fops0 @ identifier fops0.fops; identifier read_f; @@ struct file_operations fops = { ... .read = read_f, ... }; @ has_write depends on fops0 @ identifier fops0.fops; identifier write_f; @@ struct file_operations fops = { ... .write = write_f, ... }; @ has_open depends on fops0 @ identifier fops0.fops; identifier open_f; @@ struct file_operations fops = { ... .open = open_f, ... }; // use no_llseek if we call nonseekable_open //////////////////////////////////////////// @ nonseekable1 depends on !has_llseek && has_open @ identifier fops0.fops; identifier nso ~= "nonseekable_open"; @@ struct file_operations fops = { ... .open = nso, ... +.llseek = no_llseek, /* nonseekable */ }; @ nonseekable2 depends on !has_llseek @ identifier fops0.fops; identifier open.open_f; @@ struct file_operations fops = { ... .open = open_f, ... +.llseek = no_llseek, /* open uses nonseekable */ }; // use seq_lseek for sequential files ///////////////////////////////////// @ seq depends on !has_llseek @ identifier fops0.fops; identifier sr ~= "seq_read"; @@ struct file_operations fops = { ... .read = sr, ... +.llseek = seq_lseek, /* we have seq_read */ }; // use default_llseek if there is a readdir /////////////////////////////////////////// @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier readdir_e; @@ // any other fop is used that changes pos struct file_operations fops = { ... .readdir = readdir_e, ... +.llseek = default_llseek, /* readdir is present */ }; // use default_llseek if at least one of read/write touches f_pos ///////////////////////////////////////////////////////////////// @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read.read_f; @@ // read fops use offset struct file_operations fops = { ... .read = read_f, ... +.llseek = default_llseek, /* read accesses f_pos */ }; @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, ... + .llseek = default_llseek, /* write accesses f_pos */ }; // Use noop_llseek if neither read nor write accesses f_pos /////////////////////////////////////////////////////////// @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; identifier write_no_fpos.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, .read = read_f, ... +.llseek = noop_llseek, /* read and write both use no f_pos */ }; @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write_no_fpos.write_f; @@ struct file_operations fops = { ... .write = write_f, ... +.llseek = noop_llseek, /* write uses no f_pos */ }; @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; @@ struct file_operations fops = { ... .read = read_f, ... +.llseek = noop_llseek, /* read uses no f_pos */ }; @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; @@ struct file_operations fops = { ... +.llseek = noop_llseek, /* no read or write fn */ }; ===== End semantic patch ===== Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Julia Lawall <julia@diku.dk> Cc: Christoph Hellwig <hch@infradead.org>
* | | | Merge branch 'x86-idle-for-linus' of ↵Linus Torvalds2010-10-211-8/+1
|\ \ \ \ | |_|/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, hotplug: In the MWAIT case of play_dead, CLFLUSH the cache line x86, hotplug: Move WBINVD back outside the play_dead loop x86, hotplug: Use mwait to offline a processor, fix the legacy case x86, mwait: Move mwait constants to a common header file
| * | | x86, mwait: Move mwait constants to a common header fileH. Peter Anvin2010-09-171-8/+1
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | We have MWAIT constants spread across three different .c files, for no good reason. Move them all into a common header file. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Reviewed-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Len Brown <lenb@kernel.org> LKML-Reference: <tip-*@git.kernel.org>
* | | intel_idle: enable Atom C6Len Brown2010-10-081-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | ATM-C6 was commented out, pending public documentation. https://bugzilla.kernel.org/show_bug.cgi?id=19762 Tested-by: Dennis Jansen <Dennis.Jansen@...> Signed-off-by: Len Brown <len.brown@intel.com>
* | | intel_idle: Voluntary leave_mm before entering deeperSuresh Siddha2010-09-301-4/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid TLB flush IPIs for the cores in deeper c-states by voluntary leave_mm() before entering into that state. CPUs tend to flush TLB in those c-states anyways. acpi_idle does this with C3-type states, but it was not caried over when intel_idle was introduced. intel_idle can apply it to C-states in addition to those that ACPI might export as C3... Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
* | | intel_idle: add missing __percpu markupNamhyung Kim2010-09-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | intel_idle_cpuidle_devices is a percpu pointer but was missing __percpu markup. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Len Brown <len.brown@intel.com>
* | | intel_idle: Change mode 755 => 644Thomas Weber2010-09-281-0/+0
|/ / | | | | | | | | | | | | Remove execution permission from source file. Signed-off-by: Thomas Weber <weber@corscience.de> Signed-off-by: Len Brown <len.brown@intel.com>
* | Merge branch 'idle-release' of ↵Linus Torvalds2010-08-152-56/+8
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6 * 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6: intel_idle: recognize Lincroft Atom Processor intel_idle: no longer EXPERIMENTAL intel_idle: disable module support intel_idle: add support for Westmere-EX intel_idle: delete power_policy modparam, and choose substate functions intel_idle: delete substates DEBUG modparam
| * | intel_idle: recognize Lincroft Atom ProcessorArjan van de Ven2010-08-141-0/+1
| | | | | | | | | | | | | | | Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
| * | intel_idle: no longer EXPERIMENTALLen Brown2010-08-141-1/+0
| | | | | | | | | | | | | | | | | | This is a fully supported driver. Signed-off-by: Len Brown <len.brown@intel.com>
| * | intel_idle: disable module supportLen Brown2010-08-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Right now the module capability is cauing more trouble than it is worth. At least one distro built intel_idle as a module where it lost the init race with ACPI, making it useless. Make intel_idle bool so that if you select it, you will use it. We can restore module capability after cpuidle is enhanced to handle run-time changing of idle drivers. Signed-off-by: Len Brown <len.brown@intel.com>
| * | intel_idle: add support for Westmere-EXLen Brown2010-07-261-0/+1
| | | | | | | | | | | | Signed-off-by: Len Brown <len.brown@intel.com>
| * | intel_idle: delete power_policy modparam, and choose substate functionsLen Brown2010-07-231-43/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The idea behind power policy was that it would start off as a modparam, and then hook into the new "global" in-kernel power vs energy tunable. But that tunable isn't happening, so delete the hook here. With the policy hook gone, the sub-state choice functions do not do anything useful, so delete them from the critical path. To handle sub-states in the future, we will advertise them with dedicated cpuidle_state entries. That is necessary because some of the sub-states will have substantially different properties than their peer sub-states. Signed-off-by: Len Brown <len.brown@intel.com>
| * | intel_idle: delete substates DEBUG modparamLen Brown2010-07-231-13/+7
| |/ | | | | | | | | | | it isn't useful anymore Signed-off-by: Len Brown <len.brown@intel.com>
* / Merge branch 'next' of ↵Linus Torvalds2010-08-041-1/+1
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq: [CPUFREQ] Remove pointless printk from p4-clockmod. [CPUFREQ] Fix section mismatch for powernow_cpu_init in powernow-k7.c [CPUFREQ] Fix section mismatch for longhaul_cpu_init. [CPUFREQ] Fix section mismatch for longrun_cpu_init. [CPUFREQ] powernow-k8: Fix misleading variable naming [CPUFREQ] Convert pci_table entries to PCI_VDEVICE (if PCI_ANY_ID is used) [CPUFREQ] arch/x86/kernel/cpu/cpufreq: use for_each_pci_dev() [CPUFREQ] fix brace coding style issue. [CPUFREQ] x86 cpufreq: Make trace_power_frequency cpufreq driver independent [CPUFREQ] acpi-cpufreq: Fix CPU_ANY CPUFREQ_{PRE,POST}CHANGE notification [CPUFREQ] ondemand: don't synchronize sample rate unless multiple cpus present [CPUFREQ] unexport (un)lock_policy_rwsem* functions [CPUFREQ] ondemand: Refactor frequency increase code [CPUFREQ] powernow-k8: On load failure, remind the user to enable support in BIOS setup [CPUFREQ] powernow-k8: Limit Pstate transition latency check [CPUFREQ] Fix PCC driver error path [CPUFREQ] fix double freeing in error path of pcc-cpufreq [CPUFREQ] pcc driver should check for pcch method before calling _OSC [CPUFREQ] fix memory leak in cpufreq_add_dev [CPUFREQ] revert "[CPUFREQ] remove rwsem lock from CPUFREQ_GOV_STOP call (second call site)" Manually fix up non-data merge conflict introduced by new calling conventions for trace_power_start() in commit 6f4f2723d085 ("x86 cpufreq: Make trace_power_frequency cpufreq driver independent"), which didn't update the intel_idle native hardware cpuidle driver.
* intel_idle: native hardware cpuidle driver for latest Intel processorsLen Brown2010-05-283-0/+473
| | | | | | | | | | | | | | | | | | | | | | This EXPERIMENTAL driver supersedes acpi_idle on Intel Atom Processors, Intel Core i3/i5/i7 Processors and associated Intel Xeon processors. It does not support the Intel Core2 processor or earlier. For kernels configured with ACPI, CONFIG_INTEL_IDLE=y allows intel_idle to probe before the ACPI processor driver. Booting with "intel_idle.max_cstate=0" disables intel_idle and the system will fall back on ACPI's "acpi_idle". Typical Linux distributions load ACPI processor module early, making CONFIG_INTEL_IDLE=m not easily useful on ACPI platforms. intel_idle probes all processors at module_init time. Processors that are hot-added later will be limited to using C1 in idle. Signed-off-by: Len Brown <len.brown@intel.com>
* include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo2010-03-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>