summaryrefslogtreecommitdiffstats
path: root/kernel/power/main.c
Commit message (Collapse)AuthorAgeFilesLines
* freezer,sched: Rewrite core freezer logicPeter Zijlstra2022-09-071-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rewrite the core freezer to behave better wrt thawing and be simpler in general. By replacing PF_FROZEN with TASK_FROZEN, a special block state, it is ensured frozen tasks stay frozen until thawed and don't randomly wake up early, as is currently possible. As such, it does away with PF_FROZEN and PF_FREEZER_SKIP, freeing up two PF_flags (yay!). Specifically; the current scheme works a little like: freezer_do_not_count(); schedule(); freezer_count(); And either the task is blocked, or it lands in try_to_freezer() through freezer_count(). Now, when it is blocked, the freezer considers it frozen and continues. However, on thawing, once pm_freezing is cleared, freezer_count() stops working, and any random/spurious wakeup will let a task run before its time. That is, thawing tries to thaw things in explicit order; kernel threads and workqueues before doing bringing SMP back before userspace etc.. However due to the above mentioned races it is entirely possible for userspace tasks to thaw (by accident) before SMP is back. This can be a fatal problem in asymmetric ISA architectures (eg ARMv9) where the userspace task requires a special CPU to run. As said; replace this with a special task state TASK_FROZEN and add the following state transitions: TASK_FREEZABLE -> TASK_FROZEN __TASK_STOPPED -> TASK_FROZEN __TASK_TRACED -> TASK_FROZEN The new TASK_FREEZABLE can be set on any state part of TASK_NORMAL (IOW. TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE) -- any such state is already required to deal with spurious wakeups and the freezer causes one such when thawing the task (since the original state is lost). The special __TASK_{STOPPED,TRACED} states *can* be restored since their canonical state is in ->jobctl. With this, frozen tasks need an explicit TASK_FROZEN wakeup and are free of undue (early / spurious) wakeups. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20220822114649.055452969@infradead.org
* freezer: Have {,un}lock_system_sleep() save/restore flagsPeter Zijlstra2022-09-071-6/+10
| | | | | | | | | | | | | Rafael explained that the reason for having both PF_NOFREEZE and PF_FREEZER_SKIP is that {,un}lock_system_sleep() is callable from kthread context that has previously called set_freezable(). In preparation of merging the flags, have {,un}lock_system_slee() save and restore current->flags. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20220822114648.725003428@infradead.org
* Merge tag 'cxl-for-5.19' of ↵Linus Torvalds2022-05-271-1/+4
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull cxl updates from Dan Williams: "Compute Express Link (CXL) updates for this cycle. The highlight is new driver-core infrastructure and CXL subsystem changes for allowing lockdep to validate device_lock() usage. Thanks to PeterZ for setting me straight on the current capabilities of the lockdep API, and Greg acked it as well. On the CXL ACPI side this update adds support for CXL _OSC so that platform firmware knows that it is safe to still grant Linux native control of PCIe hotplug and error handling in the presence of CXL devices. A circular dependency problem was discovered between suspend and CXL memory for cases where the suspend image might be stored in CXL memory where that image also contains the PCI register state to restore to re-enable the device. Disable suspend for now until an architecture is defined to clarify that conflict. Lastly a collection of reworks, fixes, and cleanups to the CXL subsystem where support for snooping mailbox commands and properly handling the "mem_enable" flow are the highlights. Summary: - Add driver-core infrastructure for lockdep validation of device_lock(), and fixup a deadlock report that was previously hidden behind the 'lockdep no validate' policy. - Add CXL _OSC support for claiming native control of CXL hotplug and error handling. - Disable suspend in the presence of CXL memory unless and until a protocol is identified for restoring PCI device context from memory hosted on CXL PCI devices. - Add support for snooping CXL mailbox commands to protect against inopportune changes, like set-partition with the 'immediate' flag set. - Rework how the driver detects legacy CXL 1.1 configurations (CXL DVSEC / 'mem_enable') before enabling new CXL 2.0 decode configurations (CXL HDM Capability). - Miscellaneous cleanups and fixes from -next exposure" * tag 'cxl-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (47 commits) cxl/port: Enable HDM Capability after validating DVSEC Ranges cxl/port: Reuse 'struct cxl_hdm' context for hdm init cxl/port: Move endpoint HDM Decoder Capability init to port driver cxl/pci: Drop @info argument to cxl_hdm_decode_init() cxl/mem: Merge cxl_dvsec_ranges() and cxl_hdm_decode_init() cxl/mem: Skip range enumeration if mem_enable clear cxl/mem: Consolidate CXL DVSEC Range enumeration in the core cxl/pci: Move cxl_await_media_ready() to the core cxl/mem: Validate port connectivity before dvsec ranges cxl/mem: Fix cxl_mem_probe() error exit cxl/pci: Drop wait_for_valid() from cxl_await_media_ready() cxl/pci: Consolidate wait_for_media() and wait_for_media_ready() cxl/mem: Drop mem_enabled check from wait_for_media() nvdimm: Fix firmware activation deadlock scenarios device-core: Kill the lockdep_mutex nvdimm: Drop nd_device_lock() ACPI: NFIT: Drop nfit_device_lock() nvdimm: Replace lockdep_mutex with local lock classes cxl: Drop cxl_device_lock() cxl/acpi: Add root device lockdep validation ...
| * PM: CXL: Disable suspendDan Williams2022-04-221-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The CXL specification claims S3 support at a hardware level, but at a system software level there are some missing pieces. Section 9.4 (CXL 2.0) rightly claims that "CXL mem adapters may need aux power to retain memory context across S3", but there is no enumeration mechanism for the OS to determine if a given adapter has that support. Moreover the save state and resume image for the system may inadvertantly end up in a CXL device that needs to be restored before the save state is recoverable. I.e. a circular dependency that is not resolvable without a third party save-area. Arrange for the cxl_mem driver to fail S3 attempts. This still nominaly allows for suspend, but requires unbinding all CXL memory devices before the suspend to ensure the typical DRAM flow is taken. The cxl_mem unbind flow is intended to also tear down all CXL memory regions associated with a given cxl_memdev. It is reasonable to assume that any device participating in a System RAM range published in the EFI memory map is covered by aux power and save-area outside the device itself. So this restriction can be minimized in the future once pre-existing region enumeration support arrives, and perhaps a spec update to clarify if the EFI memory map is sufficent for determining the range of devices managed by platform-firmware for S3 support. Per Rafael, if the CXL configuration prevents suspend then it should fail early before tasks are frozen, and mem_sleep should stop showing 'mem' as an option [1]. Effectively CXL augments the platform suspend ->valid() op since, for example, the ACPI ops are not aware of the CXL / PCI dependencies. Given the split role of platform firmware vs OS provisioned CXL memory it is up to the cxl_mem driver to determine if the CXL configuration has elements that platform firmware may not be prepared to restore. Link: https://lore.kernel.org/r/CAJZ5v0hGVN_=3iU8OLpHY3Ak35T5+JcBM-qs8SbojKrpd0VXsA@mail.gmail.com [1] Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Len Brown <len.brown@intel.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/165066828317.3907920.5690432272182042556.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* | PM: sleep: enable dynamic debug support within pm_pr_dbg()David Cohen2022-04-131-29/+0
|/ | | | | | | | | | | | | | | | | | | Currently pm_pr_dbg() is used to filter kernel pm debug messages based on pm_debug_messages_on flag. The problem is if we enable/disable this flag it will affect all pm_pr_dbg() calls at once, so we can't individually control them. This patch changes pm_pr_dbg() implementation as such: - If pm_debug_messages_on is enabled, print the message. - If pm_debug_messages_on is disabled and CONFIG_DYNAMIC_DEBUG is enabled, only print the messages explicitly enabled on /sys/kernel/debug/dynamic_debug/control. - If pm_debug_messages_on is disabled and CONFIG_DYNAMIC_DEBUG is disabled, don't print the message. Signed-off-by: David Cohen <dacohen@pm.me> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* PM: s2idle: ACPI: Fix wakeup interrupts handlingRafael J. Wysocki2022-02-071-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After commit e3728b50cd9b ("ACPI: PM: s2idle: Avoid possible race related to the EC GPE") wakeup interrupts occurring immediately after the one discarded by acpi_s2idle_wake() may be missed. Moreover, if the SCI triggers again immediately after the rearming in acpi_s2idle_wake(), that wakeup may be missed too. The problem is that pm_system_irq_wakeup() only calls pm_system_wakeup() when pm_wakeup_irq is 0, but that's not the case any more after the interrupt causing acpi_s2idle_wake() to run until pm_wakeup_irq is cleared by the pm_wakeup_clear() call in s2idle_loop(). However, there may be wakeup interrupts occurring in that time frame and if that happens, they will be missed. To address that issue first move the clearing of pm_wakeup_irq to the point at which it is known that the interrupt causing acpi_s2idle_wake() to tun will be discarded, before rearming the SCI for wakeup. Moreover, because that only reduces the size of the time window in which the issue may manifest itself, allow pm_system_irq_wakeup() to register two second wakeup interrupts in a row and, when discarding the first one, replace it with the second one. [Of course, this assumes that only one wakeup interrupt can be discarded in one go, but currently that is the case and I am not aware of any plans to change that.] Fixes: e3728b50cd9b ("ACPI: PM: s2idle: Avoid possible race related to the EC GPE") Cc: 5.4+ <stable@vger.kernel.org> # 5.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* PM: sleep: unmark 'state' functions as kernel-docRandy Dunlap2021-08-161-1/+1
| | | | | | | | | | | | | | Fix kernel-doc warnings in kernel/power/main.c by unmarking the comment block as kernel-doc notation. This eliminates the following kernel-doc warnings: kernel/power/main.c:593: warning: expecting prototype for state(). Prototype was for state_show() instead kernel/power/main.c:593: warning: Function parameter or member 'kobj' not described in 'state_show' kernel/power/main.c:593: warning: Function parameter or member 'attr' not described in 'state_show' kernel/power/main.c:593: warning: Function parameter or member 'buf' not described in 'state_show' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* PM: sleep: Constify static struct attribute_groupRikard Falkeborn2021-02-121-1/+1
| | | | | | | | | | The only usage of suspend_attr_group is to put its address in an array of pointers to const attribute_group structs. Make it const to allow the compiler to put it into read-only memory. Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* notifier: Fix broken error handling patternPeter Zijlstra2020-09-011-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current notifiers have the following error handling pattern all over the place: int err, nr; err = __foo_notifier_call_chain(&chain, val_up, v, -1, &nr); if (err & NOTIFIER_STOP_MASK) __foo_notifier_call_chain(&chain, val_down, v, nr-1, NULL) And aside from the endless repetition thereof, it is broken. Consider blocking notifiers; both calls take and drop the rwsem, this means that the notifier list can change in between the two calls, making @nr meaningless. Fix this by replacing all the __foo_notifier_call_chain() functions with foo_notifier_call_chain_robust() that embeds the above pattern, but ensures it is inside a single lock region. Note: I switched atomic_notifier_call_chain_robust() to use the spinlock, since RCU cannot provide the guarantee required for the recovery. Note: software_resume() error handling was broken afaict. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20200818135804.325626653@infradead.org
* PM: sleep: Add pm_debug_messages kernel command line optionChen Yu2020-04-021-0/+7
| | | | | | | | | | | | | | | | | Debug messages from the system suspend/hibernation infrastructure are disabled by default, and can only be enabled after the system has boot up via /sys/power/pm_debug_messages. This makes the hibernation resume hard to track as it involves system boot up across hibernation. There's no chance for software_resume() to track the resume process, for example. Add a kernel command line option to set pm_debug_messages during boot up. Signed-off-by: Chen Yu <yu.c.chen@intel.com> [ rjw: Subject & changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* PM: suspend: Add sysfs attribute to control the "sync on suspend" behaviorJonas Meurer2020-01-161-0/+33
| | | | | | | | | | | | | | | | | | | | | | | | | The sysfs attribute `/sys/power/sync_on_suspend` controls, whether or not filesystems are synced by the kernel before system suspend. Congruously, the behaviour of build-time switch CONFIG_SUSPEND_SKIP_SYNC is slightly changed: It now defines the run-tim default for the new sysfs attribute `/sys/power/sync_on_suspend`. The run-time attribute is added because the existing corresponding build-time Kconfig flag for (`CONFIG_SUSPEND_SKIP_SYNC`) is not flexible enough. E.g. Linux distributions that provide pre-compiled kernels usually want to stick with the default (sync filesystems before suspend) but under special conditions this needs to be changed. One example for such a special condition is user-space handling of suspending block devices (e.g. using `cryptsetup luksSuspend` or `dmsetup suspend`) before system suspend. The Kernel trying to sync filesystems after the underlying block device already got suspended obviously leads to dead-locks. Be aware that you have to take care of the filesystem sync yourself before suspending the system in those scenarios. Signed-off-by: Jonas Meurer <jonas@freesources.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* PM: sleep: include <linux/pm_runtime.h> for pm_wqBen Dooks2019-10-101-0/+1
| | | | | | | | | | Include the <linux/runtime_pm.h> for the definition of pm_wq to avoid the following warning: kernel/power/main.c:890:25: warning: symbol 'pm_wq' was not declared. Should it be static? Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* PM: sleep: Replace strncmp() with str_has_prefix()Chuhong Yuan2019-08-161-1/+1
| | | | | | | | | Use str_has_prefix() instead of strncmp() which is less straightforward in decode_state(). Signed-off-by: Chuhong Yuan <hslester96@gmail.com> [ rjw: Subject & changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* PM/sleep: Expose suspend stats in sysfsKalesh Singh2019-08-051-2/+95
| | | | | | | | | | Userspace can get suspend stats from the suspend stats debugfs node. Since debugfs doesn't have stable ABI, expose suspend stats in sysfs under /sys/power/suspend_stats. Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428Thomas Gleixner2019-06-051-3/+1
| | | | | | | | | | | | | | | | | | | Based on 1 normalized pattern(s): this file is released under the gplv2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 68 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Armijn Hemel <armijn@tjaldur.nl> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190531190114.292346262@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* PM / sleep: Measure the time of filesystems syncingHarry Pan2019-04-021-2/+7
| | | | | | | | | | | | Measure the filesystems sync time during system sleep more precisely. Among other things, this allows the pr_cont() to be dropped from ksys_sync_helper() and makes automatic system suspend and hibernation profiling somewhat more straightforward. Signed-off-by: Harry Pan <harry.pan@intel.com> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* PM / sleep: Refactor filesystems sync to reduce duplicationHarry Pan2019-04-021-0/+9
| | | | | | | | | | Create a common helper to sync filesystems for system suspend and hibernation. Signed-off-by: Harry Pan <harry.pan@intel.com> Acked-by: Pavel Machek <pavel@ucw.cz> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* PM / sleep: convert to DEFINE_SHOW_ATTRIBUTEYangtao Li2018-12-121-13/+2
| | | | | | | | Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* PM / reboot: Eliminate race between reboot and suspendPingfan Liu2018-08-061-6/+6
| | | | | | | | | | | | | At present, "systemctl suspend" and "shutdown" can run in parrallel. A system can suspend after devices_shutdown(), and resume. Then the shutdown task goes on to power off. This causes many devices are not really shut off. Hence replacing reboot_mutex with system_transition_mutex (renamed from pm_mutex) to achieve the exclusion. The renaming of pm_mutex as system_transition_mutex can be better to reflect the purpose of the mutex. Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* fix a series of Documentation/ broken file name referencesMauro Carvalho Chehab2018-06-151-2/+3
| | | | | | | | | As files move around, their previous links break. Fix the references for them. Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Jonathan Corbet <corbet@lwn.net>
* PM / sleep: Make lock/unlock_system_sleep() available to kernel modulesBart Van Assche2018-01-101-0/+29
| | | | | | | | | | Since pm_mutex is not exported using lock/unlock_system_sleep() from inside a kernel module causes a "pm_mutex undefined" linker error. Hence move lock/unlock_system_sleep() into kernel/power/main.c and export these. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* ACPI / PM: Check low power idle constraints for debug onlySrinivas Pandruvada2017-08-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For SoC to achieve its lowest power platform idle state a set of hardware preconditions must be met. These preconditions or constraints can be obtained by issuing a device specific method (_DSM) with function "1". Refer to the document provided in the link below. Here during initialization (from attach() callback of LPS0 device), invoke function 1 to get the device constraints. Each enabled constraint is stored in a table. The devices in this table are used to check whether they were in required minimum state, while entering suspend. This check is done from platform freeze wake() callback, only when /sys/power/pm_debug_messages attribute is non zero. If any constraint is not met and device is ACPI power managed then it prints the device information to kernel logs. Also if debug is enabled in acpi/sleep.c, the constraint table and state of each device on wake is dumped in kernel logs. Since pm_debug_messages_on setting is used as condition to check constraints outside kernel/power/main.c, pm_debug_messages_on is changed to a global variable. Link: http://www.uefi.org/sites/default/files/resources/Intel_ACPI_Low_Power_S0_Idle.pdf Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* PM / sleep: Put pm_test under CONFIG_PM_SLEEP_DEBUGRafael J. Wysocki2017-07-241-5/+3
| | | | | | | | The pm_test sysfs attribute is under CONFIG_PM_DEBUG, but it doesn't make sense to provide it if CONFIG_PM_SLEEP is unset, so put it under CONFIG_PM_SLEEP_DEBUG instead. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* PM / timekeeping: Print debug messages when requestedRafael J. Wysocki2017-07-231-3/+7
| | | | | | | | | | | | | The messages printed by tk_debug_account_sleep_time() are basically useful for system sleep debugging, so print them only when the other debug messages from the core suspend/hibernate code are enabled. While at it, make it clear that the messages from tk_debug_account_sleep_time() are about timekeeping suspend duration, because in general timekeeping may be suspeded and resumed for multiple times during one system suspend-resume cycle. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* PM / sleep: Do not print debug messages by defaultRafael J. Wysocki2017-07-221-0/+52
| | | | | | | | | | | | | | Debug messages from the system suspend/hibernation infrastructure can fill up the entire kernel log buffer in some cases and anyway they are only useful for debugging. They depend on CONFIG_PM_DEBUG, but that is set as a rule as some generally useful diagnostic facilities depend on it too. For this reason, avoid printing those messages by default, but make it possible to turn them on as needed with the help of a new sysfs attribute under /sys/power/. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* PM / sleep: constify attribute_group structuresArvind Yadav2017-07-041-1/+1
| | | | | | | | | | | | | | | | | | attribute_groups are not supposed to change at runtime. All functions working with attribute_groups provided by <linux/sysfs.h> work with const attribute_group. So mark the non-const structs as const. File size before: text data bss dec hex filename 3802 624 32 4458 116a kernel/power/main.o File size After adding 'const': text data bss dec hex filename 3866 560 32 4458 116a kernel/power/main.o Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* PM / sleep: System sleep state selection interface reworkRafael J. Wysocki2016-11-211-3/+85
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are systems in which the platform doesn't support any special sleep states, so suspend-to-idle (PM_SUSPEND_FREEZE) is the only available system sleep state. However, some user space frameworks only use the "mem" and (sometimes) "standby" sleep state labels, so the users of those systems need to modify user space in order to be able to use system suspend at all and that may be a pain in practice. Commit 0399d4db3edf (PM / sleep: Introduce command line argument for sleep state enumeration) attempted to address this problem by adding a command line argument to change the meaning of the "mem" string in /sys/power/state to make it trigger suspend-to-idle (instead of suspend-to-RAM). However, there also are systems in which the platform does support special sleep states, but suspend-to-idle is the preferred one anyway (it even may save more energy than the platform-provided sleep states in some cases) and the above commit doesn't help in those cases. For this reason, rework the system sleep state selection interface again (but preserve backwards compatibiliby). Namely, add a new sysfs file, /sys/power/mem_sleep, that will control the system suspend mode triggered by writing "mem" to /sys/power/state (in analogy with what /sys/power/disk does for hibernation). Make it select suspend-to-RAM ("deep" sleep) by default (if supported) and fall back to suspend-to-idle ("s2idle") otherwise and add a new command line argument, mem_sleep_default, allowing that default to be overridden if need be. At the same time, drop the relative_sleep_states command line argument that doesn't make sense any more. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Mario Limonciello <mario.limonciello@dell.com>
* PM / sleep: enable suspend-to-idle even without registered suspend_opsSudeep Holla2016-09-131-0/+1
| | | | | | | | | | | | | | | | | | | | Suspend-to-idle (aka the "freeze" sleep state) is a system sleep state in which all of the processors enter deepest possible idle state and wait for interrupts right after suspending all the devices. There is no hard requirement for a platform to support and register platform specific suspend_ops to enter suspend-to-idle/freeze state. Only deeper system sleep states like PM_SUSPEND_STANDBY and PM_SUSPEND_MEM rely on such low level support/implementation. suspend-to-idle can be entered as along as all the devices can be suspended. This patch enables the support for suspend-to-idle even on systems that don't have any low level support for deeper system sleep states and/or don't register any platform specific suspend_ops. Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Tested-by: Andy Gross <andy.gross@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* PM / sleep: make PM notifiers called symmetricallyLianwei Wang2016-06-281-2/+9
| | | | | | | | | | This makes pm notifier PREPARE/POST symmetrical: if PREPARE fails, we will only undo what ever happened on PREPARE. It fixes the unbalanced CPU hotplug enable in CPU PM notifier. Signed-off-by: Lianwei Wang <lianwei.wang@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* PM / sleep: Add support for read-only sysfs attributesRafael J. Wysocki2016-01-041-15/+2
| | | | | | | | | Some sysfs attributes in /sys/power/ should really be read-only, so add support for that, convert those attributes to read-only and drop the stub .show() routines from them. Original-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* PM / sleep: Report interrupt that caused system wakeupAlexandra Yates2015-09-161-0/+17
| | | | | | | | | | | | | | Add a sysfs attribute, /sys/power/pm_wakeup_irq, reporting the IRQ number of the first wakeup interrupt (that is, the first interrupt from an IRQ line armed for system wakeup) seen by the kernel during the most recent system suspend/resume cycle. This feature will be useful for system wakeup diagnostics of spurious wakeup interrupts. Signed-off-by: Alexandra Yates <alexandra.yates@linux.intel.com> [ rjw: Fixed up pm_wakeup_irq definition ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* PM / sleep: Fix symbol name in a comment in kernel/power/main.cRafael J. Wysocki2015-05-131-1/+1
| | | | Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* PM / sleep: add pm-trace support for suspending phaseZhonghui Fu2015-03-181-1/+1
| | | | | | | | | | | Occasionally, the system can't come back up after suspend/resume due to problems of device suspending phase. This patch make PM_TRACE infrastructure cover device suspending phase of suspend/resume process, and the information in RTC can tell developers which device suspending function make system hang. Signed-off-by: Zhonghui Fu <zhonghui.fu@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*-. Merge branches 'pm-apm' and 'pm-sleep'Rafael J. Wysocki2014-07-271-10/+11
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | * pm-apm: x86, apm: Remove unused variable * pm-sleep: PM / sleep: Move platform suspend operations to separate functions PM / sleep: Simplify sleep states sysfs interface code
| | * PM / sleep: Simplify sleep states sysfs interface codeRafael J. Wysocki2014-07-211-10/+11
| |/ | | | | | | | | | | | | | | Simplify the sleep states sysfs interface /sys/power/state code by redefining pm_states[] as an array of pointers to constant strings such that only the entries corresponding to valid states are set. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* / PM: Create PM workqueue if runtime PM is not configured tooRafael J. Wysocki2014-07-231-4/+0
|/ | | | | | | | | | The PM workqueue is going to be used by ACPI PM notify handlers regardless of whether or not runtime PM is configured, so move it out of #ifdef CONFIG_PM_RUNTIME. Do that in three places in the ACPI device PM code. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* PM / hibernate: introduce "nohibernate" boot parameterKees Cook2014-06-161-4/+2
| | | | | | | | | | | To support using kernel features that are not compatible with hibernation, this creates the "nohibernate" kernel boot parameter to disable both hibernation and resume. This allows hibernation support to be a boot-time choice instead of only a compile-time choice. Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* PM / sleep: Introduce command line argument for sleep state enumerationRafael J. Wysocki2014-05-261-6/+6
| | | | | | | | | | | | | | | | | | | | | | On some systems the platform doesn't support neither PM_SUSPEND_MEM nor PM_SUSPEND_STANDBY, so PM_SUSPEND_FREEZE is the only available system sleep state. However, some user space frameworks only use the "mem" and (sometimes) "standby" sleep state labels, so the users of those systems need to modify user space in order to be able to use system suspend at all and that is not always possible. For this reason, add a new kernel command line argument, relative_sleep_states, allowing the users of those systems to change the way in which the kernel assigns labels to system sleep states. Namely, for relative_sleep_states=1, the "mem", "standby" and "freeze" labels will enumerate the available system sleem states from the deepest to the shallowest, respectively, so that "mem" is always present in /sys/power/state and the other state strings may or may not be presend depending on what is supported by the platform. Update system sleep states documentation to reflect this change. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* PM / sleep: Use valid_state() for platform-dependent sleep states onlyRafael J. Wysocki2014-05-261-4/+5
| | | | | | | | | | | | | | Use the observation that, for platform-dependent sleep states (PM_SUSPEND_STANDBY, PM_SUSPEND_MEM), a given state is either always supported or always unsupported and store that information in pm_states[] instead of calling valid_state() every time we need to check it. Also do not use valid_state() for PM_SUSPEND_FREEZE, which is always valid, and move the pm_test_level validity check for PM_SUSPEND_FREEZE directly into enter_state(). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* PM / sleep: Add state field to pm_states[] entriesRafael J. Wysocki2014-05-261-8/+8
| | | | | | | | | | | | To allow sleep states corresponding to the "mem", "standby" and "freeze" lables to be different from the pm_states[] indexes of those strings, introduce struct pm_sleep_state, consisting of a string label and a state number, and turn pm_states[] into an array of objects of that type. This modification should not lead to any functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* PM: Add missing "freeze" stateGeert Uytterhoeven2014-03-121-2/+2
| | | | | | | | | | | Fix descriptions of /sys/power/state in the documentation and in a code comment. Signed-off-by: Geert Uytterhoeven <geert+renesas@linux-m68k.org> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Acked-by: Pavel Machek <pavel@ucw.cz> [rjw: Changelog] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* PM / Sleep: Warn about system time after resume with pm_traceShuah Khan2013-06-271-0/+4
| | | | | | | | | | | | | | | | | | | | | | pm_trace uses the system's Real Time Clock (RTC) to save the magic number. The reason for this is that the RTC is the only reliably available piece of hardware during resume operations where a value can be set that will survive a reboot. Consequence is that after a resume (even if it is successful) your system clock will have a value corresponding to the magic number instead of the correct date/time! It is therefore advisable to use a program like ntp-date or rdate to reset the correct date/time from an external time source when using this trace option. There is no run-time message to warn users of the consequences of enabling pm_trace. Adding a warning message to pm_trace_store() will serve as a reminder to users to set the system date and time after resume. Signed-off-by: Shuah Khan <shuah.kh@samsung.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* PM / Sleep: Print last wakeup source on failed wakeup_count writeJulius Werner2013-06-211-0/+2
| | | | | | | | | | | | | | | | | | | Commit a938da06 introduced a useful little log message to tell users/debuggers which wakeup source aborted a suspend. However, this message is only printed if the abort happens during the in-kernel suspend path (after writing /sys/power/state). The full specification of the /sys/power/wakeup_count facility allows user-space power managers to double-check if wakeups have already happened before it actually tries to suspend (e.g. while it was running user-space pre-suspend hooks), by writing the last known wakeup_count value to /sys/power/wakeup_count. This patch changes the sysfs handler for that node to also print said log message if that write fails, so that we can figure out the offending wakeup source for both kinds of suspend aborts. Signed-off-by: Julius Werner <jwerner@chromium.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* suspend: enable freeze timeout configuration through sysLi Fei2013-02-091-0/+27
| | | | | | | | | | | | | | | | | | | | | At present, the value of timeout for freezing is 20s, which is meaningless in case that one thread is frozen with mutex locked and another thread is trying to lock the mutex, as this time of freezing will fail unavoidably. And if there is no new wakeup event registered, the system will waste at most 20s for such meaningless trying of freezing. With this patch, the value of timeout can be configured to smaller value, so such meaningless trying of freezing will be aborted in earlier time, and later freezing can be also triggered in earlier time. And more power will be saved. In normal case on mobile phone, it costs real little time to freeze processes. On some platform, it only costs about 20ms to freeze user space processes and 10ms to freeze kernel freezable threads. Signed-off-by: Liu Chuansheng <chuansheng.liu@intel.com> Signed-off-by: Li Fei <fei.li@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* PM: Introduce suspend state PM_SUSPEND_FREEZEZhang Rui2013-02-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PM_SUSPEND_FREEZE state is a general state that does not need any platform specific support, it equals frozen processes + suspended devices + idle processors. Compared with PM_SUSPEND_MEMORY, PM_SUSPEND_FREEZE saves less power because the system is still in a running state. PM_SUSPEND_FREEZE has less resume latency because it does not touch BIOS, and the processors are in idle state. Compared with RTPM/idle, PM_SUSPEND_FREEZE saves more power as 1. the processor has longer sleep time because processes are frozen. The deeper c-state the processor supports, more power saving we can get. 2. PM_SUSPEND_FREEZE uses system suspend code path, thus we can get more power saving from the devices that does not have good RTPM support. This state is useful for 1) platforms that do not have STR, or have a broken STR. 2) platforms that have an extremely low power idle state, which can be used to replace STR. The following describes how PM_SUSPEND_FREEZE state works. 1. echo freeze > /sys/power/state 2. the processes are frozen. 3. all the devices are suspended. 4. all the processors are blocked by a wait queue 5. all the processors idles and enters (Deep) c-state. 6. an interrupt fires. 7. a processor is woken up and handles the irq. 8. if it is a general event, a) the irq handler runs and quites. b) goto step 4. 9. if it is a real wake event, say, power button pressing, keyboard touch, mouse moving, a) the irq handler runs and activate the wakeup source b) wakeup_source_activate() notifies the wait queue. c) system starts resuming from PM_SUSPEND_FREEZE 10. all the devices are resumed. 11. all the processes are unfrozen. 12. system is back to working. Known Issue: The wakeup of this new PM_SUSPEND_FREEZE state may behave differently from the previous suspend state. Take ACPI platform for example, there are some GPEs that only enabled when the system is in sleep state, to wake the system backk from S3/S4. But we are not touching these GPEs during transition to PM_SUSPEND_FREEZE. This means we may lose some wake event. But on the other hand, as we do not disable all the Interrupts during PM_SUSPEND_FREEZE, we may get some extra "wakeup" Interrupts, that are not available for S3/S4. The patches has been tested on an old Sony laptop, and here are the results: Average Power: 1. RPTM/idle for half an hour: 14.8W, 12.6W, 14.1W, 12.5W, 14.4W, 13.2W, 12.9W 2. Freeze for half an hour: 11W, 10.4W, 9.4W, 11.3W 10.5W 3. RTPM/idle for three hours: 11.6W 4. Freeze for three hours: 10W 5. Suspend to Memory: 0.5~0.9W Average Resume Latency: 1. RTPM/idle with a black screen: (From pressing keyboard to screen back) Less than 0.2s 2. Freeze: (From pressing power button to screen back) 2.50s 3. Suspend to Memory: (From pressing power button to screen back) 4.33s >From the results, we can see that all the platforms should benefit from this patch, even if it does not have Low Power S0. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* PM / sysfs: replace strict_str* with kstrto*Daniel Walter2012-11-151-1/+1
| | | | | | | | | | | Replace strict_strtoul() with kstrtoul() in pm_async_store() and pm_qos_power_write(). [rjw: Modified subject and changelog.] Signed-off-by: Daniel Walter <sahne@0x90.at> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* PM / Sleep: Separate printing suspend times from initcall_debugRafael J. Wysocki2012-07-011-32/+44
| | | | | | | | | | | | | | Change the behavior of the newly introduced /sys/power/pm_print_times attribute so that its initial value depends on initcall_debug, but setting it to 0 will cause device suspend/resume times not to be printed, even if initcall_debug has been set. This way, the people who use initcall_debug for reasons other than PM debugging will be able to switch the suspend/resume times printing off, if need be. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* PM / Sleep: add knob for printing device resume timesSameer Nanda2012-07-011-0/+33
| | | | | | | | | | | Added a new knob called /sys/power/pm_print_times. Setting it to 1 enables printing of time taken by devices to suspend and resume. Setting it to 0 disables this printing (unless overridden by initcall_debug kernel command line option). Signed-off-by: Sameer Nanda <snanda@chromium.org> Acked-by: Greg KH <gregkh@linuxfoundation.org> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* PM / Sleep: Fix a mistake in a conditional in autosleep_store()Arve Hjønnevåg2012-05-051-1/+1
| | | | | | | | | | The condition check in autosleep_store() is incorrect and prevents /sys/power/autosleep from working as advertised. Fix that. [rjw: Added the changelog.] Signed-off-by: Arve Hjønnevåg <arve@android.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* PM / Sleep: Add user space interface for manipulating wakeup sources, v3Rafael J. Wysocki2012-05-011-0/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Android allows user space to manipulate wakelocks using two sysfs file located in /sys/power/, wake_lock and wake_unlock. Writing a wakelock name and optionally a timeout to the wake_lock file causes the wakelock whose name was written to be acquired (it is created before is necessary), optionally with the given timeout. Writing the name of a wakelock to wake_unlock causes that wakelock to be released. Implement an analogous interface for user space using wakeup sources. Add the /sys/power/wake_lock and /sys/power/wake_unlock files allowing user space to create, activate and deactivate wakeup sources, such that writing a name and optionally a timeout to wake_lock causes the wakeup source of that name to be activated, optionally with the given timeout. If that wakeup source doesn't exist, it will be created and then activated. Writing a name to wake_unlock causes the wakeup source of that name, if there is one, to be deactivated. Wakeup sources created with the help of wake_lock that haven't been used for more than 5 minutes are garbage collected and destroyed. Moreover, there can be only WL_NUMBER_LIMIT wakeup sources created with the help of wake_lock present at a time. The data type used to track wakeup sources created by user space is called "struct wakelock" to indicate the origins of this feature. This version of the patch includes an rbtree manipulation fix from John Stultz. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: NeilBrown <neilb@suse.de>