author: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>, 2019-07-26 09:51:12 -0300
committer: Jonathan Corbet <corbet@lwn.net>, 2019-07-31 13:25:15 -0600
commit: eaf7b46083a7e341a23ab3d6042e0ccc115b0914
tree: 86decb170d9376fca00dc22dcccd499465cd4aa7 /Documentation/thermal
parent: fe13225fdc3f7b79e2921869a13386f48b30bf79
docs: thermal: add it to the driver API
The file contents mostly describe driver internals.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Diffstat (limited to 'Documentation/thermal')
-rw-r--r-- Documentation/thermal/cpu-cooling-api.rst | 107
-rw-r--r-- Documentation/thermal/exynos_thermal.rst | 90
-rw-r--r-- Documentation/thermal/exynos_thermal_emulation.rst | 61
-rw-r--r-- Documentation/thermal/index.rst | 18
-rw-r--r-- Documentation/thermal/intel_powerclamp.rst | 320
-rw-r--r-- Documentation/thermal/nouveau_thermal.rst | 96
-rw-r--r-- Documentation/thermal/power_allocator.rst | 271
-rw-r--r-- Documentation/thermal/sysfs-api.rst | 798
-rw-r--r-- Documentation/thermal/x86_pkg_temperature_thermal.rst | 55
9 files changed, 0 insertions, 1816 deletions
diff --git a/Documentation/thermal/cpu-cooling-api.rst b/Documentation/thermal/cpu-cooling-api.rst
deleted file mode 100644
index 645d914c45a6..000000000000
--- a/Documentation/thermal/cpu-cooling-api.rst
+++ /dev/null
@@ -1,107 +0,0 @@
-=======================
-CPU cooling APIs How To
-=======================
-
-Written by Amit Daniel Kachhap <amit.kachhap@linaro.org>
-
-Updated: 6 Jan 2015
-
-Copyright (c) 2012 Samsung Electronics Co., Ltd (http://www.samsung.com)
-
-0. Introduction
-===============
-
-The generic CPU cooling (frequency clipping) code provides registration and
-unregistration APIs to the caller. Binding the cooling devices to trip points is
-left to the user. The registration APIs return the cooling device pointer.
-
-1. cpu cooling APIs
-===================
-
-1.1 cpufreq registration/unregistration APIs
---------------------------------------------
-
- ::
-
- struct thermal_cooling_device
- *cpufreq_cooling_register(struct cpumask *clip_cpus)
-
- This interface function registers the cpufreq cooling device with the name
- "thermal-cpufreq-%x". This API can support multiple instances of cpufreq
- cooling devices.
-
- clip_cpus:
- cpumask of cpus where the frequency constraints will happen.
-
- ::
-
- struct thermal_cooling_device
- *of_cpufreq_cooling_register(struct cpufreq_policy *policy)
-
- This interface function registers the cpufreq cooling device with
- the name "thermal-cpufreq-%x" linking it with a device tree node, in
- order to bind it via the thermal DT code. This API can support multiple
- instances of cpufreq cooling devices.
-
- policy:
- CPUFreq policy.
-
-
- ::
-
- void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev)
-
- This interface function unregisters the "thermal-cpufreq-%x" cooling device.
-
- cdev: Cooling device pointer which has to be unregistered.
-
-2. Power models
-===============
-
-The power API registration functions provide a simple power model for
-CPUs. The current power is calculated as dynamic power (static power isn't
-supported currently). This power model requires that the operating-points of
-the CPUs are registered using the kernel's opp library and the
-`cpufreq_frequency_table` is assigned to the `struct device` of the
-cpu. If you are using CONFIG_CPUFREQ_DT then the
-`cpufreq_frequency_table` should already be assigned to the cpu
-device.
-
-The dynamic power consumption of a processor depends on many factors.
-For a given processor implementation the primary factors are:
-
-- The time the processor spends running, consuming dynamic power, as
- compared to the time in idle states where dynamic consumption is
- negligible. Herein we refer to this as 'utilisation'.
-- The voltage and frequency levels as a result of DVFS. The DVFS
- level is a dominant factor governing power consumption.
-- In running time the 'execution' behaviour (instruction types, memory
- access patterns and so forth) causes, in most cases, a second order
- variation. In pathological cases this variation can be significant,
- but typically it is of a much lesser impact than the factors above.
-
-A high level dynamic power consumption model may then be represented as::
-
- Pdyn = f(run) * Voltage^2 * Frequency * Utilisation
-
-f(run) here represents the described execution behaviour and its
-result has units of Watts/Hz/Volt^2 (this is often expressed in
-mW/MHz/uVolt^2).
-
-The detailed behaviour for f(run) could be modelled on-line. However,
-in practice, such an on-line model has dependencies on a number of
-implementation specific processor support and characterisation
-factors. Therefore, in the initial implementation that contribution is
-represented as a constant coefficient. This is a simplification
-consistent with the relative contribution to overall power variation.
-
-In this simplified representation our model becomes::
-
- Pdyn = Capacitance * Voltage^2 * Frequency * Utilisation
-
-Where `capacitance` is a constant that represents an indicative
-running time dynamic power coefficient in fundamental units of
-mW/MHz/uVolt^2. Typical values for mobile CPUs might lie in the range
-from 100 to 500. For reference, the approximate values for the SoC in
-ARM's Juno Development Platform are 530 for the Cortex-A57 cluster and
-140 for the Cortex-A53 cluster.
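Note on the power model in cpu-cooling-api.rst above: the simplified formula can be evaluated directly. The following is a minimal Python sketch, not kernel code. The function name, the parameter names and the unit scaling are assumptions made for illustration: the coefficient is treated here as microwatts per MHz per volt squared, which is one convention under which the quoted Juno coefficients give milliwatt-scale results. Adjust the scaling to whatever convention your platform data uses.

```python
def dynamic_power_mw(coefficient, voltage_mv, freq_mhz, utilisation=1.0):
    """Evaluate Pdyn = Capacitance * Voltage^2 * Frequency * Utilisation.

    Assumed units (not taken from the document):
      coefficient  uW / MHz / V^2
      voltage_mv   millivolts
      freq_mhz     MHz
      utilisation  fraction of time spent running, 0.0 .. 1.0
    Returns the estimated dynamic power in milliwatts.
    """
    if not 0.0 <= utilisation <= 1.0:
        raise ValueError("utilisation must be between 0.0 and 1.0")
    voltage_v = voltage_mv / 1000.0
    power_uw = coefficient * voltage_v ** 2 * freq_mhz * utilisation
    return power_uw / 1000.0


if __name__ == "__main__":
    # Hypothetical operating point: 900 mV at 1100 MHz, fully utilised.
    print(round(dynamic_power_mw(530, 900, 1100), 2))
```

Under these assumptions, a coefficient of 530 at a hypothetical 900 mV and 1100 MHz operating point gives roughly 472 mW at full utilisation, and half of that at 50% utilisation, since the model is linear in utilisation.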
diff --git a/Documentation/thermal/exynos_thermal.rst b/Documentation/thermal/exynos_thermal.rst
deleted file mode 100644
index 5bd556566c70..000000000000
--- a/Documentation/thermal/exynos_thermal.rst
+++ /dev/null
@@ -1,90 +0,0 @@
-========================
-Kernel driver exynos_tmu
-========================
-
-Supported chips:
-
-* ARM SAMSUNG EXYNOS4, EXYNOS5 series of SoC
-
- Datasheet: Not publicly available
-
-Authors: Donggeun Kim <dg77.kim@samsung.com>
-Authors: Amit Daniel <amit.daniel@samsung.com>
-
-TMU controller Description:
----------------------------
-
-This driver allows reading the temperature inside the SAMSUNG EXYNOS4/5 series of SoCs.
-
-The chip only exposes the measured 8-bit temperature code value
-through a register.
-The temperature can be derived from the temperature code.
-There are three equations for converting from temperature to temperature code.
-
-The three equations are:
- 1. Two point trimming::
-
- Tc = (T - 25) * (TI2 - TI1) / (85 - 25) + TI1
-
- 2. One point trimming::
-
- Tc = T + TI1 - 25
-
- 3. No trimming::
-
- Tc = T + 50
-
- Tc:
- Temperature code, T: Temperature,
- TI1:
- Trimming info for 25 degree Celsius (stored at TRIMINFO register)
- Temperature code measured at 25 degree Celsius which is unchanged
- TI2:
- Trimming info for 85 degree Celsius (stored at TRIMINFO register)
- Temperature code measured at 85 degree Celsius which is unchanged
-
-TMU(Thermal Management Unit) in EXYNOS4/5 generates interrupt
-when temperature exceeds pre-defined levels.
-The maximum number of configurable thresholds is five.
-The threshold levels are defined as follows::
-
- Level_0: current temperature > trigger_level_0 + threshold
- Level_1: current temperature > trigger_level_1 + threshold
- Level_2: current temperature > trigger_level_2 + threshold
- Level_3: current temperature > trigger_level_3 + threshold
-
-The threshold and each trigger_level are set
-through the corresponding registers.
-
-When an interrupt occurs, this driver notifies the kernel thermal framework
-with the function exynos_report_trigger.
-Although an interrupt condition for level_0 can be set,
-it can be used to synchronize the cooling action.
-
-TMU driver description:
------------------------
-
-The exynos thermal driver is structured as::
-
- Kernel Core thermal framework
- (thermal_core.c, step_wise.c, cpu_cooling.c)
- ^
- |
- |
- TMU configuration data -----> TMU Driver <----> Exynos Core thermal wrapper
- (exynos_tmu_data.c) (exynos_tmu.c) (exynos_thermal_common.c)
- (exynos_tmu_data.h) (exynos_tmu.h) (exynos_thermal_common.h)
-
-a) TMU configuration data:
- This consists of TMU register offsets/bitfields
- described through structure exynos_tmu_registers. Also several
- other platform data (struct exynos_tmu_platform_data) members
- are used to configure the TMU.
-b) TMU driver:
- This component initialises the TMU controller and sets different
- thresholds. It invokes core thermal implementation with the call
- exynos_report_trigger.
-c) Exynos Core thermal wrapper:
- This provides 3 wrapper functions to use the
- Kernel core thermal framework. They are exynos_unregister_thermal,
- exynos_register_thermal and exynos_report_trigger.
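Note on the trimming equations in exynos_thermal.rst above: the three temperature-to-code conversions can be written as one function. This is a minimal Python sketch, not the driver's code. The function name, the argument names, the use of integer division and the trim values in the example are all assumptions for illustration; real TI1/TI2 values come from the TRIMINFO register.

```python
def temp_to_code(temp_c, ti1=None, ti2=None):
    """Convert a temperature in degrees Celsius to a TMU temperature code.

    ti1: trimming info for 25 degrees Celsius, or None if unavailable
    ti2: trimming info for 85 degrees Celsius, or None if unavailable
    Integer division is an assumption; the document does not state rounding.
    """
    if ti1 is not None and ti2 is not None:
        # Two point trimming: Tc = (T - 25) * (TI2 - TI1) / (85 - 25) + TI1
        return (temp_c - 25) * (ti2 - ti1) // (85 - 25) + ti1
    if ti1 is not None:
        # One point trimming: Tc = T + TI1 - 25
        return temp_c + ti1 - 25
    # No trimming: Tc = T + 50
    return temp_c + 50


if __name__ == "__main__":
    # Hypothetical trim values, chosen only for the demonstration.
    print(temp_to_code(55, ti1=70, ti2=136))  # two point trimming
    print(temp_to_code(55, ti1=70))           # one point trimming
    print(temp_to_code(55))                   # no trimming
```

With two point trimming the function returns TI1 at 25 degrees and TI2 at 85 degrees, which matches the definitions of the two trim values.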
diff --git a/Documentation/thermal/exynos_thermal_emulation.rst b/Documentation/thermal/exynos_thermal_emulation.rst
deleted file mode 100644
index c21d10838bc5..000000000000
--- a/Documentation/thermal/exynos_thermal_emulation.rst
+++ /dev/null
@@ -1,61 +0,0 @@
-=====================
-Exynos Emulation Mode
-=====================
-
-Copyright (C) 2012 Samsung Electronics
-
-Written by Jonghwa Lee <jonghwa3.lee@samsung.com>
-
-Description
------------
-
-Exynos 4x12 (4212, 4412) and 5 series provide an emulation mode for the thermal
-management unit. Thermal emulation mode supports software debugging of the
-TMU's operation. The user can set the temperature manually from software,
-and the TMU will then read the current temperature from the user value instead
-of the sensor's value.
-
-Enabling the CONFIG_THERMAL_EMULATION option makes this support
-available. When it is enabled, a sysfs node is created as
-/sys/devices/virtual/thermal/thermal_zone'zone id'/emul_temp.
-
-The sysfs node, 'emul_temp', contains the value 0 in its initial state.
-When you write a temperature to the sysfs node, emulation mode is
-enabled automatically and the current temperature changes to the
-value you wrote.
-
-(Exynos also supports a user-changeable delay time, which delays the
-temperature change. However, this node only uses the same delay as the
-real sensing time, 938us.)
-
-Exynos emulation mode requires the value change and the enable to
-happen synchronously. That is, when you want to update the delay or the
-next temperature, you have to enable emulation mode at the same
-time (or keep the mode enabled). If you don't, the update fails and
-the last successful value is used
-repeatedly. That's why this node gives users the right to change the
-temperature only. Having just one interface makes it simpler to use.
-
-Disabling emulation mode only requires writing the value 0 to the sysfs node.
-
-::
-
-
- TEMP 120 |
- |
- 100 |
- |
- 80 |
- | +-----------
- 60 | | |
- | +-------------| |
- 40 | | | |
- | | | |
- 20 | | | +----------
- | | | | |
- 0 |______________|_____________|__________|__________|_________
- A A A A TIME
- |<----->| |<----->| |<----->| |
- | 938us | | | | | |
- emulation : 0 50 | 70 | 20 | 0
- current temp: sensor 50 70 20 sensor
diff --git a/Documentation/thermal/index.rst b/Documentation/thermal/index.rst
deleted file mode 100644
index 8c1c00146cad..000000000000
--- a/Documentation/thermal/index.rst
+++ /dev/null
@@ -1,18 +0,0 @@
-:orphan:
-
-=======
-Thermal
-=======
-
-.. toctree::
- :maxdepth: 1
-
- cpu-cooling-api
- sysfs-api
- power_allocator
-
- exynos_thermal
- exynos_thermal_emulation
- intel_powerclamp
- nouveau_thermal
- x86_pkg_temperature_thermal
diff --git a/Documentation/thermal/intel_powerclamp.rst b/Documentation/thermal/intel_powerclamp.rst
deleted file mode 100644
index 3f6dfb0b3ea6..000000000000
--- a/Documentation/thermal/intel_powerclamp.rst
+++ /dev/null
@@ -1,320 +0,0 @@
-=======================
-Intel Powerclamp Driver
-=======================
-
-By:
- - Arjan van de Ven <arjan@linux.intel.com>
- - Jacob Pan <jacob.jun.pan@linux.intel.com>
-
-.. Contents:
-
- (*) Introduction
- - Goals and Objectives
-
- (*) Theory of Operation
- - Idle Injection
- - Calibration
-
- (*) Performance Analysis
- - Effectiveness and Limitations
- - Power vs Performance
- - Scalability
- - Calibration
- - Comparison with Alternative Techniques
-
- (*) Usage and Interfaces
- - Generic Thermal Layer (sysfs)
- - Kernel APIs (TBD)
-
-INTRODUCTION
-============
-
-Consider the situation where a system’s power consumption must be
-reduced at runtime, due to power budget, thermal constraint, or noise
-level, and where active cooling is not preferred. Software managed
-passive power reduction must be performed to prevent the hardware
-actions that are designed for catastrophic scenarios.
-
-Currently, P-states, T-states (clock modulation), and CPU offlining
-are used for CPU throttling.
-
-On Intel CPUs, C-states provide effective power reduction, but so far
-they’re only used opportunistically, based on workload. With the
-development of intel_powerclamp driver, the method of synchronizing
-idle injection across all online CPU threads was introduced. The goal
-is to achieve forced and controllable C-state residency.
-
-Test/Analysis has been made in the areas of power, performance,
-scalability, and user experience. In many cases, clear advantage is
-shown over taking the CPU offline or modulating the CPU clock.
-
-
-THEORY OF OPERATION
-===================
-
-Idle Injection
---------------
-
-On modern Intel processors (Nehalem or later), package level C-state
-residency is available in MSRs, thus also available to the kernel.
-
-These MSRs are::
-
- #define MSR_PKG_C2_RESIDENCY 0x60D
- #define MSR_PKG_C3_RESIDENCY 0x3F8
- #define MSR_PKG_C6_RESIDENCY 0x3F9
- #define MSR_PKG_C7_RESIDENCY 0x3FA
-
-If the kernel can also inject idle time to the system, then a
-closed-loop control system can be established that manages package
-level C-state. The intel_powerclamp driver is conceived as such a
-control system, where the target set point is a user-selected idle
-ratio (based on power reduction), and the error is the difference
-between the actual package level C-state residency ratio and the target idle
-ratio.
-
-Injection is controlled by high priority kernel threads, spawned for
-each online CPU.
-
-These kernel threads, with SCHED_FIFO class, are created to perform
-clamping actions of controlled duty ratio and duration. Each per-CPU
-thread synchronizes its idle time and duration, based on the rounding
-of jiffies, so accumulated errors can be prevented to avoid a jittery
-effect. Threads are also bound to the CPU such that they cannot be
-migrated, unless the CPU is taken offline. In this case, threads
-belonging to the offlined CPUs will be terminated immediately.
-
-Running as SCHED_FIFO at a relatively high priority also allows this
-scheme to work for both preemptable and non-preemptable kernels.
-Alignment of idle time around jiffies ensures scalability for HZ
-values. This effect can be better visualized using a Perf timechart.
-The following diagram shows the behavior of kernel thread
-kidle_inject/cpu. During idle injection, it runs monitor/mwait idle
-for a given "duration", then relinquishes the CPU to other tasks,
-until the next time interval.
-
-The NOHZ schedule tick is disabled during idle time, but interrupts
-are not masked. Tests show that the extra wakeups from scheduler tick
-have a dramatic impact on the effectiveness of the powerclamp driver
-on large scale systems (Westmere system with 80 processors).
-
-::
-
- CPU0
- ____________ ____________
- kidle_inject/0 | sleep | mwait | sleep |
- _________| |________| |_______
- duration
- CPU1
- ____________ ____________
- kidle_inject/1 | sleep | mwait | sleep |
- _________| |________| |_______
- ^
- |
- |
- roundup(jiffies, interval)
-
-Only one CPU is allowed to collect statistics and update global
-control parameters. This CPU is referred to as the controlling CPU in
-this document. The controlling CPU is elected at runtime, with a
-policy that favors BSP, taking into account the possibility of a CPU
-hot-plug.
-
-In terms of dynamics of the idle control system, package level idle
-time is considered largely as a non-causal system where its behavior
-cannot be based on the past or current input. Therefore, the
-intel_powerclamp driver attempts to enforce the desired idle time
-instantly as given input (target idle ratio). After injection,
-powerclamp monitors the actual idle for a given time window and adjusts
-the next injection accordingly to avoid over/under correction.
-
-When used in a causal control system, such as a temperature control,
-it is up to the user of this driver to implement algorithms where
-past samples and outputs are included in the feedback. For example, a
-PID-based thermal controller can use the powerclamp driver to
-maintain a desired target temperature, based on integral and
-derivative gains of the past samples.
-
-
-
-Calibration
------------
-During scalability testing, it is observed that synchronized actions
-among CPUs become challenging as the number of cores grows. This is
-also true for the ability of a system to enter package level C-states.
-
-To make sure the intel_powerclamp driver scales well, online
-calibration is implemented. The goals for doing such a calibration
-are:
-
-a) determine the effective range of idle injection ratio
-b) determine the amount of compensation needed at each target ratio
-
-Compensation to each target ratio consists of two parts:
-
- a) steady state error compensation
- This is to offset the error occurring when the system can
- enter idle without extra wakeups (such as external interrupts).
-
- b) dynamic error compensation
- When an excessive amount of wakeups occurs during idle, an
- additional idle ratio can be added to quiet interrupts, by
- slowing down CPU activities.
-
-A debugfs file is provided for the user to examine compensation
-progress and results, such as on a Westmere system::
-
- [jacob@nex01 ~]$ cat
- /sys/kernel/debug/intel_powerclamp/powerclamp_calib
- controlling cpu: 0
- pct confidence steady dynamic (compensation)
- 0 0 0 0
- 1 1 0 0
- 2 1 1 0
- 3 3 1 0
- 4 3 1 0
- 5 3 1 0
- 6 3 1 0
- 7 3 1 0
- 8 3 1 0
- ...
- 30 3 2 0
- 31 3 2 0
- 32 3 1 0
- 33 3 2 0
- 34 3 1 0
- 35 3 2 0
- 36 3 1 0
- 37 3 2 0
- 38 3 1 0
- 39 3 2 0
- 40 3 3 0
- 41 3 1 0
- 42 3 2 0
- 43 3 1 0
- 44 3 1 0
- 45 3 2 0
- 46 3 3 0
- 47 3 0 0
- 48 3 2 0
- 49 3 3 0
-
-Calibration occurs during runtime. No offline method is available.
-Steady state compensation is used only when confidence levels of all
-adjacent ratios have reached satisfactory level. A confidence level
-is accumulated based on clean data collected at runtime. Data
-collected during a period without extra interrupts is considered
-clean.
-
-To compensate for excessive amounts of wakeup during idle, additional
-idle time is injected when such a condition is detected. Currently,
-we have a simple algorithm to double the injection ratio. A possible
-enhancement might be to throttle the offending IRQ, such as delaying
-EOI for level triggered interrupts. But it is a challenge to be
-non-intrusive to the scheduler or the IRQ core code.
-
-
-CPU Online/Offline
-------------------
-Per-CPU kernel threads are started/stopped upon receiving
-notifications of CPU hotplug activities. The intel_powerclamp driver
-keeps track of clamping kernel threads, even after they are migrated
-to other CPUs, after a CPU offline event.
-
-
-Performance Analysis
-====================
-This section describes the general performance data collected on
-multiple systems, including Westmere (80P) and Ivy Bridge (4P, 8P).
-
-Effectiveness and Limitations
------------------------------
-The maximum idle injection ratio is capped at 50
-percent. As mentioned earlier, since interrupts are allowed during
-forced idle time, excessive interrupts could result in less
-effectiveness. The extreme case would be doing a ping -f to generate
-flooded network interrupts without much CPU acknowledgement. In this
-case, little can be done from the idle injection threads. In most
-normal cases, such as copying a large file with scp, applications can be throttled
-by the powerclamp driver, since slowing down the CPU also slows down
-network protocol processing, which in turn reduces interrupts.
-
-When control parameters change at runtime by the controlling CPU, it
-may take an additional period for the rest of the CPUs to catch up
-with the changes. During this time, idle injection is out of sync,
-thus not able to enter package C-states at the expected ratio. But
-this effect is minor, in that in most cases change to the target
-ratio is updated much less frequently than the idle injection
-frequency.
-
-Scalability
------------
-Tests also show a minor, but measurable, difference between the 4P/8P
-Ivy Bridge system and the 80P Westmere server under 50% idle ratio.
-More compensation is needed on Westmere for the same amount of
-target idle ratio. The compensation also increases as the idle ratio
-gets larger. The above reason constitutes the need for the
-calibration code.
-
-On the IVB 8P system, compared to an offline CPU, powerclamp can
-achieve up to 40% better performance per watt. (measured by a spin
-counter summed over per CPU counting threads spawned for all running
-CPUs).
-
-Usage and Interfaces
-====================
-The powerclamp driver is registered to the generic thermal layer as a
-cooling device. Currently, it’s not bound to any thermal zones::
-
- jacob@chromoly:/sys/class/thermal/cooling_device14$ grep . *
- cur_state:0
- max_state:50
- type:intel_powerclamp
-
-cur_state allows the user to set the desired idle percentage. Writing 0 to
-cur_state will stop idle injection. Writing a value between 1 and
-max_state will start the idle injection. Reading cur_state returns the
-actual and current idle percentage. This may not be the same value
-set by the user in that current idle percentage depends on workload
-and includes natural idle. When idle injection is disabled, reading
-cur_state returns value -1 instead of 0 which is to avoid confusing
-100% busy state with the disabled state.
-
-Example usage:
-- To inject 25% idle time::
-
- $ sudo sh -c "echo 25 > /sys/class/thermal/cooling_device80/cur_state"
-
-If the system is not busy and has more than 25% idle time already,
-then the powerclamp driver will not start idle injection. Running top
-will not show idle injection kernel threads.
-
-If the system is busy (spin test below) and has less than 25% natural
-idle time, powerclamp kernel threads will do idle injection. Forced
-idle time is accounted as normal idle in that common code path is
-taken as the idle task.
-
-In this example, 24.1% idle is shown. This helps the system admin or
-user determine the cause of slowdown, when a powerclamp driver is in action::
-
-
- Tasks: 197 total, 1 running, 196 sleeping, 0 stopped, 0 zombie
- Cpu(s): 71.2%us, 4.7%sy, 0.0%ni, 24.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
- Mem: 3943228k total, 1689632k used, 2253596k free, 74960k buffers
- Swap: 4087804k total, 0k used, 4087804k free, 945336k cached
-
- PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
- 3352 jacob 20 0 262m 644 428 S 286 0.0 0:17.16 spin
- 3341 root -51 0 0 0 0 D 25 0.0 0:01.62 kidle_inject/0
- 3344 root -51 0 0 0 0 D 25 0.0 0:01.60 kidle_inject/3
- 3342 root -51 0 0 0 0 D 25 0.0 0:01.61 kidle_inject/1
- 3343 root -51 0 0 0 0 D 25 0.0 0:01.60 kidle_inject/2
- 2935 jacob 20 0 696m 125m 35m S 5 3.3 0:31.11 firefox
- 1546 root 20 0 158m 20m 6640 S 3 0.5 0:26.97 Xorg
- 2100 jacob 20 0 1223m 88m 30m S 3 2.3 0:23.68 compiz
-
-Tests have shown that by using the powerclamp driver as a cooling
-device, a PID based userspace thermal controller can manage to
-control CPU temperature effectively, when no other thermal influence
-is added. For example, an UltraBook user can compile the kernel under
-a certain temperature (below most active trip points).
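Note on the cur_state interface in intel_powerclamp.rst above: the sysfs write can be wrapped with the range check that max_state implies. This is a minimal Python sketch under stated assumptions. The function name is hypothetical, the cooling device directory must be supplied by the caller because its index differs between systems, and writing to the real sysfs files requires root privileges.

```python
from pathlib import Path


def set_idle_percentage(cdev_dir, percent):
    """Write a target idle percentage to a cooling device's cur_state.

    cdev_dir: a directory such as /sys/class/thermal/cooling_deviceN
              (N is system specific and is not assumed here).
    percent:  0 stops idle injection; 1..max_state starts it.
    """
    cdev = Path(cdev_dir)
    # Only touch devices that identify themselves as intel_powerclamp.
    if (cdev / "type").read_text().strip() != "intel_powerclamp":
        raise ValueError(f"{cdev} is not an intel_powerclamp cooling device")
    # Respect the limit the device itself advertises via max_state.
    max_state = int((cdev / "max_state").read_text())
    if not 0 <= percent <= max_state:
        raise ValueError(f"percent must be between 0 and {max_state}")
    (cdev / "cur_state").write_text(f"{percent}\n")
    return percent
```

Reading cur_state back afterwards may not return the value written, since the text above notes that the reported value is the actual idle percentage and includes natural idle.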
diff --git a/Documentation/thermal/nouveau_thermal.rst b/Documentation/thermal/nouveau_thermal.rst
deleted file mode 100644
index 37255fd6735d..000000000000
--- a/Documentation/thermal/nouveau_thermal.rst
+++ /dev/null
@@ -1,96 +0,0 @@
-=====================
-Kernel driver nouveau
-=====================
-
-Supported chips:
-
-* NV43+
-
-Authors: Martin Peres (mupuf) <martin.peres@free.fr>
-
-Description
------------
-
-This driver allows reading the GPU core temperature, driving the GPU fan and
-setting temperature alarms.
-
-Currently, due to the absence of in-kernel API to access HWMON drivers, Nouveau
-cannot access any of the i2c external monitoring chips it may find. If you
-have one of those, temperature and/or fan management through Nouveau's HWMON
-interface is likely not to work. This document may then not cover your situation
-entirely.
-
-Temperature management
-----------------------
-
-Temperature is exposed as a read-only HWMON attribute temp1_input.
-
-In order to protect the GPU from overheating, Nouveau supports 4 configurable
-temperature thresholds:
-
- * Fan_boost:
- Fan speed is set to 100% when reaching this temperature;
- * Downclock:
- The GPU will be downclocked to reduce its power dissipation;
- * Critical:
- The GPU is put on hold to further lower power dissipation;
- * Shutdown:
- Shut the computer down to protect your GPU.
-
-WARNING:
- Some of these thresholds may not be used by Nouveau depending
- on your chipset.
-
-The default value for these thresholds comes from the GPU's vbios. These
-thresholds can be configured thanks to the following HWMON attributes:
-
- * Fan_boost: temp1_auto_point1_temp and temp1_auto_point1_temp_hyst;
- * Downclock: temp1_max and temp1_max_hyst;
- * Critical: temp1_crit and temp1_crit_hyst;
- * Shutdown: temp1_emergency and temp1_emergency_hyst.
-
-NOTE: Remember that the values are stored as millidegrees Celsius. Don't forget
-to multiply!
-
-Fan management
---------------
-
-Not all cards have a drivable fan. If you do, then the following HWMON
-attributes should be available:
-
- * pwm1_enable:
- Current fan management mode (NONE, MANUAL or AUTO);
- * pwm1:
- Current PWM value (power percentage);
- * pwm1_min:
- The minimum PWM speed allowed;
- * pwm1_max:
- The maximum PWM speed allowed (bypassed when hitting Fan_boost);
-
-You may also have the following attribute:
-
- * fan1_input:
- Speed in RPM of your fan.
-
-Your fan can be driven in different modes:
-
- * 0: The fan is left untouched;
- * 1: The fan can be driven in manual (use pwm1 to change the speed);
- * 2: The fan is driven automatically depending on the temperature.
-
-NOTE:
- Be sure to use the manual mode if you want to drive the fan speed manually
-
-NOTE2:
- When operating in manual mode outside the vbios-defined
- [PWM_min, PWM_max] range, the reported fan speed (RPM) may not be accurate
- depending on your hardware.
-
-Bug reports
------------
-
-Thermal management on Nouveau is new and may not work on all cards. If you have
-inquiries, please ping mupuf on IRC (#nouveau, freenode).
-
-Bug reports should be filed on Freedesktop's bug tracker. Please follow
-http://nouveau.freedesktop.org/wiki/Bugs
diff --git a/Documentation/thermal/power_allocator.rst b/Documentation/thermal/power_allocator.rst
deleted file mode 100644
index 67b6a3297238..000000000000
--- a/Documentation/thermal/power_allocator.rst
+++ /dev/null
@@ -1,271 +0,0 @@
-=================================
-Power allocator governor tunables
-=================================
-
-Trip points
------------
-
-The governor works optimally with the following two passive trip points:
-
-1. "switch on" trip point: temperature above which the governor
- control loop starts operating. This is the first passive trip
- point of the thermal zone.
-
-2. "desired temperature" trip point: it should be higher than the
- "switch on" trip point. This is the target temperature the governor
- is controlling for. This is the last passive trip point of the
- thermal zone.
-
-PID Controller
---------------
-
-The power allocator governor implements a
-Proportional-Integral-Derivative controller (PID controller) with
-temperature as the control input and power as the controlled output:
-
- P_max = k_p * e + k_i * err_integral + k_d * diff_err + sustainable_power
-
-where
- - e = desired_temperature - current_temperature
- - err_integral is the sum of previous errors
- - diff_err = e - previous_error
-
-It is similar to the one depicted below::
-
- k_d
- |
- current_temp |
- | v
- | +----------+ +---+
- | +----->| diff_err |-->| X |------+
- | | +----------+ +---+ |
- | | | tdp actor
- | | k_i | | get_requested_power()
- | | | | | | |
- | | | | | | | ...
- v | v v v v v
- +---+ | +-------+ +---+ +---+ +---+ +----------+
- | S |-----+----->| sum e |----->| X |--->| S |-->| S |-->|power |
- +---+ | +-------+ +---+ +---+ +---+ |allocation|
- ^ | ^ +----------+
- | | | | |
- | | +---+ | | |
- | +------->| X |-------------------+ v v
- | +---+ granted performance
- desired_temperature ^
- |
- |
- k_po/k_pu
-
-Sustainable power
------------------
-
-An estimate of the sustainable dissipatable power (in mW) should be
-provided while registering the thermal zone. This estimates the
-sustained power that can be dissipated at the desired control
-temperature. This is the maximum sustained power for allocation at
-the desired maximum temperature. The actual sustained power can vary
-for a number of reasons. The closed loop controller will take care of
-variations such as environmental conditions, and some factors related
-to the speed-grade of the silicon. `sustainable_power` is therefore
-simply an estimate, and may be tuned to affect the aggressiveness of
-the thermal ramp. For reference, the sustainable power of a 4" phone
-is typically 2000mW, while on a 10" tablet is around 4500mW (may vary
-depending on screen size).
-
-If you are using device tree, do add it as a property of the
-thermal-zone. For example::
-
- thermal-zones {
- soc_thermal {
- polling-delay = <1000>;
- polling-delay-passive = <100>;
- sustainable-power = <2500>;
- ...
-
-Instead, if the thermal zone is registered from the platform code, pass a
-`thermal_zone_params` that has a `sustainable_power`. If no
-`thermal_zone_params` were being passed, then something like below
-will suffice::
-
- static const struct thermal_zone_params tz_params = {
- .sustainable_power = 3500,
- };
-
-and then pass `tz_params` as the 5th parameter to
-`thermal_zone_device_register()`
-
-k_po and k_pu
--------------
-
-The implementation of the PID controller in the power allocator
-thermal governor allows the configuration of two proportional term
-constants: `k_po` and `k_pu`. `k_po` is the proportional term
-constant during temperature overshoot periods (current temperature is
-above "desired temperature" trip point). Conversely, `k_pu` is the
-proportional term constant during temperature undershoot periods
-(current temperature below "desired temperature" trip point).
-
-These controls are intended as the primary mechanism for configuring
-the permitted thermal "ramp" of the system. For instance, a lower
-`k_pu` value will provide a slower ramp, at the cost of capping
-available capacity at a low temperature. On the other hand, a high
-value of `k_pu` will result in the governor granting very high power
-while temperature is low, and may lead to temperature overshooting.
-
-The default value for `k_pu` is::
-
- 2 * sustainable_power / (desired_temperature - switch_on_temp)
-
-This means that at `switch_on_temp` the output of the controller's
-proportional term will be 2 * `sustainable_power`. The default value
-for `k_po` is::
-
- sustainable_power / (desired_temperature - switch_on_temp)
-
-Focusing on the proportional and feed forward values of the PID
-controller equation we have::
-
- P_max = k_p * e + sustainable_power
-
-The proportional term is proportional to the difference between the
-desired temperature and the current one. When the current temperature
-is the desired one, then the proportional component is zero and
-`P_max` = `sustainable_power`. That is, the system should operate in
-thermal equilibrium under constant load. `sustainable_power` is only
-an estimate, which is the reason for closed-loop control such as this.
-
-Expanding `k_pu` we get::
-
- P_max = 2 * sustainable_power * (T_set - T) / (T_set - T_on) +
- sustainable_power
-
-where:
-
- - T_set is the desired temperature
- - T is the current temperature
- - T_on is the switch on temperature
-
-When the current temperature is the switch_on temperature, the above
-formula becomes::
-
- P_max = 2 * sustainable_power * (T_set - T_on) / (T_set - T_on) +
- sustainable_power = 2 * sustainable_power + sustainable_power =
- 3 * sustainable_power
-
-Therefore, the proportional term alone linearly decreases power from
-3 * `sustainable_power` to `sustainable_power` as the temperature
-rises from the switch on temperature to the desired temperature.
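-
-As an illustrative calculation (the numbers below are assumptions chosen
-for easy arithmetic, not values from any platform, and temperatures are
-written in whole degrees Celsius for readability), take
-`sustainable_power` = 2500 mW, T_on = 65 and T_set = 85. Ignoring the
-integral and derivative terms, the default constants and the resulting
-power budget would be roughly::
-
-    k_pu = 2 * 2500 / (85 - 65) = 250 mW per degree
-    k_po =     2500 / (85 - 65) = 125 mW per degree
-
-    T = 65 (switch on):  P_max = 250 * (85 - 65) + 2500 = 7500 mW
-    T = 75:              P_max = 250 * (85 - 75) + 2500 = 5000 mW
-    T = 85 (desired):    P_max = 250 * (85 - 85) + 2500 = 2500 mW
-    T = 90 (overshoot):  P_max = 125 * (85 - 90) + 2500 = 1875 mW
-
-In the overshoot row the error is negative and `k_po` applies, so the
-budget drops below `sustainable_power`.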
-
-k_i and integral_cutoff
------------------------
-
-`k_i` configures the PID loop's integral term constant. This term
-allows the PID controller to compensate for long term drift and for
-the quantized nature of the output control: cooling devices can't set
-the exact power that the governor requests. When the temperature
-error is below `integral_cutoff`, errors are accumulated in the
-integral term. This term is then multiplied by `k_i` and the result
-added to the output of the controller. Typically `k_i` is set low (1
-or 2) and `integral_cutoff` is 0.
-
-k_d
----
-
-`k_d` configures the PID loop's derivative term constant. It's
-recommended to leave it as the default: 0.
-
-Cooling device power API
-========================
-
-Cooling devices controlled by this governor must supply the additional
-"power" API in their `cooling_device_ops`. It consists of three ops:
-
-1. ::
-
- int get_requested_power(struct thermal_cooling_device *cdev,
- struct thermal_zone_device *tz, u32 *power);
-
-
-@cdev:
- The `struct thermal_cooling_device` pointer
-@tz:
- thermal zone in which we are currently operating
-@power:
- pointer in which to store the calculated power
-
-`get_requested_power()` calculates the power requested by the device
-in milliwatts and stores it in @power. It should return 0 on
-success, -E* on failure. This is currently used by the power
-allocator governor to calculate how much power to give to each cooling
-device.
-
-2. ::
-
- int state2power(struct thermal_cooling_device *cdev,
- struct thermal_zone_device *tz, unsigned long state,
- u32 *power);
-
-@cdev:
- The `struct thermal_cooling_device` pointer
-@tz:
- thermal zone in which we are currently operating
-@state:
- A cooling device state
-@power:
- pointer in which to store the equivalent power
-
-Convert cooling device state @state into power consumption in
-milliwatts and store it in @power. It should return 0 on success, -E*
-on failure. This is currently used by thermal core to calculate the
-maximum power that an actor can consume.
-
-3. ::
-
- int power2state(struct thermal_cooling_device *cdev, u32 power,
- unsigned long *state);
-
-@cdev:
- The `struct thermal_cooling_device` pointer
-@power:
- power in milliwatts
-@state:
- pointer in which to store the resulting state
-
-Calculate a cooling device state that would make the device consume at
-most @power mW and store it in @state. It should return 0 on success,
--E* on failure. This is currently used by the thermal core to convert
-a given power set by the power allocator governor to a state that the
-cooling device can set. It is a function because this conversion may
-depend on external factors that may change, so this function should
-return the best conversion given "current circumstances".
-
-Cooling device weights
-----------------------
-
-Weights are a mechanism to bias the allocation among cooling
-devices. They express the relative power efficiency of different
-cooling devices. Higher weight can be used to express higher power
-efficiency. Weighting is relative such that if each cooling device
-has a weight of one they are considered equal. This is particularly
-useful in heterogeneous systems where two cooling devices may perform
-the same kind of compute, but with different efficiency. For example,
-a system with two different types of processors.
-
-If the thermal zone is registered using
-`thermal_zone_device_register()` (i.e., platform code), then weights
-are passed as part of the thermal zone's `thermal_bind_params`.
-If the thermal zone is registered using device tree, then they are passed
-as the `contribution` property of each map in the `cooling-maps` node.
-
-Limitations of the power allocator governor
-===========================================
-
-The power allocator governor's PID controller works best if there is a
-periodic tick. If you have a driver that calls
-`thermal_zone_device_update()` (or anything that ends up calling the
-governor's `throttle()` function) repetitively, the governor response
-won't be very good. Note that this is not particular to this
-governor, step-wise will also misbehave if you call its throttle()
-faster than the normal thermal framework tick (due to interrupts for
-example) as it will overreact.
diff --git a/Documentation/thermal/sysfs-api.rst b/Documentation/thermal/sysfs-api.rst
deleted file mode 100644
index e4930761d3e5..000000000000
--- a/Documentation/thermal/sysfs-api.rst
+++ /dev/null
@@ -1,798 +0,0 @@
-===================================
-Generic Thermal Sysfs driver How To
-===================================
-
-Written by Sujith Thomas <sujith.thomas@intel.com>, Zhang Rui <rui.zhang@intel.com>
-
-Updated: 2 January 2008
-
-Copyright (c) 2008 Intel Corporation
-
-
-0. Introduction
-===============
-
-The generic thermal sysfs provides a set of interfaces for thermal zone
-devices (sensors) and thermal cooling devices (fan, processor...) to register
-with the thermal management solution and to be a part of it.
-
-This how-to focuses on enabling new thermal zone and cooling devices to
-participate in thermal management.
-This solution is platform independent and any type of thermal zone devices
-and cooling devices should be able to make use of the infrastructure.
-
-The main task of the thermal sysfs driver is to expose thermal zone attributes
-as well as cooling device attributes to the user space.
-An intelligent thermal management application can make decisions based on
-inputs from thermal zone attributes (the current temperature and trip point
-temperature) and throttle appropriate devices.
-
-- `[0-*]` denotes any positive number starting from 0
-- `[1-*]` denotes any positive number starting from 1
-
-1. thermal sysfs driver interface functions
-===========================================
-
-1.1 thermal zone device interface
----------------------------------
-
- ::
-
- struct thermal_zone_device
- *thermal_zone_device_register(char *type,
- int trips, int mask, void *devdata,
- struct thermal_zone_device_ops *ops,
- const struct thermal_zone_params *tzp,
- int passive_delay, int polling_delay)
-
- This interface function adds a new thermal zone device (sensor) to
- /sys/class/thermal folder as `thermal_zone[0-*]`. It tries to bind all the
- thermal cooling devices registered at the same time.
-
- type:
- the thermal zone type.
- trips:
- the total number of trip points this thermal zone supports.
- mask:
- Bit string: If 'n'th bit is set, then trip point 'n' is writeable.
- devdata:
- device private data
- ops:
- thermal zone device call-backs.
-
- .bind:
- bind the thermal zone device with a thermal cooling device.
- .unbind:
- unbind the thermal zone device from a thermal cooling device.
- .get_temp:
- get the current temperature of the thermal zone.
- .set_trips:
- set the trip points window. Whenever the current temperature
- is updated, the trip points immediately below and above the
- current temperature are found.
- .get_mode:
- get the current mode (enabled/disabled) of the thermal zone.
-
- - "enabled" means the kernel thermal management is
- enabled.
- - "disabled" will prevent kernel thermal driver action
- upon trip points so that user applications can take
- charge of thermal management.
- .set_mode:
- set the mode (enabled/disabled) of the thermal zone.
- .get_trip_type:
- get the type of a certain trip point.
- .get_trip_temp:
- get the temperature above which a certain trip point
- will be fired.
- .set_emul_temp:
- set the emulation temperature which helps in debugging
- different threshold temperature points.
- tzp:
- thermal zone platform parameters.
- passive_delay:
- number of milliseconds to wait between polls when
- performing passive cooling.
- polling_delay:
- number of milliseconds to wait between polls when checking
- whether trip points have been crossed (0 for interrupt driven systems).
-
- ::
-
- void thermal_zone_device_unregister(struct thermal_zone_device *tz)
-
- This interface function removes the thermal zone device.
- It deletes the corresponding entry from /sys/class/thermal folder and
- unbinds all the thermal cooling devices it uses.
-
- ::
-
- struct thermal_zone_device
- *thermal_zone_of_sensor_register(struct device *dev, int sensor_id,
- void *data,
- const struct thermal_zone_of_device_ops *ops)
-
- This interface adds a new sensor to a DT thermal zone.
- This function will search the list of thermal zones described in
- device tree and look for the zone that refers to the sensor device
- pointed to by dev->of_node as a temperature provider. For the zone
- pointing to the sensor node, the sensor will be added to the DT
- thermal zone device.
-
- The parameters for this interface are:
-
- dev:
- Device node of sensor containing valid node pointer in
- dev->of_node.
- sensor_id:
- a sensor identifier, in case the sensor IP has more
- than one sensor
- data:
- a private pointer (owned by the caller) that will be
- passed back, when a temperature reading is needed.
- ops:
- `struct thermal_zone_of_device_ops *`.
-
- ============== =======================================
- get_temp a pointer to a function that reads the
- sensor temperature. This is mandatory
- callback provided by sensor driver.
- set_trips a pointer to a function that sets a
- temperature window. When this window is
- left the driver must inform the thermal
- core via thermal_zone_device_update.
- get_trend a pointer to a function that reads the
- sensor temperature trend.
- set_emul_temp a pointer to a function that sets
- sensor emulated temperature.
- ============== =======================================
-
- The thermal zone temperature is provided by the get_temp() function
- pointer of thermal_zone_of_device_ops. When called, it will
- have the private pointer @data back.
-
- It returns a valid thermal zone device handle on success, or an error
- pointer on failure. The caller should check the returned handle with
- IS_ERR() to find out whether the call succeeded.
-
- ::
-
- void thermal_zone_of_sensor_unregister(struct device *dev,
- struct thermal_zone_device *tzd)
-
- This interface unregisters a sensor from a DT thermal zone which was
- successfully added by interface thermal_zone_of_sensor_register().
- This function removes the sensor callbacks and private data from the
- thermal zone device registered with thermal_zone_of_sensor_register()
- interface. It will also silence the zone by removing the .get_temp() and
- .get_trend() thermal zone device callbacks.
-
- ::
-
- struct thermal_zone_device
- *devm_thermal_zone_of_sensor_register(struct device *dev,
- int sensor_id,
- void *data,
- const struct thermal_zone_of_device_ops *ops)
-
- This interface is the resource-managed version of
- thermal_zone_of_sensor_register().
-
- All details of thermal_zone_of_sensor_register() described
- above apply here.
-
- The benefit of using this interface to register a sensor is that it
- is not required to explicitly call thermal_zone_of_sensor_unregister()
- in the error path or during driver unbinding, as this is done by the
- driver resource manager.
-
- ::
-
- void devm_thermal_zone_of_sensor_unregister(struct device *dev,
- struct thermal_zone_device *tzd)
-
- This interface is the resource-managed version of
- thermal_zone_of_sensor_unregister().
- All details of thermal_zone_of_sensor_unregister() described
- above apply here.
- Normally this function will not need to be called and the resource
- management code will ensure that the resource is freed.
-
- ::
-
- int thermal_zone_get_slope(struct thermal_zone_device *tz)
-
- This interface is used to read the slope attribute value
- for the thermal zone device, which might be useful for platform
- drivers for temperature calculations.
-
- ::
-
- int thermal_zone_get_offset(struct thermal_zone_device *tz)
-
- This interface is used to read the offset attribute value
- for the thermal zone device, which might be useful for platform
- drivers for temperature calculations.
-
-1.2 thermal cooling device interface
-------------------------------------
-
-
- ::
-
- struct thermal_cooling_device
- *thermal_cooling_device_register(char *name,
- void *devdata, struct thermal_cooling_device_ops *)
-
- This interface function adds a new thermal cooling device (fan/processor/...)
- to /sys/class/thermal/ folder as `cooling_device[0-*]`. It tries to bind itself
- to all the thermal zone devices registered at the same time.
-
- name:
- the cooling device name.
- devdata:
- device private data.
- ops:
- thermal cooling devices call-backs.
-
- .get_max_state:
- get the Maximum throttle state of the cooling device.
- .get_cur_state:
- get the Currently requested throttle state of the
- cooling device.
- .set_cur_state:
- set the Current throttle state of the cooling device.
-
- ::
-
- void thermal_cooling_device_unregister(struct thermal_cooling_device *cdev)
-
- This interface function removes the thermal cooling device.
- It deletes the corresponding entry from /sys/class/thermal folder and
- unbinds itself from all the thermal zone devices using it.
-
-1.3 interface for binding a thermal zone device with a thermal cooling device
------------------------------------------------------------------------------
-
- ::
-
- int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz,
- int trip, struct thermal_cooling_device *cdev,
- unsigned long upper, unsigned long lower, unsigned int weight);
-
- This interface function binds a thermal cooling device to a particular trip
- point of a thermal zone device.
-
- This function is usually called in the thermal zone device .bind callback.
-
- tz:
- the thermal zone device
- cdev:
- thermal cooling device
- trip:
- indicates which trip point in this thermal zone the cooling device
- is associated with.
- upper:
- the Maximum cooling state for this trip point.
- THERMAL_NO_LIMIT means no upper limit,
- and the cooling device can be in max_state.
- lower:
- the Minimum cooling state that can be used for this trip point.
- THERMAL_NO_LIMIT means no lower limit,
- and the cooling device can be in cooling state 0.
- weight:
- the influence of this cooling device in this thermal
- zone. See section 1.4 below for more information.
-
- ::
-
- int thermal_zone_unbind_cooling_device(struct thermal_zone_device *tz,
- int trip, struct thermal_cooling_device *cdev);
-
- This interface function unbinds a thermal cooling device from a particular
- trip point of a thermal zone device. This function is usually called in
- the thermal zone device .unbind callback.
-
- tz:
- the thermal zone device
- cdev:
- thermal cooling device
- trip:
- indicates which trip point in this thermal zone the cooling device
- is associated with.
-
-1.4 Thermal Zone Parameters
----------------------------
-
- ::
-
- struct thermal_bind_params
-
- This structure defines the following parameters that are used to bind
- a zone with a cooling device for a particular trip point.
-
- .cdev:
- The cooling device pointer
- .weight:
- The 'influence' of a particular cooling device on this
- zone. This is relative to the rest of the cooling
- devices. For example, if all cooling devices have a
- weight of 1, then they all contribute the same. You can
- use percentages if you want, but it's not mandatory. A
- weight of 0 means that this cooling device doesn't
- contribute to the cooling of this zone unless all cooling
- devices have a weight of 0. If all weights are 0, then
- they all contribute the same.
- .trip_mask:
- This is a bit mask that gives the binding relation between
- this thermal zone and cdev, for a particular trip point.
- If nth bit is set, then the cdev and thermal zone are bound
- for trip point n.
- .binding_limits:
- This is an array of cooling state limits. Must have
- exactly 2 * thermal_zone.number_of_trip_points entries. It is an
- array consisting of tuples <lower-state upper-state> of
- state limits. Each trip will be associated with one state
- limit tuple when binding. A NULL pointer means
- <THERMAL_NO_LIMIT THERMAL_NO_LIMIT> on all trips.
- These limits are used when binding a cdev to a trip point.
- .match:
- This call back returns success(0) if the 'tz and cdev' need to
- be bound, as per platform data.
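-
-  As a hedged illustration of how these fields fit together (the
-  identifiers `example_limits` and `example_tbp` are hypothetical, and a
-  zone with three trip points is assumed), platform code could describe a
-  cooling device bound to trip points 0 and 2 roughly like this::
-
-      /* One <lower upper> pair per trip point: 2 * 3 entries. */
-      static unsigned long example_limits[] = {
-          0, 2,                                /* trip 0 */
-          THERMAL_NO_LIMIT, THERMAL_NO_LIMIT,  /* trip 1 (not bound) */
-          3, THERMAL_NO_LIMIT,                 /* trip 2 */
-      };
-
-      static struct thermal_bind_params example_tbp = {
-          .weight = 1,
-          .trip_mask = 0x5,                    /* bits 0 and 2 set */
-          .binding_limits = example_limits,
-          /* real platform code would also set .cdev or .match */
-      };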
-
- ::
-
- struct thermal_zone_params
-
- This structure defines the platform level parameters for a thermal zone.
- This data, for each thermal zone should come from the platform layer.
- This is an optional feature where some platforms can choose not to
- provide this data.
-
- .governor_name:
- Name of the thermal governor used for this zone
- .no_hwmon:
- a boolean to indicate if the thermal to hwmon sysfs interface
- is required. When no_hwmon == false, a hwmon sysfs interface
- will be created. When no_hwmon == true, nothing will be done.
- In case the thermal_zone_params is NULL, the hwmon interface
- will be created (for backward compatibility).
- .num_tbps:
- Number of thermal_bind_params entries for this zone
- .tbp:
- thermal_bind_params entries
-
-2. sysfs attributes structure
-=============================
-
-== ================
-RO read only value
-WO write only value
-RW read/write value
-== ================
-
-Thermal sysfs attributes will be represented under /sys/class/thermal.
-Hwmon sysfs I/F extension is also available under /sys/class/hwmon
-if hwmon is compiled in or built as a module.
-
-Thermal zone device sys I/F, created once it's registered::
-
- /sys/class/thermal/thermal_zone[0-*]:
- |---type: Type of the thermal zone
- |---temp: Current temperature
- |---mode: Working mode of the thermal zone
- |---policy: Thermal governor used for this zone
- |---available_policies: Available thermal governors for this zone
- |---trip_point_[0-*]_temp: Trip point temperature
- |---trip_point_[0-*]_type: Trip point type
- |---trip_point_[0-*]_hyst: Hysteresis value for this trip point
- |---emul_temp: Emulated temperature set node
- |---sustainable_power: Sustainable dissipatable power
- |---k_po: Proportional term during temperature overshoot
- |---k_pu: Proportional term during temperature undershoot
- |---k_i: PID's integral term in the power allocator gov
- |---k_d: PID's derivative term in the power allocator
- |---integral_cutoff: Offset above which errors are accumulated
- |---slope: Slope constant applied as linear extrapolation
- |---offset: Offset constant applied as linear extrapolation
-
-Thermal cooling device sys I/F, created once it's registered::
-
- /sys/class/thermal/cooling_device[0-*]:
- |---type: Type of the cooling device(processor/fan/...)
- |---max_state: Maximum cooling state of the cooling device
- |---cur_state: Current cooling state of the cooling device
- |---stats: Directory containing cooling device's statistics
- |---stats/reset: Writing any value resets the statistics
- |---stats/time_in_state_ms: Time (msec) spent in various cooling states
- |---stats/total_trans: Total number of times cooling state is changed
- |---stats/trans_table: Cooling state transition table
-
-
-The following dynamic attributes are created/removed together. They represent
-the relationship between a thermal zone and its associated cooling device.
-They are created/removed for each successful execution of
-thermal_zone_bind_cooling_device/thermal_zone_unbind_cooling_device.
-
-::
-
- /sys/class/thermal/thermal_zone[0-*]:
- |---cdev[0-*]: [0-*]th cooling device in current thermal zone
- |---cdev[0-*]_trip_point: Trip point that cdev[0-*] is associated with
- |---cdev[0-*]_weight: Influence of the cooling device in
- this thermal zone
-
-Besides the thermal zone device sysfs I/F and cooling device sysfs I/F,
-the generic thermal driver also creates a hwmon sysfs I/F for each _type_
-of thermal zone device. E.g. the generic thermal driver registers one hwmon
-class device and builds the associated hwmon sysfs I/F for all the registered
-ACPI thermal zones.
-
-::
-
- /sys/class/hwmon/hwmon[0-*]:
- |---name: The type of the thermal zone devices
- |---temp[1-*]_input: The current temperature of thermal zone [1-*]
- |---temp[1-*]_crit: The critical trip point of thermal zone [1-*]
-
-Please read Documentation/hwmon/sysfs-interface.rst for additional information.
-
-Thermal zone attributes
------------------------
-
-type
- Strings which represent the thermal zone type.
- This is given by thermal zone driver as part of registration.
- E.g: "acpitz" indicates it's an ACPI thermal device.
- In order to keep it consistent with hwmon sys attribute; this should
- be a short, lowercase string, not containing spaces nor dashes.
- RO, Required
-
-temp
- Current temperature as reported by thermal zone (sensor).
- Unit: millidegree Celsius
- RO, Required
-
-mode
- One of the predefined values in [enabled, disabled].
- This file gives information about the algorithm that is currently
- managing the thermal zone. It can be either default kernel based
- algorithm or user space application.
-
- enabled
- enable Kernel Thermal management.
- disabled
- Preventing kernel thermal zone driver actions upon
- trip points so that user application can take full
- charge of the thermal management.
-
- RW, Optional
-
-policy
- One of the various thermal governors used for a particular zone.
-
- RW, Required
-
-available_policies
- Available thermal governors which can be used for a particular zone.
-
- RO, Required
-
-`trip_point_[0-*]_temp`
- The temperature above which trip point will be fired.
-
- Unit: millidegree Celsius
-
- RO, Optional
-
-`trip_point_[0-*]_type`
- Strings which indicate the type of the trip point.
-
- E.g. it can be one of critical, hot, passive, `active[0-*]` for ACPI
- thermal zone.
-
- RO, Optional
-
-`trip_point_[0-*]_hyst`
- The hysteresis value for a trip point, represented as an integer
- Unit: Celsius
- RW, Optional
-
-`cdev[0-*]`
- Sysfs link to the thermal cooling device node that provides the
- sys I/F for throttling control of the cooling device.
-
- RO, Optional
-
-`cdev[0-*]_trip_point`
- The trip point in this thermal zone which `cdev[0-*]` is associated
- with; -1 means the cooling device is not associated with any trip
- point.
-
- RO, Optional
-
-`cdev[0-*]_weight`
- The influence of `cdev[0-*]` in this thermal zone. This value
- is relative to the rest of cooling devices in the thermal
- zone. For example, if a cooling device has a weight double
- that of another, it's twice as effective in cooling the
- thermal zone.
-
- RW, Optional
-
-passive
- Attribute is only present for zones in which the passive cooling
- policy is not supported by native thermal driver. Default is zero
- and can be set to a temperature (in millidegrees) to enable a
- passive trip point for the zone. Activation is done by polling with
- an interval of 1 second.
-
- Unit: millidegrees Celsius
-
- Valid values: 0 (disabled) or greater than 1000
-
- RW, Optional
-
-emul_temp
- Interface to set the emulated temperature method in thermal zone
- (sensor). After setting this temperature, the thermal zone may pass
- this temperature to platform emulation function if registered or
- cache it locally. This is useful in debugging different temperature
- thresholds and their associated cooling actions. This is a write-only
- node, and writing 0 to this node should disable emulation.
- Unit: millidegree Celsius
-
- WO, Optional
-
- WARNING:
- Be careful while enabling this option on production systems,
- because userland can easily disable the thermal policy by simply
- flooding this sysfs node with low temperature values.
-
-sustainable_power
- An estimate of the sustained power that can be dissipated by
- the thermal zone. Used by the power allocator governor. For
- more information see Documentation/thermal/power_allocator.rst
-
- Unit: milliwatts
-
- RW, Optional
-
-k_po
- The proportional term of the power allocator governor's PID
- controller during temperature overshoot. Temperature overshoot
- is when the current temperature is above the "desired
- temperature" trip point. For more information see
- Documentation/thermal/power_allocator.rst
-
- RW, Optional
-
-k_pu
- The proportional term of the power allocator governor's PID
- controller during temperature undershoot. Temperature undershoot
- is when the current temperature is below the "desired
- temperature" trip point. For more information see
- Documentation/thermal/power_allocator.rst
-
- RW, Optional
-
-k_i
- The integral term of the power allocator governor's PID
- controller. This term allows the PID controller to compensate
- for long term drift. For more information see
- Documentation/thermal/power_allocator.rst
-
- RW, Optional
-
-k_d
- The derivative term of the power allocator governor's PID
- controller. For more information see
- Documentation/thermal/power_allocator.rst
-
- RW, Optional
-
-integral_cutoff
- Temperature offset from the desired temperature trip point
- above which the integral term of the power allocator
- governor's PID controller starts accumulating errors. For
- example, if integral_cutoff is 0, then the integral term only
- accumulates error when temperature is above the desired
- temperature trip point. For more information see
- Documentation/thermal/power_allocator.rst
-
- Unit: millidegree Celsius
-
- RW, Optional
-
-slope
- The slope constant used in a linear extrapolation model
- to determine a hotspot temperature based off the sensor's
- raw readings. It is up to the device driver to determine
- the usage of these values.
-
- RW, Optional
-
-offset
- The offset constant used in a linear extrapolation model
- to determine a hotspot temperature based off the sensor's
- raw readings. It is up to the device driver to determine
- the usage of these values.
-
- RW, Optional
-
-Cooling device attributes
--------------------------
-
-type
- String which represents the type of device, e.g:
-
- - for generic ACPI: should be "Fan", "Processor" or "LCD"
- - for memory controller device on intel_menlow platform:
- should be "Memory controller".
-
- RO, Required
-
-max_state
- The maximum permissible cooling state of this cooling device.
-
- RO, Required
-
-cur_state
- The current cooling state of this cooling device.
- The value can be any integer between 0 and max_state:
-
- - cur_state == 0 means no cooling
- - cur_state == max_state means the maximum cooling.
-
- RW, Required
-
-stats/reset
- Writing any value resets the cooling device's statistics.
- WO, Required
-
-stats/time_in_state_ms:
- The amount of time spent by the cooling device in various cooling
- states. The output will have "<state> <time>" pair in each line, which
- will mean this cooling device spent <time> msec of time at <state>.
- Output will have one line for each of the supported states. usertime
- units here is 10mS (similar to other time exported in /proc).
- RO, Required
-
-
-stats/total_trans:
- A single positive value showing the total number of times the state of a
- cooling device is changed.
-
- RO, Required
-
-stats/trans_table:
- This gives fine grained information about all the cooling state
- transitions. The cat output here is a two dimensional matrix, where an
- entry <i,j> (row i, column j) represents the number of transitions from
- State_i to State_j. If the transition table is bigger than PAGE_SIZE,
- reading this will return an -EFBIG error.
- RO, Required
-
-3. A simple implementation
-==========================
-
-An ACPI thermal zone may support multiple trip points like critical, hot,
-passive, active. If an ACPI thermal zone supports critical, passive,
-active[0] and active[1] at the same time, it may register itself as a
-thermal_zone_device (thermal_zone1) with 4 trip points in all.
-It has one processor and one fan, which are both registered as
-thermal_cooling_device. Both are considered to have the same
-effectiveness in cooling the thermal zone.
-
-If the processor is listed in _PSL method, and the fan is listed in _AL0
-method, the sys I/F structure will be built like this::
-
- /sys/class/thermal:
- |thermal_zone1:
- |---type: acpitz
- |---temp: 37000
- |---mode: enabled
- |---policy: step_wise
- |---available_policies: step_wise fair_share
- |---trip_point_0_temp: 100000
- |---trip_point_0_type: critical
- |---trip_point_1_temp: 80000
- |---trip_point_1_type: passive
- |---trip_point_2_temp: 70000
- |---trip_point_2_type: active0
- |---trip_point_3_temp: 60000
- |---trip_point_3_type: active1
- |---cdev0: --->/sys/class/thermal/cooling_device0
- |---cdev0_trip_point: 1 /* cdev0 can be used for passive */
- |---cdev0_weight: 1024
- |---cdev1: --->/sys/class/thermal/cooling_device3
- |---cdev1_trip_point: 2 /* cdev1 can be used for active[0]*/
- |---cdev1_weight: 1024
-
- |cooling_device0:
- |---type: Processor
- |---max_state: 8
- |---cur_state: 0
-
- |cooling_device3:
- |---type: Fan
- |---max_state: 2
- |---cur_state: 0
-
- /sys/class/hwmon:
- |hwmon0:
- |---name: acpitz
- |---temp1_input: 37000
- |---temp1_crit: 100000
-
-4. Event Notification
-=====================
-
-The framework includes a simple notification mechanism, in the form of a
-netlink event. Netlink socket initialization is done during the _init_
-of the framework. Drivers which intend to use the notification mechanism
-just need to call thermal_generate_netlink_event() with two arguments viz
-(originator, event). The originator is a pointer to the struct thermal_zone_device
-from which the event originated. An integer which represents the
-thermal zone device will be used in the message to identify the zone. The
-event will be one of: {THERMAL_AUX0, THERMAL_AUX1, THERMAL_CRITICAL,
-THERMAL_DEV_FAULT}. Notification can be sent when the current temperature
-crosses any of the configured thresholds.
-
-5. Export Symbol APIs
-=====================
-
-5.1. get_tz_trend
------------------
-
-This function returns the trend of a thermal zone, i.e. the rate of change
-of temperature of the thermal zone. Ideally, the thermal sensor drivers
-are supposed to implement the callback. If they don't, the thermal
-framework calculates the trend by comparing the previous and the current
-temperature values.
-
-5.2. get_thermal_instance
--------------------------
-
-This function returns the thermal_instance corresponding to a given
-{thermal_zone, cooling_device, trip_point} combination. Returns NULL
-if such an instance does not exist.
-
-5.3. thermal_notify_framework
------------------------------
-
-This function handles the trip events from sensor drivers. It starts
-throttling the cooling devices according to the policy configured.
-For CRITICAL and HOT trip points, this notifies the respective drivers,
-and does actual throttling for other trip points, i.e. ACTIVE and PASSIVE.
-The throttling policy is based on the configured platform data; if no
-platform data is provided, this uses the step_wise throttling policy.
-
-5.4. thermal_cdev_update
-------------------------
-
-This function serves as an arbitrator to set the state of a cooling
-device. It sets the cooling device to the deepest cooling state if
-possible.
-
-6. thermal_emergency_poweroff
-=============================
-
-On a critical trip temperature crossing, the thermal framework
-allows the system to shut down gracefully by calling orderly_poweroff().
-In the event of a failure of orderly_poweroff() to shut down the system
-we are in danger of keeping the system alive at undesirably high
-temperatures. To mitigate this high risk scenario we program a work
-queue to fire after a pre-determined number of seconds to start
-an emergency shutdown of the device using the kernel_power_off()
-function. In case kernel_power_off() fails then finally
-emergency_restart() is called in the worst case.
-
-The delay should be carefully profiled so as to give adequate time for
-orderly_poweroff(). If orderly_poweroff() fails, the
-emergency poweroff kicks in after the delay has elapsed and shuts down
-the system.
-
-If the delay is set to 0, emergency poweroff will not be supported. So a
-carefully profiled non-zero positive value is a must for emergency poweroff
-to be triggered.
diff --git a/Documentation/thermal/x86_pkg_temperature_thermal.rst b/Documentation/thermal/x86_pkg_temperature_thermal.rst
deleted file mode 100644
index f134dbd3f5a9..000000000000
--- a/Documentation/thermal/x86_pkg_temperature_thermal.rst
+++ /dev/null
@@ -1,55 +0,0 @@
-===================================
-Kernel driver: x86_pkg_temp_thermal
-===================================
-
-Supported chips:
-
-* x86: with package level thermal management
-
-(Verify using: CPUID.06H:EAX[bit 6] = 1)
-
-Authors: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
-
-Reference
----------
-
-Intel® 64 and IA-32 Architectures Software Developer’s Manual (Jan, 2013):
-Chapter 14.6: PACKAGE LEVEL THERMAL MANAGEMENT
-
-Description
------------
-
-This driver registers the CPU package-level digital temperature sensor as a
-thermal zone with a maximum of two user-mode configurable trip points. The
-number of trip points depends on the capability of the package. Once a trip
-point is violated, user mode can receive a notification via the thermal
-notification mechanism and can take any action to control the temperature.
-
-
-Threshold management
---------------------
-Each package will register as a thermal zone under /sys/class/thermal.
-
-Example::
-
- /sys/class/thermal/thermal_zone1
-
-This contains two trip points:
-
-- trip_point_0_temp
-- trip_point_1_temp
-
-The user can set any temperature between 0 and the TJ-Max temperature.
-Temperatures are in millidegrees Celsius. Refer to
-"Documentation/thermal/sysfs-api.rst" for thermal sysfs details.
-
-Any value other than 0 in these trip points can trigger thermal notifications.
-Setting 0 stops sending thermal notifications.
-
-Thermal notifications:
-To get kobject-uevent notifications, set the thermal zone
-policy to "user_space".
-
-For example::
-
- echo -n "user_space" > policy