summaryrefslogtreecommitdiffstats
path: root/drivers/thermal
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'thermal-6.2-rc1-2' of ↵Linus Torvalds2022-12-1517-79/+312
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more thermal control updates from Rafael Wysocki: "These are updates of assorted thermal drivers, mostly for ARM platforms, generally isolated and fairly straightforward, and the recent Intel HFI driver fix for systems without HFI support. Specifics: - Avoid clearing the HFI status bit on systems without HFI support which triggers unchecked MSR access errors (Srinivas Pandruvada) - Add sm8450 and sm8550 QCom compatible string to DT bindings (Luca Weiss, Neil Armstrong) - Use devm_platform_get_and_ioremap_resource on the ST platform to group two calls into a single one (Minghao Chi) - Use GENMASK instead of bitmaps and validate the temperature after reading it in the imx8mm_thermal driver (Marcus Folkesson) - Convert generic-adc-thermal to DT schema (Rob Herring) - Fix debug print message with inverted logic in the k3_j72xx_bandgap driver (Keerthy) - Fix memory leak on thermal_of_zone_register() failure (Ido Schimmel) - Add support for IPQ8074 in the tsens thermal driver along with the DT bindings (Robert Marko) - Fix and rework the debugfs code in the tsens driver (Christian Marangi) - Add calibration and DT documentation for the imx8mm driver (Marek Vasut) - Add DT bindings and compatible for the Mediatek SoCs mt7981 and mt7983 (Daniel Golle) - Don't show an error message if it happens at probe time while it will be deferred on the QCom SPMI ADC driver (Johan Hovold) - Add HWMon support for the imx8mm board (Alexander Stein) - Remove pointless include from the power allocator governor (Christophe JAILLET) - Add interrupt DT bindings for QCom SoCs SC8280XP, SM6350 and SM8450 (Krzysztof Kozlowski) - Fix inaccurate warning message for the QCom tsens gen2 (Luca Weiss) - Demote error log of thermal zone register to debug in the tsens QCom driver (Manivannan Sadhasivam) - Consolidate the the efuse values and the errata handling in the TI Bandgap driver (Bryan Brattlof) - Document Renesas RZ/Five as compatible with RZ/G2UL in the DT bindings (Lad Prabhakar) - Fix the irq handler return value in the LMh driver (Bjorn Andersson) - Delete empty platform remove callback from imx_sc_thermal (Uwe Kleine-König)" * tag 'thermal-6.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (35 commits) thermal/drivers/imx_sc_thermal: Drop empty platform remove function thermal/drivers/qcom/lmh: Fix irq handler return value dt-bindings: thermal: qcom-tsens: Add compatible for sm8550 thermal/drivers/st: Use devm_platform_get_and_ioremap_resource() dt-bindings: thermal: rzg2l-thermal: Document RZ/Five SoC dt-bindings: thermal: k3-j72xx: conditionally require efuse reg range dt-bindings: thermal: k3-j72xx: elaborate on binding description thermal/drivers/k3_j72xx_bandgap: Map fuse_base only for erratum workaround thermal/drivers/k3_j72xx_bandgap: Remove fuse_base from structure thermal/drivers/k3_j72xx_bandgap: Use bool for i2128 erratum flag thermal/drivers/k3_j72xx_bandgap: Simplify k3_thermal_get_temp() function thermal/drivers/qcom: Demote error log of thermal zone register to debug thermal/drivers/qcom/temp-alarm: Fix inaccurate warning for gen2 dt-bindings: thermal: qcom-tsens: narrow interrupts for SC8280XP, SM6350 and SM8450 thermal/core/power allocator: Remove a useless include thermal/drivers/imx8mm: Add hwmon support thermal: qcom-spmi-adc-tm5: suppress probe-deferral error message dt-bindings: thermal: mediatek: add compatible string for MT7986 and MT7981 SoC thermal: ti-soc-thermal: Drop comma after SoC match table sentinel thermal/drivers/imx: Add support for loading calibration data from OCOTP ...
| * Merge tag 'thermal-v6.2-rc1' of ↵Rafael J. Wysocki2022-12-1416-78/+308
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux Pull thermal driver changes for 6.2-rc1 from Daniel Lezcano: "- Add the sm8450 QCom compatible string in the DT bindings (Luca Weiss) - Use devm_platform_get_and_ioremap_resource on the ST platform to group two calls into a single one (Minghao Chi) - Add the sm8550 QCom compatible string in the DT bindings (Neil Armstrong) - Use GENMASK instead of bitmaps and validate the temperature after reading it (Marcus Folkesson) - Convert generic-adc-thermal to DT schema (Rob Herring) - Fix the debug print message where the logic is inverted (Keerthy) - Fix memory leak on thermal_of_zone_register() failure (Ido Schimmel) - Add support for IPQ8074 in the tsens driver along with the DT bindings (Robert Marko) - Fix and rework the debugfs code in the tsens driver (Christian Marangi) - Add calibration and DT documentation for the imx8mm driver (Marek Vasut) - Add DT bindings and compatible for the Mediatek SoCs mt7981 and mt7983 (Daniel Golle) - Don't show an error message if it happens at probe time while it will be deferred on the QCom SPMI ADC driver (Johan Hovold) - Add the HWMon support on the imx8mm board (Alexander Stein) - Remove a pointless include in the power allocator governor (Christophe JAILLET) - Add interrupt DT bindings for QCom SoCs SC8280XP, SM6350 and SM8450 (Krzysztof Kozlowski) - Fix inaccurate warning message for the QCom tsens gen2 (Luca Weiss) - Demote error log of thermal zone register to debug on the tsens QCom driver (Manivannan Sadhasivam) - Consolidate the TI Bandgap driver regarding how is handled the efuse values and the errata handling (Bryan Brattlof) - Document the Renesas RZ/Five as compatible with RZ/G2UL in the DT bindings (Lad Prabhakar) - Fix the irq handler return value in the LMh driver (Bjorn Andersson) - Delete platform remove callback as it is empty (Uwe Kleine-König)" * tag 'thermal-v6.2-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux: (34 commits) thermal/drivers/imx_sc_thermal: Drop empty platform remove function thermal/drivers/qcom/lmh: Fix irq handler return value dt-bindings: thermal: qcom-tsens: Add compatible for sm8550 thermal/drivers/st: Use devm_platform_get_and_ioremap_resource() dt-bindings: thermal: rzg2l-thermal: Document RZ/Five SoC dt-bindings: thermal: k3-j72xx: conditionally require efuse reg range dt-bindings: thermal: k3-j72xx: elaborate on binding description thermal/drivers/k3_j72xx_bandgap: Map fuse_base only for erratum workaround thermal/drivers/k3_j72xx_bandgap: Remove fuse_base from structure thermal/drivers/k3_j72xx_bandgap: Use bool for i2128 erratum flag thermal/drivers/k3_j72xx_bandgap: Simplify k3_thermal_get_temp() function thermal/drivers/qcom: Demote error log of thermal zone register to debug thermal/drivers/qcom/temp-alarm: Fix inaccurate warning for gen2 dt-bindings: thermal: qcom-tsens: narrow interrupts for SC8280XP, SM6350 and SM8450 thermal/core/power allocator: Remove a useless include thermal/drivers/imx8mm: Add hwmon support thermal: qcom-spmi-adc-tm5: suppress probe-deferral error message dt-bindings: thermal: mediatek: add compatible string for MT7986 and MT7981 SoC thermal: ti-soc-thermal: Drop comma after SoC match table sentinel thermal/drivers/imx: Add support for loading calibration data from OCOTP ...
| | * thermal/drivers/imx_sc_thermal: Drop empty platform remove functionUwe Kleine-König2022-12-141-6/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | A remove callback just returning 0 is equivalent to no remove callback at all. So drop the useless function. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Link: https://lore.kernel.org/r/20221212220217.3777176-1-u.kleine-koenig@pengutronix.de Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
| | * thermal/drivers/qcom/lmh: Fix irq handler return valueBjorn Andersson2022-12-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After enough invocations the LMh irq is eventually reported as bad, because the handler doesn't return IRQ_HANDLED, fix this. Fixes: 53bca371cdf7 ("thermal/drivers/qcom: Add support for LMh driver") Reported-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20220316180322.88132-1-bjorn.andersson@linaro.org Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
| | * thermal/drivers/st: Use devm_platform_get_and_ioremap_resource()Minghao Chi2022-12-141-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert platform_get_resource(), devm_ioremap_resource() to a single call to devm_platform_get_and_ioremap_resource(), as this is exactly what this function does. Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn> Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn> Link: https://lore.kernel.org/r/202211171409524332954@zte.com.cn Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
| | * thermal/drivers/k3_j72xx_bandgap: Map fuse_base only for erratum workaroundBryan Brattlof2022-12-141-12/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some of TI's J721E SoCs require a software trimming procedure for the temperature monitors to function properly. To determine if a particular J721E is not affected by this erratum, both bits in the WKUP_SPARE_FUSE0 region must be set. Other SoCs, not affected by this erratum, will not need this region. Map the 'fuse_base' region only when the erratum fix is needed. Signed-off-by: Bryan Brattlof <bb@ti.com> Link: https://lore.kernel.org/r/20221031232702.10339-5-bb@ti.com Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
| | * thermal/drivers/k3_j72xx_bandgap: Remove fuse_base from structureBryan Brattlof2022-12-141-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'fuse_base' is only needed during the initial probe function to provide data for a software trimming method for some of TI's devices affected by the i2128 erratum. The devices not affected will not use this region Remove fuse_base from the main k3_j72xx_bandgap structure Signed-off-by: Bryan Brattlof <bb@ti.com> Link: https://lore.kernel.org/r/20221031232702.10339-4-bb@ti.com Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
| | * thermal/drivers/k3_j72xx_bandgap: Use bool for i2128 erratum flagBryan Brattlof2022-12-141-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some of TI's J721E SoCs require a software trimming method to report temperatures accurately. Currently we are using a few different data types to indicate when we should apply the erratum. Change the 'workaround_needed' variable's data type to a bool to align with how we are using this variable currently. Signed-off-by: Bryan Brattlof <bb@ti.com> Link: https://lore.kernel.org/r/20221031232702.10339-3-bb@ti.com Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
| | * thermal/drivers/k3_j72xx_bandgap: Simplify k3_thermal_get_temp() functionBryan Brattlof2022-12-141-8/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | The k3_thermal_get_temp() function can be simplified to return only the result of k3_bgp_read_temp() without needing the 'ret' variable Signed-off-by: Bryan Brattlof <bb@ti.com> Link: https://lore.kernel.org/r/20221031232702.10339-2-bb@ti.com Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
| | * thermal/drivers/qcom: Demote error log of thermal zone register to debugManivannan Sadhasivam2022-12-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | devm_thermal_of_zone_register() can fail with -ENODEV if thermal zone for the channel is not represented in DT. This is perfectly fine since not all sensors needs to be used for thermal zones but only a few in real world. So demote the error log to debug to avoid spamming users. Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20221029052933.32421-1-manivannan.sadhasivam@linaro.org Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
| | * thermal/drivers/qcom/temp-alarm: Fix inaccurate warning for gen2Luca Weiss2022-12-141-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On gen2 chips the stage2 threshold is not 140 degC but 125 degC. Make the warning message clearer by using this variable and also by including the temperature that was checked for. Fixes: aa92b3310c55 ("thermal/drivers/qcom-spmi-temp-alarm: Add support for GEN2 rev 1 PMIC peripherals") Signed-off-by: Luca Weiss <luca.weiss@fairphone.com> Reviewed-by: Amit Kucheria <amitk@kernel.org> Link: https://lore.kernel.org/r/20221020145237.942146-1-luca.weiss@fairphone.com Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
| | * thermal/core/power allocator: Remove a useless includeChristophe JAILLET2022-12-141-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This file does not use rcu, so there is no point in including <linux/rculist.h>. Remove it. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Link: https://lore.kernel.org/r/9adeec47cb5a8193016272d5c8bf936235c1711d.1669459337.git.christophe.jaillet@wanadoo.fr Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
| | * thermal/drivers/imx8mm: Add hwmon supportAlexander Stein2022-12-141-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | Expose thermal sensors as HWMON devices. Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> Link: https://lore.kernel.org/r/20220726122331.323093-1-alexander.stein@ew.tq-group.com Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
| | * thermal: qcom-spmi-adc-tm5: suppress probe-deferral error messageJohan Hovold2022-12-141-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Drivers should not be logging errors on probe deferral. Switch to using dev_err_probe() to log failures when parsing the devicetree to avoid errors like: qcom-spmi-adc-tm5 c440000.spmi:pmic@0:adc-tm@3400: get dt data failed: -517 when a channel is not yet available. Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Reviewed-by: Manivannan Sadhasivam <mani@kernel.org> Reviewed-by: Andrew Halaney <ahalaney@redhat.com> Link: https://lore.kernel.org/r/20221102152630.696-1-johan+linaro@kernel.org Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
| | * thermal: ti-soc-thermal: Drop comma after SoC match table sentinelGeert Uytterhoeven2022-12-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It does not make sense to have a comma after a sentinel, as any new elements must be added before the sentinel. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Keerthy <j-keerthy@ti.com> Link: https://lore.kernel.org/r/1d6de2a80b919cb11199e56ac06ad21c273ebe57.1669045586.git.geert+renesas@glider.be Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
| | * thermal/drivers/imx: Add support for loading calibration data from OCOTPMarek Vasut2022-12-141-0/+164
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The TMU TASR, TCALIVn, TRIM registers must be explicitly programmed with calibration values in OCOTP. Add support for reading the OCOTP calibration data and programming those into the TMU hardware. The MX8MM/MX8MN TMUv1 uses only one OCOTP cell, while MX8MP TMUv2 uses 4, the programming differs in each case. Based on U-Boot commits: 70487ff386c ("imx8mm: Load fuse for TMU TCALIV and TASR") ebb9aab318b ("imx: load calibration parameters from fuse for i.MX8MP") Reviewed-by: Peng Fan <peng.fan@nxp.com> Signed-off-by: Marek Vasut <marex@denx.de> Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
| | * thermal/drivers/qcom/tsens: Rework debugfs file structureChristian Marangi2022-12-141-10/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current tsens debugfs structure is composed by: - a tsens dir in debugfs with a version file - a directory for each tsens istance with sensors file to dump all the sensors value. This works on the assumption that we have the same version for each istance but this assumption seems fragile and with more than one tsens istance results in the version file not tracking each of them. A better approach is to just create a subdirectory for each tsens istance and put there version and sensors debugfs file. Using this new implementation results in less code since debugfs entry are created only on successful tsens probe. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Link: https://lore.kernel.org/r/20221022125657.22530-4-ansuelsmth@gmail.com Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
| | * thermal/drivers/qcom/tsens: Fix wrong version id dbg_version_showChristian Marangi2022-12-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For VER_0 the version was incorrectly reported as 0.1.0. Fix that and correctly report the major version for this old tsens revision. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Link: https://lore.kernel.org/r/20221022125657.22530-3-ansuelsmth@gmail.com Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
| | * thermal/drivers/qcom/tsens: Init debugfs only with successful probeChristian Marangi2022-12-141-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Calibrate and tsens_register can fail or PROBE_DEFER. This will cause a double or a wrong init of the debugfs information. Init debugfs only with successful probe fixing warning about directory already present. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Acked-by: Thara Gopinath <thara.gopinath@linaro.org> Link: https://lore.kernel.org/r/20221022125657.22530-2-ansuelsmth@gmail.com Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
| | * thermal/drivers/tsens: Add IPQ8074 supportRobert Marko2022-12-143-1/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Qualcomm IPQ8074 uses tsens v2.3 IP, however unlike other tsens v2 IP it only has one IRQ, that is used for up/low as well as critical. It also does not support negative trip temperatures. Signed-off-by: Robert Marko <robimarko@gmail.com> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/20220818220245.338396-4-robimarko@gmail.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
| | * thermal/drivers/tsens: Allow configuring min and max tripsRobert Marko2022-12-146-2/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IPQ8074 and IPQ6018 dont support negative trip temperatures and support up to 204 degrees C as the max trip temperature. So, instead of always setting the -40 as min and 120 degrees C as max allow it to be configured as part of the features. Signed-off-by: Robert Marko <robimarko@gmail.com> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/20220818220245.338396-3-robimarko@gmail.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
| | * thermal/drivers/tsens: Add support for combined interruptRobert Marko2022-12-146-6/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Despite using tsens v2.3 IP, IPQ8074 and IPQ6018 only have one IRQ for signaling both up/low and critical trips. Signed-off-by: Robert Marko <robimarko@gmail.com> Reviewed-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20220818220245.338396-2-robimarko@gmail.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
| | * thermal/of: Fix memory leak on thermal_of_zone_register() failureIdo Schimmel2022-12-141-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function does not free 'of_ops' upon failure, leading to a memory leak [1]. Fix by freeing 'of_ops' in the error path. [1] unreferenced object 0xffff8ee846198c80 (size 128): comm "swapper/0", pid 1, jiffies 4294699704 (age 70.076s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ d0 3f 6e 8c ff ff ff ff 00 00 00 00 00 00 00 00 .?n............. backtrace: [<00000000d136f562>] __kmalloc_node_track_caller+0x42/0x120 [<0000000063f31678>] kmemdup+0x1d/0x40 [<00000000e6d24096>] thermal_of_zone_register+0x49/0x520 [<000000005e78c755>] devm_thermal_of_zone_register+0x54/0x90 [<00000000ee6b209e>] pmbus_add_sensor+0x1b4/0x1d0 [<00000000896105e3>] pmbus_add_sensor_attrs_one+0x123/0x440 [<0000000049e990a6>] pmbus_add_sensor_attrs+0xfe/0x1d0 [<00000000466b5440>] pmbus_do_probe+0x66b/0x14e0 [<0000000084d42285>] i2c_device_probe+0x13b/0x2f0 [<0000000029e2ae74>] really_probe+0xce/0x2c0 [<00000000692df15c>] driver_probe_device+0x19/0xd0 [<00000000547d9cce>] __device_attach_driver+0x6f/0x100 [<0000000020abd24b>] bus_for_each_drv+0x76/0xc0 [<00000000665d9563>] __device_attach+0xfc/0x180 [<000000008ddd4d6a>] bus_probe_device+0x82/0xa0 [<000000009e61132b>] device_add+0x3fe/0x920 Fixes: 3fd6d6e2b4e8 ("thermal/of: Rework the thermal device tree initialization") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Vadim Pasternak <vadimp@nvidia.com> Link: https://lore.kernel.org/r/20221020103658.802457-1-idosch@nvidia.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
| | * thermal/drivers/k3_j72xx_bandgap: Fix the debug print messageKeerthy2022-12-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The debug print message to check the workaround applicability is inverted. Fix the same. Fixes: ffcb2fc86eb7 ("thermal: k3_j72xx_bandgap: Add the bandgap driver support") Reported-by: Bryan Brattlof <bb@ti.com> Signed-off-by: Keerthy <j-keerthy@ti.com> Link: https://lore.kernel.org/r/20221010034126.3550-1-j-keerthy@ti.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
| | * thermal/drivers/imx8mm_thermal: Validate temperature rangeMarcus Folkesson2022-12-141-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Check against the upper temperature limit (125 degrees C) before consider the temperature valid. Fixes: 5eed800a6811 ("thermal: imx8mm: Add support for i.MX8MM thermal monitoring unit") Signed-off-by: Marcus Folkesson <marcus.folkesson@gmail.com> Reviewed-by: Jacky Bai <ping.bai@nxp.com> Link: https://lore.kernel.org/r/20221014073507.1594844-1-marcus.folkesson@gmail.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
| | * thermal/drivers/imx8mm_thermal: Use GENMASK() when appropriateMarcus Folkesson2022-12-141-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | GENMASK() is preferred to use for bitmasks. Signed-off-by: Marcus Folkesson <marcus.folkesson@gmail.com> Link: https://lore.kernel.org/r/20221014081620.1599511-1-marcus.folkesson@gmail.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
| * | thermal: intel: Don't set HFI status bit to 1Srinivas Pandruvada2022-12-141-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When CPU doesn't support HFI (Hardware Feedback Interface), don't include BIT 26 in the mask to prevent clearing. otherwise this results in: unchecked MSR access error: WRMSR to 0x1b1 (tried to write 0x0000000004000aa8) at rIP: 0xffffffff8b8559fe (throttle_active_work+0xbe/0x1b0) Fixes: 6fe1e64b6026 ("thermal: intel: Prevent accidental clearing of HFI status") Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Tested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | | Merge tag 'thermal-6.2-rc1' of ↵Linus Torvalds2022-12-1211-142/+255
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control updates from Rafael Wysocki: "These include thermal core fixes to protect thermal device operations against thermal device removal, other thermal core fixes and updates of Intel thermal control drivers. Specifics: - Fix race conditions related to thermal device operations that are not protected against thermal device removal (Guenter Roeck) - Fix error code in __thermal_cooling_device_register() (Dan Carpenter) - Validate new cooling device state (coming from user space) in cur_state_store() and reuse the max_state value from cooling device structure in the sysfs interface (Viresh Kumar) - Fix some possible name leaks in error paths in the thermal control core code (Yang Yingliang) - Detect TCC lock bit set in the intel_tcc_cooling driver and make it refuse to update the TCC offset in that case (Zhang Rui) - Add TCC cooling support for RaptorLake-S (Zhang Rui) - Prevent accidental clearing of HFI status by one of the other drivers using the same status register (Srinivas Pandruvada) - Protect clearing of thermal status bits in Intel thermal control drivers (Srinivas Pandruvada) - Allow the HFI thermal control driver to ACK an HFI event for the previously observed timestamp (Srinivas Pandruvada) - Remove a pointless die_id check from the HFI thermal driver and adjust the definition a data structure used by it (Ricardo Neri)" * tag 'thermal-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal: intel: hfi: Remove a pointless die_id check thermal: core: fix some possible name leaks in error paths thermal: intel: hfi: ACK HFI for the same timestamp thermal: intel: Protect clearing of thermal status bits thermal: intel: Prevent accidental clearing of HFI status thermal/core: Protect thermal device operations against thermal device removal thermal/core: Remove thermal_zone_set_trips() thermal/core: Protect sysfs accesses to thermal operations with thermal zone mutex thermal/core: Protect hwmon accesses to thermal operations with thermal zone mutex thermal/core: Introduce locked version of thermal_zone_device_update thermal/core: Move parameter validation from __thermal_zone_get_temp to thermal_zone_get_temp thermal/core: Ensure that thermal device is registered in thermal_zone_get_temp thermal/core: Delete device under thermal device zone lock thermal/core: Destroy thermal zone device mutex in release function thermal: intel: intel_tcc_cooling: Add TCC cooling support for RaptorLake-S thermal: intel: intel_tcc_cooling: Detect TCC lock bit thermal: intel: hfi: Improve the type of hfi_features::nr_table_pages thermal/core: fix error code in __thermal_cooling_device_register() thermal: sysfs: Reuse cdev->max_state thermal: Validate new state in cur_state_store()
| * | Merge branch 'thermal-intel'Rafael J. Wysocki2022-12-121-1/+1
| |\ \ | | |/ | |/| | | | | | | | | | | | | | | | | | | | | | Merge one more Intel thermal control change for 6.2-rc1: - Remove a pointless die_id chec from the Intel HFI thermal control driver (Ricardo Neri). * thermal-intel: thermal: intel: hfi: Remove a pointless die_id check
| | * thermal: intel: hfi: Remove a pointless die_id checkRicardo Neri2022-12-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | die_id is an u16 quantity. On single-die systems the default value of die_id is 0. No need to check for negative values. Plus, removing this check makes Coverity happy. Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | Merge Intel thermal control drivers changes for v6.2Rafael J. Wysocki2022-12-025-31/+52
| |\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Add Raptor Lake-S support to the intel_tcc_cooling driver (Zhang Rui). - Make the intel_tcc_cooling driver detect TCC locking (Zhang Rui). - Address Coverity warning in intel_hfi_process_event() (Ricardo Neri). - Prevent accidental clearing of HFI in the package thermal interrupt status (Srinivas Pandruvada). - Protect the clearing of status bits in MSR_IA32_PACKAGE_THERM_STATUS and MSR_IA32_THERM_STATUS (Srinivas Pandruvada). - Allow the HFI interrupt handler to ACK an event for the same timestamp (Srinivas Pandruvada). * thermal-intel: thermal: intel: hfi: ACK HFI for the same timestamp thermal: intel: Protect clearing of thermal status bits thermal: intel: Prevent accidental clearing of HFI status thermal: intel: intel_tcc_cooling: Add TCC cooling support for RaptorLake-S thermal: intel: intel_tcc_cooling: Detect TCC lock bit thermal: intel: hfi: Improve the type of hfi_features::nr_table_pages
| | * thermal: intel: hfi: ACK HFI for the same timestampSrinivas Pandruvada2022-11-231-5/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some processors issue more than one HFI interrupt with the same timestamp. Each interrupt must be acknowledged to let the hardware issue new HFI interrupts. But this can't be done without some additional flow modification in the existing interrupt handling. For background, the HFI interrupt is a package level thermal interrupt delivered via a LVT. This LVT is common for both the CPU and package level interrupts. Hence, all CPUs receive the HFI interrupts. But only one CPU should process interrupt and others simply exit by issuing EOI to LAPIC. The current HFI interrupt processing flow: 1. Receive Thermal interrupt 2. Check if there is an active HFI status in MSR_IA32_THERM_STATUS 3. Try and get spinlock, one CPU will enter spinlock and others will simply return from here to issue EOI. (Let's assume CPU 4 is processing interrupt) 4. Check the stored time-stamp from the HFI memory time-stamp 5. if same 6. ignore interrupt, unlock and return 7. Copy the HFI message to local buffer 8. unlock spinlock 9. ACK HFI interrupt 10. Queue the message for processing in a work-queue It is tempting to simply acknowledge all the interrupts even if they have the same timestamp. This may cause some interrupts to not be processed. Let's say CPU5 is slightly late and reaches step 4 while CPU4 is between steps 8 and 9. Currently we simply ignore interrupts with the same timestamp. No issue here for CPU5. When CPU4 acknowledges the interrupt, the next HFI interrupt can be delivered. If we acknowledge interrupts with the same timestamp (at step 6), there is a race condition. Under the same scenario, CPU 5 will acknowledge the HFI interrupt. This lets hardware generate another HFI interrupt, before CPU 4 start executing step 9. Once CPU 4 complete step 9, it will acknowledge the newly arrived HFI interrupt, without actually processing it. Acknowledge the interrupt when holding the spinlock. This avoids contention of the interrupt acknowledgment. Updated flow: 1. Receive HFI Thermal interrupt 2. Check if there is an active HFI status in MSR_IA32_THERM_STATUS 3. Try and get spin-lock Let's assume CPU 4 is processing interrupt 4.1 Read MSR_IA32_PACKAGE_THERM_STATUS and check HFI status bit 4.2 If hfi status is 0 4.3 unlock spinlock 4.4 return 4.5 Check the stored time-stamp from the HFI memory time-stamp 5. if same 6.1 ACK HFI Interrupt, 6.2 unlock spinlock 6.3 return 7. Copy the HFI message to local buffer 8. ACK HFI interrupt 9. unlock spinlock 10. Queue the message for processing in a work-queue To avoid taking the lock unnecessarily, intel_hfi_process_event() checks the status of the HFI interrupt before taking the lock. If CPU5 is late, when it starts processing the interrupt there are two scenarios: a) CPU4 acknowledged the HFI interrupt before CPU5 read MSR_IA32_THERM_STATUS. CPU5 exits. b) CPU5 reads MSR_IA32_THERM_STATUS before CPU4 has acknowledged the interrupt. CPU5 will take the lock if CPU4 has released it. It then re-reads MSR_IA32_THERM_STATUS. If there is not a new interrupt, the HFI status bit is clear and CPU5 exits. If a new HFI interrupt was generated it will find that the status bit is set and it will continue to process the interrupt. In this case even if timestamp is not changed, the ACK can be issued as this is a new interrupt. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Reviewed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Tested-by: Arshad, Adeel<adeel.arshad@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * thermal: intel: Protect clearing of thermal status bitsSrinivas Pandruvada2022-11-234-24/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The clearing of the package thermal status is done by Read-Modify-Write operation. This may result in clearing of some new status bits which are being or about to be processed. For example, while clearing of HFI status, after read of thermal status register, a new thermal status bit is set by the hardware. But during write back, the newly generated status bit will be set to 0 or cleared. So, it is not safe to do read-modify-write. Since thermal status Read-Write bits can be set to only 0 not 1, it is safe to set all other bits to 1 which are not getting cleared. Create a common interface for clearing package thermal status bits. Use this interface to replace existing code to clear thermal package status bits. It is safe to call from different CPUs without protection as there is no read-modify-write. Also wrmsrl results in just single instruction. For example while CPU 0 and CPU 3 are clearing bit 1 and 3 respectively. If CPU 3 wins the race, it will write 0x4000aa2, then CPU 1 will write 0x4000aa8. The bits which are not part of clear are set to 1. The default mask for bits, which can be written here is 0x4000aaa. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Reviewed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * thermal: intel: Prevent accidental clearing of HFI statusSrinivas Pandruvada2022-11-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When there is a package thermal interrupt with PROCHOT log, it will be processed and cleared. It is possible that there is an active HFI event status, which is about to get processed or getting processed. While clearing PROCHOT log bit, it will also clear HFI status bit. This means that hardware is free to update HFI memory. When clearing a package thermal interrupt, some processors will generate a "general protection fault" when any of the read only bit is set to 1. The driver maintains a mask of all read-write bits which can be set. This mask doesn't include HFI status bit. This bit will also be cleared, as it will be assumed read-only bit. So, add HFI status bit 26 to the mask. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Reviewed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * thermal: intel: intel_tcc_cooling: Add TCC cooling support for RaptorLake-SZhang Rui2022-11-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | Add RaptorLake to the list of processor models supported by the Intel TCC cooling driver. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * thermal: intel: intel_tcc_cooling: Detect TCC lock bitZhang Rui2022-11-091-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When MSR_IA32_TEMPERATURE_TARGET is locked, TCC Offset can not be updated even if the PROGRAMMABE Bit is set. Yield the driver on platforms with MSR_IA32_TEMPERATURE_TARGET locked. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * thermal: intel: hfi: Improve the type of hfi_features::nr_table_pagesRicardo Neri2022-10-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A Coverity static code scan raised a potential overflow_before_widen warning when hfi_features::nr_table_pages is used as an argument to memcpy in intel_hfi_process_event(). Even though the overflow can never happen (the maximum number of pages of the HFI table is 0x10 and 0x10 << PAGE_SHIFT = 0x10000), using size_t as the data type of hfi_features::nr_table_pages makes Coverity happy and matches the data type of the argument 'size' of memcpy(). Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | thermal: core: fix some possible name leaks in error pathsYang Yingliang2022-11-251-8/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In some error paths before device_register(), the names allocated by dev_set_name() are not freed. Move dev_set_name() front to device_register(), so the name can be freed while calling put_device(). Fixes: 1dd7128b839f ("thermal/core: Fix null pointer dereference in thermal_release()") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | thermal/core: Protect thermal device operations against thermal device removalGuenter Roeck2022-11-141-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Thermal device operations may be called after thermal zone device removal. After thermal zone device removal, thermal zone device operations must no longer be called. To prevent such calls from happening, ensure that the thermal device is registered before executing any thermal device operations. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | thermal/core: Remove thermal_zone_set_trips()Guenter Roeck2022-11-142-20/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since no callers of thermal_zone_set_trips() are left, remove the function. Document __thermal_zone_set_trips() instead. Explicitly state that the thermal zone lock must be held when calling the function, and that the pointer to the thermal zone must be valid. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | thermal/core: Protect sysfs accesses to thermal operations with thermal zone ↵Guenter Roeck2022-11-143-16/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mutex Protect access to thermal operations against thermal zone removal by acquiring the thermal zone device mutex. After acquiring the mutex, check if the thermal zone device is registered and abort the operation if not. With this change, we can call __thermal_zone_device_update() instead of thermal_zone_device_update() from trip_point_temp_store() and from emul_temp_store(). Similar, we can call __thermal_zone_set_trips() instead of thermal_zone_set_trips() from trip_point_hyst_store(). Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | thermal/core: Protect hwmon accesses to thermal operations with thermal zone ↵Guenter Roeck2022-11-141-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mutex In preparation to protecting access to thermal operations against thermal zone device removal, protect hwmon accesses to thermal zone operations with the thermal zone mutex. After acquiring the mutex, ensure that the thermal zone device is registered before proceeding. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | thermal/core: Introduce locked version of thermal_zone_device_updateGuenter Roeck2022-11-141-26/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In thermal_zone_device_set_mode(), the thermal zone mutex is released only to be reacquired in the subsequent call to thermal_zone_device_update(). Introduce __thermal_zone_device_update(), which is similar to thermal_zone_device_update() but has to be called with the thermal device mutex held. Call the new function from thermal_zone_device_set_mode() to avoid the extra thermal device mutex release/acquire sequence in that function. With the new function in place, re-implement thermal_zone_device_update() as wrapper around __thermal_zone_device_update() to acquire and release the thermal device mutex. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | thermal/core: Move parameter validation from __thermal_zone_get_temp to ↵Guenter Roeck2022-11-141-3/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | thermal_zone_get_temp All callers of __thermal_zone_get_temp() already validated the thermal zone parameters. Move validation to thermal_zone_get_temp() where it is actually needed. Also add kernel documentation for __thermal_zone_get_temp(), listing the requirement that the function must be called with validated parameters and with thermal device mutex held. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | thermal/core: Ensure that thermal device is registered in thermal_zone_get_tempGuenter Roeck2022-11-141-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Calls to thermal_zone_get_temp() are not protected against thermal zone device removal. As result, it is possible that the thermal zone operations callbacks are no longer valid when thermal_zone_get_temp() is called. This may result in crashes such as BUG: unable to handle page fault for address: ffffffffc04ef420 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 5d60e067 P4D 5d60e067 PUD 5d610067 PMD 110197067 PTE 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 1 PID: 3209 Comm: cat Tainted: G W 5.10.136-19389-g615abc6eb807 #1 02df41ac0b12f3a64f4b34245188d8875bb3bce1 Hardware name: Google Coral/Coral, BIOS Google_Coral.10068.92.0 11/27/2018 RIP: 0010:thermal_zone_get_temp+0x26/0x73 Code: 89 c3 eb d3 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 53 48 85 ff 74 50 48 89 fb 48 81 ff 00 f0 ff ff 77 44 48 8b 83 98 03 00 00 <48> 83 78 10 00 74 36 49 89 f6 4c 8d bb d8 03 00 00 4c 89 ff e8 9f RSP: 0018:ffffb3758138fd38 EFLAGS: 00010287 RAX: ffffffffc04ef410 RBX: ffff98f14d7fb000 RCX: 0000000000000000 RDX: ffff98f17cf90000 RSI: ffffb3758138fd64 RDI: ffff98f14d7fb000 RBP: ffffb3758138fd50 R08: 0000000000001000 R09: ffff98f17cf90000 R10: 0000000000000000 R11: ffffffff8dacad28 R12: 0000000000001000 R13: ffff98f1793a7d80 R14: ffff98f143231708 R15: ffff98f14d7fb018 FS: 00007ec166097800(0000) GS:ffff98f1bbd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffc04ef420 CR3: 000000010ee9a000 CR4: 00000000003506e0 Call Trace: temp_show+0x31/0x68 dev_attr_show+0x1d/0x4f sysfs_kf_seq_show+0x92/0x107 seq_read_iter+0xf5/0x3f2 vfs_read+0x205/0x379 __x64_sys_read+0x7c/0xe2 do_syscall_64+0x43/0x55 entry_SYSCALL_64_after_hwframe+0x61/0xc6 if a thermal device is removed while accesses to its device attributes are ongoing. The problem is exposed by code in iwl_op_mode_mvm_start(), which registers a thermal zone device only to unregister it shortly afterwards if an unrelated failure is encountered while accessing the hardware. Check if the thermal zone device is registered after acquiring the thermal zone device mutex to ensure this does not happen. The code was tested by triggering the failure in iwl_op_mode_mvm_start() on purpose. Without this patch, the kernel crashes reliably. The crash is no longer observed after applying this and the preceding patches. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | thermal/core: Delete device under thermal device zone lockGuenter Roeck2022-11-141-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Thermal device attributes may still be opened after unregistering the thermal zone and deleting the thermal device. Currently there is no protection against accessing thermal device operations after unregistering a thermal zone. To enable adding such protection, protect the device delete operation with the thermal zone device mutex. This requires splitting the call to device_unregister() into its components, device_del() and put_device(). Only the first call can be executed under mutex protection, since put_device() may result in releasing the thermal zone device memory. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | thermal/core: Destroy thermal zone device mutex in release functionGuenter Roeck2022-11-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Accesses to thermal zones, and with it the thermal zone device mutex, are still possible after the thermal zone device has been unregistered. For example, thermal_zone_get_temp() can be called from temp_show() in thermal_sysfs.c if the sysfs attribute was opened before the thermal device was unregistered. Move the call to mutex_destroy from thermal_zone_device_unregister() to thermal_release() to ensure that it is only destroyed after it is guaranteed to be no longer accessed. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | thermal/core: fix error code in __thermal_cooling_device_register()Dan Carpenter2022-10-281-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | Return an error pointer if ->get_max_state() fails. The current code returns NULL which will cause an oops in the callers. Fixes: c408b3d1d9bb ("thermal: Validate new state in cur_state_store()") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | thermal: sysfs: Reuse cdev->max_stateViresh Kumar2022-10-251-16/+10
| | | | | | | | | | | | | | | | | | | | | | | | Now that the cooling device structure stores the max_state value, reuse it and drop max_states from struct cooling_dev_stats. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | thermal: Validate new state in cur_state_store()Viresh Kumar2022-10-253-19/+13
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In cur_state_store(), the new state of the cooling device is received from user-space and is not validated by the thermal core but the same is left for the individual drivers to take care of. Apart from duplicating the code it leaves possibility for introducing bugs where a driver may not do it right. Lets make the thermal core check the new state itself and store the max value in the cooling device structure. Link: https://lore.kernel.org/all/Y0ltRJRjO7AkawvE@kili/ Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>