summaryrefslogtreecommitdiffstats
path: root/drivers/nvme/host/hwmon.c
Commit message (Collapse)AuthorAgeFilesLines
* nvme: host: hwmon: constify pointers to hwmon_channel_infoKrzysztof Kozlowski2023-08-211-1/+1
| | | | | | | | | | Statically allocated array of pointed to hwmon_channel_info can be made const for safety. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Keith Busch <kbusch@kernel.org>
* nvme-pci: add quirk for missing secondary temperature thresholdsHristo Venev2023-05-031-1/+3
| | | | | | | | | | | | | | | | | | | On Kingston KC3000 and Kingston FURY Renegade (both have the same PCI IDs) accessing temp3_{min,max} fails with an invalid field error (note that there is no problem setting the thresholds for temp1). This contradicts the NVM Express Base Specification 2.0b, page 292: The over temperature threshold and under temperature threshold features shall be implemented for all implemented temperature sensors (i.e., all Temperature Sensor fields that report a non-zero value). Define NVME_QUIRK_NO_SECONDARY_TEMP_THRESH that disables the thresholds for all but the composite temperature and set it for this device. Signed-off-by: Hristo Venev <hristo@venev.name> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme-hwmon: kmalloc the NVME SMART log bufferSerge Semin2022-10-191-7/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recent commit 52fde2c07da6 ("nvme: set dma alignment to dword") has caused a regression on our platform. It turned out that the nvme_get_log() method invocation caused the nvme_hwmon_data structure instance corruption. In particular the nvme_hwmon_data.ctrl pointer was overwritten either with zeros or with garbage. After some research we discovered that the problem happened even before the actual NVME DMA execution, but during the buffer mapping. Since our platform is DMA-noncoherent, the mapping implied the cache-line invalidations or write-backs depending on the DMA-direction parameter. In case of the NVME SMART log getting the DMA was performed from-device-to-memory, thus the cache-invalidation was activated during the buffer mapping. Since the log-buffer isn't cache-line aligned, the cache-invalidation caused the neighbour data to be discarded. The neighbouring data turned to be the data surrounding the buffer in the framework of the nvme_hwmon_data structure. In order to fix that we need to make sure that the whole log-buffer is defined within the cache-line-aligned memory region so the cache-invalidation procedure wouldn't involve the adjacent data. One of the option to guarantee that is to kmalloc the DMA-buffer [1]. Seeing the rest of the NVME core driver prefer that method it has been chosen to fix this problem too. Note after a deeper researches we found out that the denoted commit wasn't a root cause of the problem. It just revealed the invalidity by activating the DMA-based NVME SMART log getting performed in the framework of the NVME hwmon driver. The problem was here since the initial commit of the driver. [1] Documentation/core-api/dma-api-howto.rst Fixes: 400b6a7b13a3 ("nvme: Add hardware monitoring support") Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme-hwmon: consistently ignore errors from nvme_hwmon_initChristoph Hellwig2022-10-191-5/+8
| | | | | | | | | | An NVMe controller works perfectly fine even when the hwmon initialization fails. Stop returning errors that do not come from a controller reset from nvme_hwmon_init to handle this case consistently. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
* nvme-hwmon: Return error code when registration failsDaniel Wagner2021-03-051-0/+1
| | | | | | | | | | | The hwmon pointer wont be NULL if the registration fails. Though the exit code path will assign it to ctrl->hwmon_device. Later nvme_hwmon_exit() will try to free the invalid pointer. Avoid this by returning the error code from hwmon_device_register_with_info(). Fixes: ed7770f66286 ("nvme/hwmon: rework to avoid devm allocation") Signed-off-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme-hwmon: rework to avoid devm allocationHannes Reinecke2021-02-101-10/+21
| | | | | | | | | | | | | | | | The original design to use device-managed resource allocation doesn't really work as the NVMe controller has a vastly different lifetime than the hwmon sysfs attributes, causing warning about duplicate sysfs entries upon reconnection. This patch reworks the hwmon allocation to avoid device-managed resource allocation, and uses the NVMe controller as parent for the sysfs attributes. Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Hannes Reinecke <hare@suse.de> Tested-by: Enzo Matsumiya <ematsumiya@suse.de> Tested-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme: return errors for hwmon initKeith Busch2020-09-221-8/+6
| | | | | | | | | | Initializing the nvme hwmon retrieves a log from the controller. If the controller is broken, we need to return the appropriate error so that subsequent initialization doesn't attempt to continue. Reported-by: Tong Zhang <ztong0001@gmail.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme-hwmon: log the controller device nameSagi Grimberg2020-07-291-1/+2
| | | | | | | Stay consistent with the rest of the driver Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme: support for multiple Command Sets Supported and Effects log pagesKeith Busch2020-07-081-1/+1
| | | | | | | | | | | | | | | | | The Commands Supported and Effects log page was extended with a CSI field that enables the host to query the log page for each command set supported. Retrieve this log page for each command set that an attached namespace supports, and save a pointer to that log in the namespace head. Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com> Reviewed-by: Javier González <javier.gonz@samsung.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Keith Busch <keith.busch@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme: hwmon: switch to use <linux/units.h> helpersAkinobu Mita2020-01-311-8/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This switches the nvme driver to use kelvin_to_millicelsius() and millicelsius_to_kelvin() in <linux/units.h>. Link: http://lkml.kernel.org/r/1576386975-7941-8-git-send-email-akinobu.mita@gmail.com Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Sujith Thomas <sujith.thomas@intel.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Andy Shevchenko <andy@infradead.org> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Amit Kucheria <amit.kucheria@verdurent.com> Cc: Jean Delvare <jdelvare@suse.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Keith Busch <kbusch@kernel.org> Cc: Jens Axboe <axboe@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Cc: Hartmut Knaack <knaack.h@gmx.de> Cc: Johannes Berg <johannes.berg@intel.com> Cc: Jonathan Cameron <jic23@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Kalle Valo <kvalo@codeaurora.org> Cc: Lars-Peter Clausen <lars@metafoo.de> Cc: Luca Coelho <luciano.coelho@intel.com> Cc: Peter Meerwald-Stadler <pmeerw@pmeerw.net> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* nvme: hwmon: add quirk to avoid changing temperature thresholdAkinobu Mita2019-11-221-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a new quirk NVME_QUIRK_NO_TEMP_THRESH_CHANGE to avoid changing the value of the temperature threshold feature for specific devices that show undesirable behavior. Guenter reported: "On my Intel NVME drive (SSDPEKKW512G7), writing any minimum limit on the Composite temperature sensor results in a temperature warning, and that warning is sticky until I reset the controller. It doesn't seem to matter which temperature I write; writing -273000 has the same result." The Intel NVMe has the latest firmware version installed, so this isn't a problem that was ever fixed. Reported-by: Guenter Roeck <linux@roeck-us.net> Cc: Keith Busch <kbusch@kernel.org> Cc: Jens Axboe <axboe@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Jean Delvare <jdelvare@suse.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
* nvme: hwmon: provide temperature min and max values for each sensorAkinobu Mita2019-11-221-16/+90
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to the NVMe specification, the over temperature threshold and under temperature threshold features shall be implemented for Composite Temperature if a non-zero WCTEMP field value is reported in the Identify Controller data structure. The features are also implemented for all implemented temperature sensors (i.e., all Temperature Sensor fields that report a non-zero value). This provides the over temperature threshold and under temperature threshold for each sensor as temperature min and max values of hwmon sysfs attributes. The WCTEMP is already provided as a temperature max value for Composite Temperature, but this change isn't incompatible. Because the default value of the over temperature threshold for Composite Temperature is the WCTEMP. Now the alarm attribute for Composite Temperature indicates one of the temperature is outside of a temperature threshold. Because there is only a single bit in Critical Warning field that indicates a temperature is outside of a threshold. Example output from the "sensors" command: nvme-pci-0100 Adapter: PCI adapter Composite: +33.9°C (low = -273.1°C, high = +69.8°C) (crit = +79.8°C) Sensor 1: +34.9°C (low = -273.1°C, high = +65261.8°C) Sensor 2: +31.9°C (low = -273.1°C, high = +65261.8°C) Sensor 5: +47.9°C (low = -273.1°C, high = +65261.8°C) This also adds helper macros for kelvin from/to milli Celsius conversion, and replaces the repeated code in hwmon.c. Cc: Keith Busch <kbusch@kernel.org> Cc: Jens Axboe <axboe@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Jean Delvare <jdelvare@suse.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
* nvme: Add hardware monitoring supportGuenter Roeck2019-11-121-0/+181
nvme devices report temperature information in the controller information (for limits) and in the smart log. Currently, the only means to retrieve this information is the nvme command line interface, which requires super-user privileges. At the same time, it would be desirable to be able to use NVMe temperature information for thermal control. This patch adds support to read NVMe temperatures from the kernel using the hwmon API and adds temperature zones for NVMe drives. The thermal subsystem can use this information to set thermal policies, and userspace can access it using libsensors and/or the "sensors" command. Example output from the "sensors" command: nvme0-pci-0100 Adapter: PCI adapter Composite: +39.0°C (high = +85.0°C, crit = +85.0°C) Sensor 1: +39.0°C Sensor 2: +41.0°C Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Keith Busch <kbusch@kernel.org>