summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | | | scsi: ses: Fix slab-out-of-bounds in ses_intf_remove()Tomas Henzl2023-02-211-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A fix for: BUG: KASAN: slab-out-of-bounds in ses_intf_remove+0x23f/0x270 [ses] Read of size 8 at addr ffff88a10d32e5d8 by task rmmod/12013 When edev->components is zero, accessing edev->component[0] members is wrong. Link: https://lore.kernel.org/r/20230202162451.15346-5-thenzl@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | | | | scsi: ses: Fix possible desc_ptr out-of-bounds accessesTomas Henzl2023-02-211-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sanitize possible desc_ptr out-of-bounds accesses in ses_enclosure_data_process(). Link: https://lore.kernel.org/r/20230202162451.15346-4-thenzl@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | | | | scsi: ses: Fix possible addl_desc_ptr out-of-bounds accessesTomas Henzl2023-02-211-9/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sanitize possible addl_desc_ptr out-of-bounds accesses in ses_enclosure_data_process(). Link: https://lore.kernel.org/r/20230202162451.15346-3-thenzl@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | | | | scsi: ses: Fix slab-out-of-bounds in ses_enclosure_data_process()Tomas Henzl2023-02-211-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A fix for: BUG: KASAN: slab-out-of-bounds in ses_enclosure_data_process+0x949/0xe30 [ses] Read of size 1 at addr ffff88a1b043a451 by task systemd-udevd/3271 Checking after (and before in next loop) addl_desc_ptr[1] is sufficient, we expect the size to be sanitized before first access to addl_desc_ptr[1]. Make sure we don't walk beyond end of page. Link: https://lore.kernel.org/r/20230202162451.15346-2-thenzl@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* | | | | | Merge tag 'thermal-6.3-rc1-2' of ↵Linus Torvalds2023-03-032-11/+4
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more thermal control updates from Rafael Wysocki: "These fix two issues in the Intel thermal control drivers. Specifics: - Fix an error pointer dereference in the quark_dts Intel thermal driver (Dan Carpenter) - Fix the intel_bxt_pmic_thermal driver Kconfig entry to select REGMAP which is not user-visible instead of depending on it (Randy Dunlap)" * tag 'thermal-6.3-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal: intel: BXT_PMIC: select REGMAP instead of depending on it thermal: intel: quark_dts: fix error pointer dereference
| * | | | | | thermal: intel: BXT_PMIC: select REGMAP instead of depending on itRandy Dunlap2023-03-011-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | REGMAP is a hidden (not user visible) symbol. Users cannot set it directly thru "make *config", so drivers should select it instead of depending on it if they need it. Consistently using "select" or "depends on" can also help reduce Kconfig circular dependency issues. Therefore, change the use of "depends on REGMAP" to "select REGMAP". Fixes: b474303ffd57 ("thermal: add Intel BXT WhiskeyCove PMIC thermal driver") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | | | | | thermal: intel: quark_dts: fix error pointer dereferenceDan Carpenter2023-03-011-10/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If alloc_soc_dts() fails, then we can just return. Trying to free "soc_dts" will lead to an Oops. Fixes: 8c1876939663 ("thermal: intel Quark SoC X1000 DTS thermal driver") Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | | | | | | Merge tag 'acpi-6.3-rc1-2' of ↵Linus Torvalds2023-03-033-48/+20
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more ACPI updates from Rafael Wysocki: "These update ACPI quirks for some x86 platforms and add an IRQ override quirk for one more system. Specifics: - Add an ACPI IRQ override quirk for Asus Expertbook B2402FBA (Vojtech Hejsek) - Drop a suspend-to-idle quirk for HP Elitebook G9 that is not needed any more after a firmware update (Mario Limonciello) - Add all Cezanne systems to the list for forcing StorageD3Enable, because they all need the same quirk (Mario Limonciello)" * tag 'acpi-6.3-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: x86: utils: Add Cezanne to the list for forcing StorageD3Enable ACPI: x86: Drop quirk for HP Elitebook ACPI: resource: Skip IRQ override on Asus Expertbook B2402FBA
| | \ \ \ \ \ \
| | \ \ \ \ \ \
| *-. \ \ \ \ \ \ Merge branches 'acpi-pm' and 'acpi-x86'Rafael J. Wysocki2023-03-032-48/+13
| |\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge additional ACPI quirks for x86 systems: - Drop a suspend-to-idle quirk for HP Elitebook G9 that is not needed any more after a firmware update (Mario Limonciello). - Add all Cezanne systems to the list for forcing StorageD3Enable, because they all need the same quirk (Mario Limonciello). * acpi-pm: ACPI: x86: Drop quirk for HP Elitebook * acpi-x86: ACPI: x86: utils: Add Cezanne to the list for forcing StorageD3Enable
| | | * | | | | | | ACPI: x86: utils: Add Cezanne to the list for forcing StorageD3EnableMario Limonciello2023-03-011-24/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 018d6711c26e4 ("ACPI: x86: Add a quirk for Dell Inspiron 14 2-in-1 for StorageD3Enable") introduced a quirk to allow a system with ambiguous use of _ADR 0 to force StorageD3Enable. It was reported that several more Dell systems suffered the same symptoms. As the list is continuing to grow but these are all Cezanne systems, instead add Cezanne to the CPU list to apply the StorageD3Enable property and remove the whole list. It was also reported that an HP system only has StorageD3Enable on the ACPI device for the first NVME disk, not the second. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217003 Link: https://bugzilla.kernel.org/show_bug.cgi?id=216773 Reported-by: David Alvarez Lombardi <dqalombardi@proton.me> Reported-by: dbilios@stdio.gr Reported-and-tested-by: Elvis Angelaccio <elvis.angelaccio@kde.org> Tested-by: victor.bonnelle@proton.me Tested-by: hurricanepootis@protonmail.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | | | | | | | ACPI: x86: Drop quirk for HP ElitebookMario Limonciello2023-02-281-24/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There was a quirk in `acpi/x86/s2idle.c` for an HP Elitebook G9 platforms to force AMD GUID codepath instead of Microsoft codepath. This was due to a bug with WCN6855 WLAN firmware interaction with the system. This bug is fixed by WCN6855 firmware: WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.23 Remove the quirk as it's no longer necessary with this firmware. Link: https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/?id=c7a57ef688f7d99d8338a5d8edddc8836ff0e6de Tested-by: Anson Tsao <anson.tsao@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | | | | | | | | ACPI: resource: Skip IRQ override on Asus Expertbook B2402FBAVojtech Hejsek2023-02-221-0/+7
| | |/ / / / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Asus Expertbook B2502FBA has IRQ 1 described as Active_Low in its ACPI table. However, the kernel overrides this and sets it to Edge_High, which prevents the internal keyboard from working properly. Adding this laptop model to the override_table fixes the issue. Signed-off-by: Vojtech Hejsek <hejsekvojtech@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | | | | | | | | Merge tag 'pm-6.3-rc1-2' of ↵Linus Torvalds2023-03-036-7/+7
|\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more power management updates from Rafael Wysocki: "These update power capping (new hardware support and cleanup) and cpufreq (bug fixes, cleanups and intel_pstate adjustment for a new platform). Specifics: - Fix error handling in the apple-soc cpufreq driver (Dan Carpenter) - Change the log level of a message in the amd-pstate cpufreq driver so it is more visible to users (Kai-Heng Feng) - Adjust the balance_performance EPP value for Sapphire Rapids in the intel_pstate cpufreq driver (Srinivas Pandruvada) - Remove MODULE_LICENSE from 3 pieces of non-modular code (Nick Alcock) - Make a read-only kobj_type structure in the schedutil cpufreq governor constant (Thomas Weißschuh) - Add Add Power Limit4 support for Meteor Lake SoC to the Intel RAPL power capping driver (Sumeet Pawnikar)" * tag 'pm-6.3-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: apple-soc: Fix an IS_ERR() vs NULL check powercap: remove MODULE_LICENSE in non-modules cpufreq: intel_pstate: remove MODULE_LICENSE in non-modules powercap: RAPL: Add Power Limit4 support for Meteor Lake SoC cpufreq: amd-pstate: remove MODULE_LICENSE in non-modules cpufreq: schedutil: make kobj_type structure constant cpufreq: amd-pstate: Let user know amd-pstate is disabled cpufreq: intel_pstate: Adjust balance_performance EPP for Sapphire Rapids
| * \ \ \ \ \ \ \ \ Merge branch 'powercap'Rafael J. Wysocki2023-03-032-1/+2
| |\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge additional power capping changes for 6.3-rc1: - Remove MODULE_LICENSE from non-modular power capping code (Nick Alcock). - Add Add Power Limit4 support for Meteor Lake SoC to the Intel RAPL power capping driver (Sumeet Pawnikar). * powercap: powercap: remove MODULE_LICENSE in non-modules powercap: RAPL: Add Power Limit4 support for Meteor Lake SoC
| | * | | | | | | | | powercap: remove MODULE_LICENSE in non-modulesNick Alcock2023-02-281-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 8b41fc4454e ("kbuild: create modules.builtin without Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations are used to identify modules. As a consequence, uses of the macro in non-modules will cause modprobe to misidentify their containing object file as a module when it is not (false positives), and modprobe might succeed rather than failing with a suitable error message. So remove it in the files in this commit, none of which can be built as modules. Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Suggested-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | | | | | | | | powercap: RAPL: Add Power Limit4 support for Meteor Lake SoCSumeet Pawnikar2023-02-231-0/+2
| | | |/ / / / / / / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add Meteor Lake SoC to the list of processor models for which Power Limit4 is supported by the Intel RAPL driver. Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | | | | | | | | cpufreq: apple-soc: Fix an IS_ERR() vs NULL checkDan Carpenter2023-03-011-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The of_iomap() function returns NULL if it fails. It never returns error pointers. Fix the check accordingly. Fixes: 6286bbb40576 ("cpufreq: apple-soc: Add new driver to control Apple SoC CPU P-states") Signed-off-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Eric Curtin <ecurtin@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | | | | | | | | cpufreq: intel_pstate: remove MODULE_LICENSE in non-modulesNick Alcock2023-02-281-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 8b41fc4454e ("kbuild: create modules.builtin without Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations are used to identify modules. As a consequence, uses of the macro in non-modules will cause modprobe to misidentify their containing object file as a module when it is not (false positives), and modprobe might succeed rather than failing with a suitable error message. So remove it in the files in this commit, none of which can be built as modules. Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Suggested-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | | | | | | | | cpufreq: amd-pstate: remove MODULE_LICENSE in non-modulesNick Alcock2023-02-231-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 8b41fc4454e ("kbuild: create modules.builtin without Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations are used to identify modules. As a consequence, uses of the macro in non-modules will cause modprobe to misidentify their containing object file as a module when it is not (false positives), and modprobe might succeed rather than failing with a suitable error message. So remove it in amd-pstate.c which cannot be built as a module. Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Suggested-by: Luis Chamberlain <mcgrof@kernel.org> [ rjw: Subject and changelog adjustments ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | | | | | | | | cpufreq: schedutil: make kobj_type structure constantThomas Weißschuh2023-02-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") the driver core allows the usage of const struct kobj_type. Take advantage of this to constify the structure definition to prevent modification at runtime. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | | | | | | | | cpufreq: amd-pstate: Let user know amd-pstate is disabledKai-Heng Feng2023-02-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 202e683df37c ("cpufreq: amd-pstate: add amd-pstate driver parameter for mode selection") changed the driver to be disabled by default, and this can surprise users. Let users know what happened so they can decide what to do next. Link: https://bugs.launchpad.net/bugs/2006942 Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Acked-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Yuan Perry <Perry.Yuan@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | | | | | | | | cpufreq: intel_pstate: Adjust balance_performance EPP for Sapphire RapidsSrinivas Pandruvada2023-02-231-0/+1
| |/ / / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While the majority of server OS distributions are deployed with the "performance" governor as the default, some distributions like Ubuntu use the "powersave" governor by default. While using the "powersave" governor in its default configuration on Sapphire Rapids systems leads to much lower power, the performance is lower by more than 25% for several workloads relative to the "performance" governor. A 37% difference has been reported by www.Phoronix.com [1]. This is a consequence of using a relatively high EPP value in the default configuration of the "powersave" governor and the performance can be made much closer to the "performance" governor's level by adjusting the default EPP value. Based on experiments, with EPP of 0x00, 0x10, 0x20, the performance delta between the "powersave" governor and the "performance" one is around 12%. However, the EPP of 0x20 reduces average power by 18% with respect to the lower EPP values. [Note that raising min_perf_pct in sysfs as high as 50% in addition to adjusting EPP does not improve the performance any further.] For this reason, change the EPP value corresponding to the the default balance_performance setting for Sapphire Rapids to 0x20, which is straightforward, because analogous default EPP adjustment has been applied to Alder Lake and there is a way to set the balance_performance EPP value in intel_pstate based on the processor model already. The goal here is to limit the mean performance delta between the "powersave" governor in the default configuration and the "performance" governor for a wide variety of server workloadsto to around 10-12%. For some bursty workloads, this delta can be still large, as the frequency ramp-up will still lag when the "powersave" governor is in use irrespective of the EPP setting, because the performance governor always requests the maximum possible frequency. Link: https://www.phoronix.com/review/centos-clear-spr/6 # [1] Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | | | | | | | | Merge tag 'io_uring-6.3-2023-03-03' of git://git.kernel.dk/linuxLinus Torvalds2023-03-039-62/+85
|\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull more io_uring updates from Jens Axboe: "Here's a set of fixes/changes that didn't make the first cut, either because they got queued before I sent the early merge request, or fixes that came in afterwards. In detail: - Don't set MSG_NOSIGNAL on recv/recvmsg opcodes, as AF_PACKET will error out (David) - Fix for spurious poll wakeups (me) - Fix for a file leak for buffered reads in certain conditions (Joseph) - Don't allow registered buffers of mixed types (Pavel) - Improve handling of huge pages for registered buffers (Pavel) - Provided buffer ring size calculation fix (Wojciech) - Minor cleanups (me)" * tag 'io_uring-6.3-2023-03-03' of git://git.kernel.dk/linux: io_uring/poll: don't pass in wake func to io_init_poll_iocb() io_uring: fix fget leak when fs don't support nowait buffered read io_uring/poll: allow some retries for poll triggering spuriously io_uring: remove MSG_NOSIGNAL from recvmsg io_uring/rsrc: always initialize 'folio' to NULL io_uring/rsrc: optimise registered huge pages io_uring/rsrc: optimise single entry advance io_uring/rsrc: disallow multi-source reg buffers io_uring: remove unused wq_list_merge io_uring: fix size calculation when registering buf ring io_uring/rsrc: fix a comment in io_import_fixed() io_uring: rename 'in_idle' to 'in_cancel' io_uring: consolidate the put_ref-and-return section of adding work
| * | | | | | | | | io_uring/poll: don't pass in wake func to io_init_poll_iocb()Jens Axboe2023-03-011-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We only use one, and it's io_poll_wake(). Hardwire that in the initial init, as well as in __io_queue_proc() if we're setting up for double poll. Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | | | io_uring: fix fget leak when fs don't support nowait buffered readJoseph Qi2023-02-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Heming reported a BUG when using io_uring doing link-cp on ocfs2. [1] Do the following steps can reproduce this BUG: mount -t ocfs2 /dev/vdc /mnt/ocfs2 cp testfile /mnt/ocfs2/ ./link-cp /mnt/ocfs2/testfile /mnt/ocfs2/testfile.1 umount /mnt/ocfs2 Then umount will fail, and it outputs: umount: /mnt/ocfs2: target is busy. While tracing umount, it blames mnt_get_count() not return as expected. Do a deep investigation for fget()/fput() on related code flow, I've finally found that fget() leaks since ocfs2 doesn't support nowait buffered read. io_issue_sqe |-io_assign_file // do fget() first |-io_read |-io_iter_do_read |-ocfs2_file_read_iter // return -EOPNOTSUPP |-kiocb_done |-io_rw_done |-__io_complete_rw_common // set REQ_F_REISSUE |-io_resubmit_prep |-io_req_prep_async // override req->file, leak happens This was introduced by commit a196c78b5443 in v5.18. Fix it by don't re-assign req->file if it has already been assigned. [1] https://lore.kernel.org/ocfs2-devel/ab580a75-91c8-d68a-3455-40361be1bfa8@linux.alibaba.com/T/#t Fixes: a196c78b5443 ("io_uring: assign non-fixed early for async work") Cc: <stable@vger.kernel.org> Reported-by: Heming Zhao <heming.zhao@suse.com> Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Link: https://lore.kernel.org/r/20230228045459.13524-1-joseph.qi@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | | | io_uring/poll: allow some retries for poll triggering spuriouslyJens Axboe2023-02-252-2/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we get woken spuriously when polling and fail the operation with -EAGAIN again, then we generally only allow polling again if data had been transferred at some point. This is indicated with REQ_F_PARTIAL_IO. However, if the spurious poll triggers when the socket was originally empty, then we haven't transferred data yet and we will fail the poll re-arm. This either punts the socket to io-wq if it's blocking, or it fails the request with -EAGAIN if not. Neither condition is desirable, as the former will slow things down, while the latter will make the application confused. We want to ensure that a repeated poll trigger doesn't lead to infinite work making no progress, that's what the REQ_F_PARTIAL_IO check was for. But it doesn't protect against a loop post the first receive, and it's unnecessarily strict if we started out with an empty socket. Add a somewhat random retry count, just to put an upper limit on the potential number of retries that will be done. This should be high enough that we won't really hit it in practice, unless something needs to be aborted anyway. Cc: stable@vger.kernel.org # v5.10+ Link: https://github.com/axboe/liburing/issues/364 Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | | | io_uring: remove MSG_NOSIGNAL from recvmsgDavid Lamparter2023-02-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | MSG_NOSIGNAL is not applicable for the receiving side, SIGPIPE is generated when trying to write to a "broken pipe". AF_PACKET's packet_recvmsg() does enforce this, giving back EINVAL when MSG_NOSIGNAL is set - making it unuseable in io_uring's recvmsg. Remove MSG_NOSIGNAL from io_recvmsg_prep(). Cc: stable@vger.kernel.org # v5.10+ Signed-off-by: David Lamparter <equinox@diac24.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jens Axboe <axboe@kernel.dk> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230224150123.128346-1-equinox@diac24.net Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | | | io_uring/rsrc: always initialize 'folio' to NULLJens Axboe2023-02-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Smatch complains that: smatch warnings: io_uring/rsrc.c:1262 io_sqe_buffer_register() error: uninitialized symbol 'folio'. 'folio' may be used uninitialized, which can happen if we end up with a single page mapped. Ensure that we clear folio to NULL at the top so it's always set. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/r/202302241432.YML1CD5C-lkp@intel.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | | | io_uring/rsrc: optimise registered huge pagesPavel Begunkov2023-02-221-6/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When registering huge pages, internally io_uring will split them into many PAGE_SIZE bvec entries. That's bad for performance as drivers need to eventually dma-map the data and will do it individually for each bvec entry. Coalesce huge pages into one large bvec. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | | | io_uring/rsrc: optimise single entry advancePavel Begunkov2023-02-221-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Iterating within the first bvec entry should be essentially free, but we use iov_iter_advance() for that, which shows up in benchmark profiles taking up to 0.5% of CPU. Replace it with a hand coded version. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | | | io_uring/rsrc: disallow multi-source reg buffersPavel Begunkov2023-02-221-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If two or more mappings go back to back to each other they can be passed into io_uring to be registered as a single registered buffer. That would even work if mappings came from different sources, e.g. it's possible to mix in this way anon pages and pages from shmem or hugetlb. That is not a problem but it'd rather be less prone if we forbid such mixing. Cc: <stable@vger.kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | | | io_uring: remove unused wq_list_mergePavel Begunkov2023-02-221-22/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are no users of wq_list_merge, kill it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/5f9ad0301949213230ad9000a8359d591aae615a.1677002255.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | | | io_uring: fix size calculation when registering buf ringWojciech Lukowicz2023-02-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using struct_size() to calculate the size of io_uring_buf_ring will sum the size of the struct and of the bufs array. However, the struct's fields are overlaid with the array making the calculated size larger than it should be. When registering a ring with N * PAGE_SIZE / sizeof(struct io_uring_buf) entries, i.e. with fully filled pages, the calculated size will span one more page than it should and io_uring will try to pin the following page. Depending on how the application allocated the ring, it might succeed using an unrelated page or fail returning EFAULT. The size of the ring should be the product of ring_entries and the size of io_uring_buf, i.e. the size of the bufs array only. Fixes: c7fb19428d67 ("io_uring: add support for ring mapped supplied buffers") Signed-off-by: Wojciech Lukowicz <wlukowicz01@gmail.com> Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de> Link: https://lore.kernel.org/r/20230218184141.70891-1-wlukowicz01@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | | | io_uring/rsrc: fix a comment in io_import_fixed()Pavel Begunkov2023-02-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | io_import_fixed() supports offsets, but "may not" means the opposite. Replace it with "might not" so the comments rather speaks about possible cases. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de> Link: https://lore.kernel.org/r/5b5f79958456caa6dc532f6205f75f224b232c81.1676902343.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | | | io_uring: rename 'in_idle' to 'in_cancel'Jens Axboe2023-02-223-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This better describes what it does - it's incremented when the task is currently undergoing a cancelation operation, due to exiting or exec'ing. Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | | | io_uring: consolidate the put_ref-and-return section of adding workJens Axboe2023-02-221-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We've got a few cases of this, move them to one section and just use gotos to get there. Reduces the text section on both arm64 and x86-64, using gcc-12.2. Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | | | | | | | Merge tag 'block-6.3-2023-03-03' of git://git.kernel.dk/linuxLinus Torvalds2023-03-0320-85/+114
|\ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block fixes from Jens Axboe: - NVMe pull request via Christoph: - Don't access released socket during error recovery (Akinobu Mita) - Bring back auto-removal of deleted namespaces during sequential scan (Christoph Hellwig) - Fix an error code in nvme_auth_process_dhchap_challenge (Dan Carpenter) - Show well known discovery name (Daniel Wagner) - Add a missing endianess conversion in effects masking (Keith Busch) - Fix for a regression introduced in blk-rq-qos during init in this merge window (Breno) - Reorder a few fields in struct blk_mq_tag_set, eliminating a few holes and shrinking it (Christophe) - Remove redundant bdev_get_queue() NULL checks (Juhyung) - Add sed-opal single user mode support flag (Luca) - Remove SQE128 check in ublk as it isn't needed, saving some memory (Ming) - Op specific segment checking for cloned requests (Uday) - Exclusive open partition scan fixes (Yu) - Loop offset/size checking before assigning them in the device (Zhong) - Bio polling fixes (me) * tag 'block-6.3-2023-03-03' of git://git.kernel.dk/linux: blk-mq: enforce op-specific segment limits in blk_insert_cloned_request nvme-fabrics: show well known discovery name nvme-tcp: don't access released socket during error recovery nvme-auth: fix an error code in nvme_auth_process_dhchap_challenge() nvme: bring back auto-removal of deleted namespaces during sequential scan blk-iocost: Pass gendisk to ioc_refresh_params nvme: fix sparse warning on effects masking block: be a bit more careful in checking for NULL bdev while polling block: clear bio->bi_bdev when putting a bio back in the cache loop: loop_set_status_from_info() check before assignment ublk: remove check IO_URING_F_SQE128 in ublk_ch_uring_cmd block: remove more NULL checks after bdev_get_queue() blk-mq: Reorder fields in 'struct blk_mq_tag_set' block: fix scan partition for exclusively open device again block: Revert "block: Do not reread partition table on exclusively open device" sed-opal: add support flag for SUM in status ioctl
| * | | | | | | | | | blk-mq: enforce op-specific segment limits in blk_insert_cloned_requestUday Shankar2023-03-023-10/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The block layer might merge together discard requests up until the max_discard_segments limit is hit, but blk_insert_cloned_request checks the segment count against max_segments regardless of the req op. This can result in errors like the following when discards are issued through a DM device and max_discard_segments exceeds max_segments for the queue of the chosen underlying device. blk_insert_cloned_request: over max segments limit. (256 > 129) Fix this by looking at the req_op and enforcing the appropriate segment limit - max_discard_segments for REQ_OP_DISCARDs and max_segments for everything else. Signed-off-by: Uday Shankar <ushankar@purestorage.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230301000655.48112-1-ushankar@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | | | | Merge tag 'nvme-6.3-2022-03-01' of git://git.infradead.org/nvme into ↵Jens Axboe2023-03-014-20/+28
| |\ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | for-6.3/block Pull NVMe fixes from Christoph: "nvme fixes for Linux 6.3 - don't access released socket during error recovery (Akinobu Mita) - bring back auto-removal of deleted namespaces during sequential scan (Christoph Hellwig) - fix an error code in nvme_auth_process_dhchap_challenge (Dan Carpenter) - show well known discovery name (Daniel Wagner) - add a missing endianess conversion in effects masking (Keith Busch)" * tag 'nvme-6.3-2022-03-01' of git://git.infradead.org/nvme: nvme-fabrics: show well known discovery name nvme-tcp: don't access released socket during error recovery nvme-auth: fix an error code in nvme_auth_process_dhchap_challenge() nvme: bring back auto-removal of deleted namespaces during sequential scan nvme: fix sparse warning on effects masking
| | * | | | | | | | | | nvme-fabrics: show well known discovery nameDaniel Wagner2023-02-281-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kernel always logs the unique subsystem name for a discovery controller, even in the case user space asked for the well known. This has lead to confusion as the logs of nvme-cli and the kernel logs didn't match. First, nvme-cli connects to the well known discovery controller to figure out if it supports TP8013. If so then nvme-cli disconnects and connects to the unique discovery controller. Currently, the kernel show that user space connected twice to the unique one. To avoid further confusion, show the well known discovery controller if user space asked for it: $ nvme connect-all -v -t tcp -a 192.168.0.1 nvme0: nqn.2014-08.org.nvmexpress.discovery connected nvme0: nqn.2014-08.org.nvmexpress.discovery disconnected nvme0: nqn.discovery connected kernel log: nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.0.1:8009 nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery" nvme nvme0: new ctrl: NQN "nqn.discovery", addr 192.168.0.1:8009 Fixes: e5ea42faa773 ("nvme: display correct subsystem NQN") Signed-off-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * | | | | | | | | | nvme-tcp: don't access released socket during error recoveryAkinobu Mita2023-02-281-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While the error recovery work is temporarily failing reconnect attempts, running the 'nvme list' command causes a kernel NULL pointer dereference by calling getsockname() with a released socket. During error recovery work, the nvme tcp socket is released and a new one created, so it is not safe to access the socket without proper check. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Fixes: 02c57a82c008 ("nvme-tcp: print actual source IP address through sysfs "address" attr") Reviewed-by: Martin Belanger <martin.belanger@dell.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * | | | | | | | | | nvme-auth: fix an error code in nvme_auth_process_dhchap_challenge()Dan Carpenter2023-02-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This function was transitioned from returning NVMe status codes to returning traditional kernel error codes. However, this particular return now accidentally returns positive error codes like ENOMEM instead of negative -ENOMEM. Fixes: b0ef1b11d390 ("nvme-auth: don't use NVMe status codes") Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * | | | | | | | | | nvme: bring back auto-removal of deleted namespaces during sequential scanChristoph Hellwig2023-02-281-17/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Bring back the check of the Identify Namespace return value for the legacy NVMe 1.0-style sequential scanning. While NVMe 1.0 does not support namespace management, there are "modern" cloud solutions like Google Cloud Platform that claim the obsolete 1.0 compliance for no good reason while supporting proprietary sideband namespace management. Fixes: 1a893c2bfef4 ("nvme: refactor namespace probing") Reported-by: Nils Hanke <nh@edgeless.systems> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Nils Hanke <nh@edgeless.systems>
| | * | | | | | | | | | nvme: fix sparse warning on effects maskingKeith Busch2023-02-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The log entries are stored in le32, so use appropriate byte swapping macros. Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202302242222.PevBhzvC-lkp@intel.com/ Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | | | | | | | | | | blk-iocost: Pass gendisk to ioc_refresh_paramsBreno Leitao2023-02-281-6/+20
| |/ / / / / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current kernel (d2980d8d826554fa6981d621e569a453787472f8) crashes when blk_iocost_init for `nvme1` disk. BUG: kernel NULL pointer dereference, address: 0000000000000050 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page blk_iocost_init (include/asm-generic/qspinlock.h:128 include/linux/spinlock.h:203 include/linux/spinlock_api_smp.h:158 include/linux/spinlock.h:400 block/blk-iocost.c:2884) ioc_qos_write (block/blk-iocost.c:3198) ? kretprobe_perf_func (kernel/trace/trace_kprobe.c:1566) ? kernfs_fop_write_iter (include/linux/slab.h:584 fs/kernfs/file.c:311) ? __kmem_cache_alloc_node (mm/slab.h:? mm/slub.c:3452 mm/slub.c:3491) ? _copy_from_iter (arch/x86/include/asm/uaccess_64.h:46 arch/x86/include/asm/uaccess_64.h:52 lib/iov_iter.c:183 lib/iov_iter.c:628) ? kretprobe_dispatcher (kernel/trace/trace_kprobe.c:1693) cgroup_file_write (kernel/cgroup/cgroup.c:4061) kernfs_fop_write_iter (fs/kernfs/file.c:334) vfs_write (include/linux/fs.h:1849 fs/read_write.c:491 fs/read_write.c:584) ksys_write (fs/read_write.c:637) do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) This happens because ioc_refresh_params() is being called without a properly initialized ioc->rqos, which is happening later in the callee side. ioc_refresh_params() -> ioc_autop_idx() tries to access ioc->rqos.disk->queue but ioc->rqos.disk is NULL, causing the BUG above. Create function, called ioc_refresh_params_disk(), that is similar to ioc_refresh_params() but where the "struct gendisk" could be passed as an explicit argument. This function will be called when ioc->rqos.disk is not initialized. Fixes: ce57b558604e ("blk-rq-qos: make rq_qos_add and rq_qos_del more useful") Signed-off-by: Breno Leitao <leitao@debian.org> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230228111654.1778120-1-leitao@debian.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | | | | block: be a bit more careful in checking for NULL bdev while pollingJens Axboe2023-02-241-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Wei reports a crash with an application using polled IO: PGD 14265e067 P4D 14265e067 PUD 47ec50067 PMD 0 Oops: 0000 [#1] SMP CPU: 0 PID: 21915 Comm: iocore_0 Kdump: loaded Tainted: G S 5.12.0-0_fbk12_clang_7346_g1bb6f2e7058f #1 Hardware name: Wiwynn Delta Lake MP T8/Delta Lake-Class2, BIOS Y3DLM08 04/10/2022 RIP: 0010:bio_poll+0x25/0x200 Code: 0f 1f 44 00 00 0f 1f 44 00 00 55 41 57 41 56 41 55 41 54 53 48 83 ec 28 65 48 8b 04 25 28 00 00 00 48 89 44 24 20 48 8b 47 08 <48> 8b 80 70 02 00 00 4c 8b 70 50 8b 6f 34 31 db 83 fd ff 75 25 65 RSP: 0018:ffffc90005fafdf8 EFLAGS: 00010292 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 74b43cd65dd66600 RDX: 0000000000000003 RSI: ffffc90005fafe78 RDI: ffff8884b614e140 RBP: ffff88849964df78 R08: 0000000000000000 R09: 0000000000000008 R10: 0000000000000000 R11: 0000000000000000 R12: ffff88849964df00 R13: ffffc90005fafe78 R14: ffff888137d3c378 R15: 0000000000000001 FS: 00007fd195000640(0000) GS:ffff88903f400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000270 CR3: 0000000466121001 CR4: 00000000007706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: iocb_bio_iopoll+0x1d/0x30 io_do_iopoll+0xac/0x250 __se_sys_io_uring_enter+0x3c5/0x5a0 ? __x64_sys_write+0x89/0xd0 do_syscall_64+0x2d/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x94f225d Code: 24 cc 00 00 00 41 8b 84 24 d0 00 00 00 c1 e0 04 83 e0 10 41 09 c2 8b 33 8b 53 04 4c 8b 43 18 4c 63 4b 0c b8 aa 01 00 00 0f 05 <85> c0 0f 88 85 00 00 00 29 03 45 84 f6 0f 84 88 00 00 00 41 f6 c7 RSP: 002b:00007fd194ffcd88 EFLAGS: 00000202 ORIG_RAX: 00000000000001aa RAX: ffffffffffffffda RBX: 00007fd194ffcdc0 RCX: 00000000094f225d RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000007 RBP: 00007fd194ffcdb0 R08: 0000000000000000 R09: 0000000000000008 R10: 0000000000000001 R11: 0000000000000202 R12: 00007fd269d68030 R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000 which is due to bio->bi_bdev being NULL. This can happen if we have two tasks doing polled IO, and task B ends up completing IO from task A if they are sharing a poll queue. If task B completes the IO and puts the bio into our cache, then it can allocate that bio again before task A is done polling for it. As that would necessitate a preempt between the two tasks, it's enough to just be a bit more careful in checking for whether or not bio->bi_bdev is NULL. Reported-and-tested-by: Wei Zhang <wzhang@meta.com> Cc: stable@vger.kernel.org Fixes: be4d234d7aeb ("bio: add allocation cache abstraction") Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | | | | block: clear bio->bi_bdev when putting a bio back in the cacheJens Axboe2023-02-241-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This isn't strictly needed in terms of correctness, but it does allow polling to know if the bio has been put already by a different task and hence avoid polling something that we don't need to. Cc: stable@vger.kernel.org Fixes: be4d234d7aeb ("bio: add allocation cache abstraction") Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | | | | loop: loop_set_status_from_info() check before assignmentZhong Jinghua2023-02-221-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In loop_set_status_from_info(), lo->lo_offset and lo->lo_sizelimit should be checked before reassignment, because if an overflow error occurs, the original correct value will be changed to the wrong value, and it will not be changed back. More, the original patch did not solve the problem, the value was set and ioctl returned an error, but the subsequent io used the value in the loop driver, which still caused an alarm: loop_handle_cmd do_req_filebacked loff_t pos = ((loff_t) blk_rq_pos(rq) << 9) + lo->lo_offset; lo_rw_aio cmd->iocb.ki_pos = pos Fixes: c490a0b5a4f3 ("loop: Check for overflow while configuring loop") Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20230221095027.3656193-1-zhongjinghua@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | | | | ublk: remove check IO_URING_F_SQE128 in ublk_ch_uring_cmdMing Lei2023-02-211-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sizeof(struct ublksrv_io_cmd) is 16bytes, which can be held in 64byte SQE, so not necessary to check IO_URING_F_SQE128. With this change, we get chance to save half SQ ring memory. Fixed: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230220041413.1524335-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | | | | block: remove more NULL checks after bdev_get_queue()Juhyung Park2023-02-213-21/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bdev_get_queue() never returns NULL. Several commits [1][2] have been made before to remove such superfluous checks, but some still remained. For places where bdev_get_queue() is called solely for NULL checks, it is removed entirely. [1] commit ec9fd2a13d74 ("blk-lib: don't check bdev_get_queue() NULL check") [2] commit fea127b36c93 ("block: remove superfluous check for request queue in bdev_is_zoned()") Signed-off-by: Juhyung Park <qkrwngud825@gmail.com> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com> Link: https://lore.kernel.org/r/20230203024029.48260-1-qkrwngud825@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>