summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* clk: qcom: mmcc-msm8998: fix venus clock issueMarc Gonzalez2024-04-271-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Right now, msm8998 video decoder (venus) is non-functional: $ time mpv --hwdec=v4l2m2m-copy --vd-lavc-software-fallback=no --vo=null --no-audio --untimed --length=30 --quiet demo-480.webm (+) Video --vid=1 (*) (vp9 854x480 29.970fps) Audio --aid=1 --alang=eng (*) (opus 2ch 48000Hz) [ffmpeg/video] vp9_v4l2m2m: output VIDIOC_REQBUFS failed: Connection timed out [ffmpeg/video] vp9_v4l2m2m: no v4l2 output context's buffers [ffmpeg/video] vp9_v4l2m2m: can't configure decoder Could not open codec. Software decoding fallback is disabled. Exiting... (Quit) Bryan O'Donoghue suggested the proper fix: - Set required register offsets in venus GDSC structs. - Set HW_CTRL flag. $ time mpv --hwdec=v4l2m2m-copy --vd-lavc-software-fallback=no --vo=null --no-audio --untimed --length=30 --quiet demo-480.webm (+) Video --vid=1 (*) (vp9 854x480 29.970fps) Audio --aid=1 --alang=eng (*) (opus 2ch 48000Hz) [ffmpeg/video] vp9_v4l2m2m: VIDIOC_G_FMT ioctl [ffmpeg/video] vp9_v4l2m2m: VIDIOC_G_FMT ioctl ... Using hardware decoding (v4l2m2m-copy). VO: [null] 854x480 nv12 Exiting... (End of file) real 0m3.315s user 0m1.277s sys 0m0.453s NOTES: GDSC = Globally Distributed Switch Controller Use same code as mmcc-msm8996 with: s/venus_gdsc/video_top_gdsc/ s/venus_core0_gdsc/video_subcore0_gdsc/ s/venus_core1_gdsc/video_subcore1_gdsc/ https://git.codelinaro.org/clo/la/kernel/msm-4.4/-/blob/caf_migration/kernel.lnx.4.4.r38-rel/include/dt-bindings/clock/msm-clocks-hwio-8996.h https://git.codelinaro.org/clo/la/kernel/msm-4.4/-/blob/caf_migration/kernel.lnx.4.4.r38-rel/include/dt-bindings/clock/msm-clocks-hwio-8998.h 0x1024 = MMSS_VIDEO GDSCR (undocumented) 0x1028 = MMSS_VIDEO_CORE_CBCR 0x1030 = MMSS_VIDEO_AHB_CBCR 0x1034 = MMSS_VIDEO_AXI_CBCR 0x1038 = MMSS_VIDEO_MAXI_CBCR 0x1040 = MMSS_VIDEO_SUBCORE0 GDSCR (undocumented) 0x1044 = MMSS_VIDEO_SUBCORE1 GDSCR (undocumented) 0x1048 = MMSS_VIDEO_SUBCORE0_CBCR 0x104c = MMSS_VIDEO_SUBCORE1_CBCR Fixes: d14b15b5931c2b ("clk: qcom: Add MSM8998 Multimedia Clock Controller (MMCC) driver") Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org> Signed-off-by: Marc Gonzalez <mgonzalez@freebox.fr> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Link: https://lore.kernel.org/r/ff4e2e34-a677-4c39-8c29-83655c5512ae@freebox.fr Signed-off-by: Bjorn Andersson <andersson@kernel.org>
* clk: qcom: dispcc-sm8650: fix DisplayPort clocksDmitry Baryshkov2024-04-271-16/+4
| | | | | | | | | | | | | | | | | | | On SM8650 DisplayPort link clocks use frequency tables inherited from the vendor kernel, it is not applicable in the upstream kernel. Drop frequency tables and use clk_byte2_ops for those clocks. This fixes frequency selection in the OPP core (which otherwise attempts to use invalid 810 KHz as DP link rate), also fixing the following message: msm-dp-display af54000.displayport-controller: _opp_config_clk_single: failed to set clock rate: -22 Fixes: 9e939f008338 ("clk: qcom: add the SM8650 Display Clock Controller driver") Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8650-HDK Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20240424-dispcc-dp-clocks-v2-4-b44038f3fa96@linaro.org Signed-off-by: Bjorn Andersson <andersson@kernel.org>
* clk: qcom: dispcc-sm8550: fix DisplayPort clocksDmitry Baryshkov2024-04-271-16/+4
| | | | | | | | | | | | | | | | | | | On SM8550 DisplayPort link clocks use frequency tables inherited from the vendor kernel, it is not applicable in the upstream kernel. Drop frequency tables and use clk_byte2_ops for those clocks. This fixes frequency selection in the OPP core (which otherwise attempts to use invalid 810 KHz as DP link rate), also fixing the following message: msm-dp-display ae90000.displayport-controller: _opp_config_clk_single: failed to set clock rate: -22 Fixes: 90114ca11476 ("clk: qcom: add SM8550 DISPCC driver") Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-HDK Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20240424-dispcc-dp-clocks-v2-3-b44038f3fa96@linaro.org Signed-off-by: Bjorn Andersson <andersson@kernel.org>
* clk: qcom: dispcc-sm6350: fix DisplayPort clocksDmitry Baryshkov2024-04-271-10/+1
| | | | | | | | | | | | | | | | | | | On SM6350 DisplayPort link clocks use frequency tables inherited from the vendor kernel, it is not applicable in the upstream kernel. Drop frequency tables and use clk_byte2_ops for those clocks. This fixes frequency selection in the OPP core (which otherwise attempts to use invalid 810 KHz as DP link rate), also fixing the following message: msm-dp-display ae90000.displayport-controller: _opp_config_clk_single: failed to set clock rate: -22 Fixes: 837519775f1d ("clk: qcom: Add display clock controller driver for SM6350") Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Tested-by: Luca Weiss <luca.weiss@fairphone.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20240424-dispcc-dp-clocks-v2-2-b44038f3fa96@linaro.org Signed-off-by: Bjorn Andersson <andersson@kernel.org>
* clk: qcom: dispcc-sm8450: fix DisplayPort clocksDmitry Baryshkov2024-04-271-16/+4
| | | | | | | | | | | | | | | | | | On SM8450 DisplayPort link clocks use frequency tables inherited from the vendor kernel, it is not applicable in the upstream kernel. Drop frequency tables and use clk_byte2_ops for those clocks. This fixes frequency selection in the OPP core (which otherwise attempts to use invalid 810 KHz as DP link rate), also fixing the following message: msm-dp-display ae90000.displayport-controller: _opp_config_clk_single: failed to set clock rate: -22 Fixes: 16fb89f92ec4 ("clk: qcom: Add support for Display Clock Controller on SM8450") Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20240424-dispcc-dp-clocks-v2-1-b44038f3fa96@linaro.org Signed-off-by: Bjorn Andersson <andersson@kernel.org>
* clk: qcom: clk-cbf-8996: use HUAYRA_APSS register map for cbf_pllGabor Juhos2024-04-271-12/+1
| | | | | | | | | | | | | | | | | | The register map used for 'cbf_pll' is the same as the one defined for the CLK_ALPHA_PLL_TYPE_HUAYRA_APSS indice in the 'clk_alpha_pll_regs' array. Drop the local register map and use the global one instead to reduce code duplication. No functional changes intended. Compile tested only. Suggested-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Signed-off-by: Gabor Juhos <j4g8y7@gmail.com> Link: https://lore.kernel.org/r/20240328-apss-ipq-pll-cleanup-v4-5-eddbf617f0c8@gmail.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
* clk: qcom: apss-ipq-pll: constify clk_init_data structuresGabor Juhos2024-04-271-2/+2
| | | | | | | | | | | | | The clk_init_data structures are never modified, so add const qualifier to the ones where it is missing. No functional changes. Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Signed-off-by: Gabor Juhos <j4g8y7@gmail.com> Link: https://lore.kernel.org/r/20240328-apss-ipq-pll-cleanup-v4-4-eddbf617f0c8@gmail.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
* clk: qcom: apss-ipq-pll: constify match data structuresGabor Juhos2024-04-271-4/+4
| | | | | | | | | | | | | The match data structures are used only by the apss_ipq_pll_probe() function and are never modified so mark those as constant. No functional changes. Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Signed-off-by: Gabor Juhos <j4g8y7@gmail.com> Link: https://lore.kernel.org/r/20240328-apss-ipq-pll-cleanup-v4-3-eddbf617f0c8@gmail.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
* clk: qcom: apss-ipq-pll: move Huayra register map to 'clk_alpha_pll_regs'Gabor Juhos2024-04-273-19/+12
| | | | | | | | | | | | | Move the locally defined Huayra register map to 'clk_alpha_pll_regs' in order to allow using that by other drivers, like the clk-cbf-8996. No functional changes. Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Signed-off-by: Gabor Juhos <j4g8y7@gmail.com> Link: https://lore.kernel.org/r/20240328-apss-ipq-pll-cleanup-v4-2-eddbf617f0c8@gmail.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
* clk: qcom: apss-ipq-pll: reuse Stromer reg offsets from 'clk_alpha_pll_regs'Gabor Juhos2024-04-271-18/+6
| | | | | | | | | | | | | | | | | | The register offset array defined locally for the CLK_ALPHA_PLL_TYPE_STROMER_PLUS is the same as the entry defined for CLK_ALPHA_PLL_TYPE_STROMER in the 'clk_alpha_pll_regs' array. To avoid code duplication, remove the local definition and use the global one instead. No functional changes. Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Signed-off-by: Gabor Juhos <j4g8y7@gmail.com> Link: https://lore.kernel.org/r/20240328-apss-ipq-pll-cleanup-v4-1-eddbf617f0c8@gmail.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
* Merge branch ↵Bjorn Andersson2024-04-271-3/+27
|\ | | | | | | | | | | | | '20240315-apss-ipq-pll-ipq5018-hang-v2-1-6fe30ada2009@gmail.com' into clk-for-6.10 Merge IPQ5018 boot failure fix from topic branch, in order to be able to add subsequent cleanup patches on top, for v6.10.
| * clk: qcom: apss-ipq-pll: use stromer ops for IPQ5018 to fix boot failureGabor Juhos2024-04-271-3/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Booting v6.8 results in a hang on various IPQ5018 based boards. Investigating the problem showed that the hang happens when the clk_alpha_pll_stromer_plus_set_rate() function tries to write into the PLL_MODE register of the APSS PLL. Checking the downstream code revealed that it uses [1] stromer specific operations for IPQ5018, whereas in the current code the stromer plus specific operations are used. The ops in the 'ipq_pll_stromer_plus' clock definition can't be changed since that is needed for IPQ5332, so add a new alpha pll clock declaration which uses the correct stromer ops and use this new clock for IPQ5018 to avoid the boot failure. Also, change pll_type in 'ipq5018_pll_data' to CLK_ALPHA_PLL_TYPE_STROMER to better reflect that it is a Stromer PLL and change the apss_ipq_pll_probe() function accordingly. 1. https://git.codelinaro.org/clo/qsdk/oss/kernel/linux-ipq-5.4/-/blob/NHSS.QSDK.12.4/drivers/clk/qcom/apss-ipq5018.c#L67 Cc: stable@vger.kernel.org Fixes: 50492f929486 ("clk: qcom: apss-ipq-pll: add support for IPQ5018") Signed-off-by: Gabor Juhos <j4g8y7@gmail.com> Tested-by: Kathiravan Thirumoorthy <quic_kathirav@quicinc.com> Reviewed-by: Kathiravan Thirumoorthy <quic_kathirav@quicinc.com> Link: https://lore.kernel.org/r/20240315-apss-ipq-pll-ipq5018-hang-v2-1-6fe30ada2009@gmail.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
* | clk: qcom: gcc-ipq8074: rework nss_port5/6 clock to multiple confChristian Marangi2024-04-271-44/+76
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rework nss_port5/6 to use the new multiple configuration implementation and correctly fix the clocks for these port under some corner case. This is particularly relevant for device that have 2.5G or 10G port connected to port5 or port 6 on ipq8074. As the parent are shared across multiple port it may be required to select the correct configuration to accomplish the desired clock. Without this patch such port doesn't work in some specific ethernet speed as the clock will be set to the wrong frequency as we just select the first configuration for the related frequency instead of selecting the best one. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Acked-by: Stephen Boyd <sboyd@kernel.org> Link: https://lore.kernel.org/r/20231220221724.3822-4-ansuelsmth@gmail.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
* | clk: qcom: clk-rcg2: add support for rcg2 freq multi opsChristian Marangi2024-04-274-0/+187
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some RCG frequency can be reached by multiple configuration. Add clk_rcg2_fm_ops ops to support these special RCG configurations. These alternative ops will select the frequency using a CEIL policy. When the correct frequency is found, the correct config is selected by calculating the final rate (by checking the defined parent and values in the config that is being checked) and deciding based on the one that is less different than the requested one. These check are skipped if there is just one config for the requested freq. qcom_find_freq_multi is added to search the freq with the new struct freq_multi_tbl. __clk_rcg2_select_conf is used to select the correct conf by simulating the final clock. If a conf can't be found due to parent not reachable, a WARN is printed and -EINVAL is returned. Tested-by: Wei Lei <quic_leiwei@quicinc.com> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Acked-by: Stephen Boyd <sboyd@kernel.org> Link: https://lore.kernel.org/r/20231220221724.3822-3-ansuelsmth@gmail.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
* | clk: qcom: clk-rcg: introduce support for multiple conf for same freqChristian Marangi2024-04-271-1/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some RCG frequency can be reached by multiple configuration. We currently declare multiple configuration for the same frequency but that is not supported and always the first configuration will be taken. These multiple configuration are needed as based on the current parent configuration, it may be needed to use a different configuration to reach the same frequency. To handle this introduce 3 new macro, C, FM and FMS: - C is used to declare a freq_conf where src, pre_div, m and n are provided. - FM is used to declare a freq_multi_tbl with the frequency and an array of confs to insert all the config for the provided frequency. - FMS is used to declare a freq_multi_tbl with the frequency and an array of a single conf with the provided src, pre_div, m and n. Struct clk_rcg2 is changed to add a union type to reference a simple freq_tbl or a complex freq_multi_tbl. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Acked-by: Stephen Boyd <sboyd@kernel.org> Link: https://lore.kernel.org/r/20231220221724.3822-2-ansuelsmth@gmail.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
* | clk: qcom: hfpll: Add QCS404-specific compatibleLuca Weiss2024-04-231-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | It doesn't appear that the configuration is for the HFPLL is generic, so add a qcs404-specific compatible and rename the existing struct to qcs404. Keep qcom,hfpll in the driver for compatibility with old dtbs. Signed-off-by: Luca Weiss <luca@z3ntu.xyz> Link: https://lore.kernel.org/r/20240218-hfpll-yaml-v2-2-31543e0d6261@z3ntu.xyz Signed-off-by: Bjorn Andersson <andersson@kernel.org>
* | dt-bindings: clock: qcom,hfpll: Convert to YAMLLuca Weiss2024-04-232-63/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert the .txt documentation to .yaml with some adjustments. * APQ8064/IPQ8064/MSM8960 compatibles are dropped since their HFPLLs are a part of GCC so there is no need for a separate compat entry. * Change the MSM8974 compatible to follow the updated naming schema. Theis compatible is not used upstream yet. * Add qcs404-hfpll. QCS404 currently uses qcom,hfpll. Mark that as deprecated since every SoC appears to need different driver data so "qcom,hfpll" makes no sense to keep Signed-off-by: Luca Weiss <luca@z3ntu.xyz> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240218-hfpll-yaml-v2-1-31543e0d6261@z3ntu.xyz Signed-off-by: Bjorn Andersson <andersson@kernel.org>
* | clk: qcom: clk-alpha-pll: fix kerneldoc of struct clk_alpha_pllGabor Juhos2024-04-211-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add missing descriptions of the 'num_vco' and 'flags' members to clk_alpha_pll structure's documentation. Also reorder the member description entries to match the order of the declarations. Eliminates the following warnings: drivers/clk/qcom/clk-alpha-pll.h:72: info: Scanning doc for struct clk_alpha_pll drivers/clk/qcom/clk-alpha-pll.h:91: warning: Function parameter or struct member 'num_vco' not described in 'clk_alpha_pll' drivers/clk/qcom/clk-alpha-pll.h:91: warning: Function parameter or struct member 'flags' not described in 'clk_alpha_pll' No functional changes. Signed-off-by: Gabor Juhos <j4g8y7@gmail.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20240321-alpha-pll-kerneldoc-v1-1-0d76926b72c3@gmail.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
* | clk: qcom: clk-alpha-pll: reorder Stromer register offsetsGabor Juhos2024-04-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | The register offset arrays are ordered based on the register offsets for all PLLs but the Stromer. For consistency, reorder the Stromer specific array as well. No functional changes. Signed-off-by: Gabor Juhos <j4g8y7@gmail.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Link: https://lore.kernel.org/r/20240311-alpha-pll-stromer-cleanup-v1-2-f7c0c5607cca@gmail.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
* | clk: qcom: clk-alpha-pll: remove invalid Stromer register offsetGabor Juhos2024-04-211-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The offset of the CONFIG_CTL_U register defined for the Stromer PLL is wrong. It is not aligned on a 4 bytes boundary which might causes errors in regmap operations. Maybe the intention behind of using the 0xff value was to indicate that the register is not implemented in the PLL, but this is not verified anywhere in the code. Moreover, this value is not used even in other register offset arrays despite that those PLLs also have unimplemented registers. Additionally, on the Stromer PLLs the current code only touches the CONFIG_CTL_U register if the result of pll_has_64bit_config() is true which condition is not affected by the change. Due to the reasons above, simply remove the CONFIG_CTL_U entry from the Stromer specific array. Fixes: e47a4f55f240 ("clk: qcom: clk-alpha-pll: Add support for Stromer PLLs") Signed-off-by: Gabor Juhos <j4g8y7@gmail.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Link: https://lore.kernel.org/r/20240311-alpha-pll-stromer-cleanup-v1-1-f7c0c5607cca@gmail.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
* | clk: qcom: gcc-sm8150: De-register gcc_cpuss_ahb_clk_srcSatya Priya Kakitapalli2024-04-211-61/+0
| | | | | | | | | | | | | | | | | | De-register the gcc_cpuss_ahb_clk_src and its branch clocks as there is no rate setting happening on them. Signed-off-by: Satya Priya Kakitapalli <quic_skakitap@quicinc.com> Link: https://lore.kernel.org/r/20240213-gcc-ao-support-v2-1-fd2127e8d8f4@quicinc.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
* | clk: qcom: rpm: Remove an unused field in struct rpm_ccChristophe JAILLET2024-04-201-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | In "struct rpm_cc", the 'rpm' field is unused. Remove it. Found with cppcheck, unusedStructMember. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Stephen Boyd <sboyd@kernel.org> Link: https://lore.kernel.org/r/9f92330c717e6f2dab27b1307565ffb108c304a7.1713017032.git.christophe.jaillet@wanadoo.fr Signed-off-by: Bjorn Andersson <andersson@kernel.org>
* | clk: qcom: clk-alpha-pll: Skip reconfiguring the running Lucid EvoAbel Vesa2024-04-201-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | The PLL0 is configured by the bootloader and is the parent of the mdp_clk_src. The Trion implementation of the configure function is already skipping this step if the PLL is enabled, so lets extend the same behavior to Lucid Evo variant. Signed-off-by: Abel Vesa <abel.vesa@linaro.org> Reviewed-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20240418-clk-qcom-lucid-evo-skip-configuring-enabled-v1-1-caede5f1c7a3@linaro.org Signed-off-by: Bjorn Andersson <andersson@kernel.org>
* | clk: qcom: fix module autoloadingKrzysztof Kozlowski2024-04-102-0/+2
|/ | | | | | | | | | | | | Add MODULE_DEVICE_TABLE(), so modules could be properly autoloaded based on the alias from of_device_id table. Clocks are considered core components, so usually they are built-in, however these can be built and used as modules on some generic kernel. Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Link: https://lore.kernel.org/r/20240410155356.224098-1-krzk@kernel.org Signed-off-by: Bjorn Andersson <andersson@kernel.org>
* Linux 6.9-rc1v6.9-rc1Linus Torvalds2024-03-241-2/+2
|
* Merge tag 'efi-fixes-for-v6.9-2' of ↵Linus Torvalds2024-03-244-2/+14
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fixes from Ard Biesheuvel: - Fix logic that is supposed to prevent placement of the kernel image below LOAD_PHYSICAL_ADDR - Use the firmware stack in the EFI stub when running in mixed mode - Clear BSS only once when using mixed mode - Check efi.get_variable() function pointer for NULL before trying to call it * tag 'efi-fixes-for-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: efi: fix panic in kdump kernel x86/efistub: Don't clear BSS twice in mixed mode x86/efistub: Call mixed mode boot services on the firmware's stack efi/libstub: fix efi_random_alloc() to allocate memory at alloc_min or higher address
| * efi: fix panic in kdump kernelOleksandr Tymoshenko2024-03-241-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | Check if get_next_variable() is actually valid pointer before calling it. In kdump kernel this method is set to NULL that causes panic during the kexec-ed kernel boot. Tested with QEMU and OVMF firmware. Fixes: bad267f9e18f ("efi: verify that variable services are supported") Signed-off-by: Oleksandr Tymoshenko <ovt@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
| * x86/efistub: Don't clear BSS twice in mixed modeArd Biesheuvel2024-03-241-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clearing BSS should only be done once, at the very beginning. efi_pe_entry() is the entrypoint from the firmware, which may not clear BSS and so it is done explicitly. However, efi_pe_entry() is also used as an entrypoint by the mixed mode startup code, in which case BSS will already have been cleared, and doing it again at this point will corrupt global variables holding the firmware's GDT/IDT and segment selectors. So make the memset() conditional on whether the EFI stub is running in native mode. Fixes: b3810c5a2cc4a666 ("x86/efistub: Clear decompressor BSS in native EFI entrypoint") Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
| * x86/efistub: Call mixed mode boot services on the firmware's stackArd Biesheuvel2024-03-241-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Normally, the EFI stub calls into the EFI boot services using the stack that was live when the stub was entered. According to the UEFI spec, this stack needs to be at least 128k in size - this might seem large but all asynchronous processing and event handling in EFI runs from the same stack and so quite a lot of space may be used in practice. In mixed mode, the situation is a bit different: the bootloader calls the 32-bit EFI stub entry point, which calls the decompressor's 32-bit entry point, where the boot stack is set up, using a fixed allocation of 16k. This stack is still in use when the EFI stub is started in 64-bit mode, and so all calls back into the EFI firmware will be using the decompressor's limited boot stack. Due to the placement of the boot stack right after the boot heap, any stack overruns have gone unnoticed. However, commit 5c4feadb0011983b ("x86/decompressor: Move global symbol references to C code") moved the definition of the boot heap into C code, and now the boot stack is placed right at the base of BSS, where any overruns will corrupt the end of the .data section. While it would be possible to work around this by increasing the size of the boot stack, doing so would affect all x86 systems, and mixed mode systems are a tiny (and shrinking) fraction of the x86 installed base. So instead, record the firmware stack pointer value when entering from the 32-bit firmware, and switch to this stack every time a EFI boot service call is made. Cc: <stable@kernel.org> # v6.1+ Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
| * efi/libstub: fix efi_random_alloc() to allocate memory at alloc_min or ↵KONDO KAZUMA(近藤 和真)2024-03-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | higher address Following warning is sometimes observed while booting my servers: [ 3.594838] DMA: preallocated 4096 KiB GFP_KERNEL pool for atomic allocations [ 3.602918] swapper/0: page allocation failure: order:10, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0-1 ... [ 3.851862] DMA: preallocated 1024 KiB GFP_KERNEL|GFP_DMA pool for atomic allocation If 'nokaslr' boot option is set, the warning always happens. On x86, ZONE_DMA is small zone at the first 16MB of physical address space. When this problem happens, most of that space seems to be used by decompressed kernel. Thereby, there is not enough space at DMA_ZONE to meet the request of DMA pool allocation. The commit 2f77465b05b1 ("x86/efistub: Avoid placing the kernel below LOAD_PHYSICAL_ADDR") tried to fix this problem by introducing lower bound of allocation. But the fix is not complete. efi_random_alloc() allocates pages by following steps. 1. Count total available slots ('total_slots') 2. Select a slot ('target_slot') to allocate randomly 3. Calculate a starting address ('target') to be included target_slot 4. Allocate pages, which starting address is 'target' In step 1, 'alloc_min' is used to offset the starting address of memory chunk. But in step 3 'alloc_min' is not considered at all. As the result, 'target' can be miscalculated and become lower than 'alloc_min'. When KASLR is disabled, 'target_slot' is always 0 and the problem happens everytime if the EFI memory map of the system meets the condition. Fix this problem by calculating 'target' considering 'alloc_min'. Cc: linux-efi@vger.kernel.org Cc: Tom Englund <tomenglund26@gmail.com> Cc: linux-kernel@vger.kernel.org Fixes: 2f77465b05b1 ("x86/efistub: Avoid placing the kernel below LOAD_PHYSICAL_ADDR") Signed-off-by: Kazuma Kondo <kazuma-kondo@nec.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
* | Merge tag 'x86-urgent-2024-03-24' of ↵Linus Torvalds2024-03-2415-89/+80
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: - Ensure that the encryption mask at boot is properly propagated on 5-level page tables, otherwise the PGD entry is incorrectly set to non-encrypted, which causes system crashes during boot. - Undo the deferred 5-level page table setup as it cannot work with memory encryption enabled. - Prevent inconsistent XFD state on CPU hotplug, where the MSR is reset to the default value but the cached variable is not, so subsequent comparisons might yield the wrong result and as a consequence the result prevents updating the MSR. - Register the local APIC address only once in the MPPARSE enumeration to prevent triggering the related WARN_ONs() in the APIC and topology code. - Handle the case where no APIC is found gracefully by registering a fake APIC in the topology code. That makes all related topology functions work correctly and does not affect the actual APIC driver code at all. - Don't evaluate logical IDs during early boot as the local APIC IDs are not yet enumerated and the invoked function returns an error code. Nothing requires the logical IDs before the final CPUID enumeration takes place, which happens after the enumeration. - Cure the fallout of the per CPU rework on UP which misplaced the copying of boot_cpu_data to per CPU data so that the final update to boot_cpu_data got lost which caused inconsistent state and boot crashes. - Use copy_from_kernel_nofault() in the kprobes setup as there is no guarantee that the address can be safely accessed. - Reorder struct members in struct saved_context to work around another kmemleak false positive - Remove the buggy code which tries to update the E820 kexec table for setup_data as that is never passed to the kexec kernel. - Update the resource control documentation to use the proper units. - Fix a Kconfig warning observed with tinyconfig * tag 'x86-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot/64: Move 5-level paging global variable assignments back x86/boot/64: Apply encryption mask to 5-level pagetable update x86/cpu: Add model number for another Intel Arrow Lake mobile processor x86/fpu: Keep xfd_state in sync with MSR_IA32_XFD Documentation/x86: Document that resctrl bandwidth control units are MiB x86/mpparse: Register APIC address only once x86/topology: Handle the !APIC case gracefully x86/topology: Don't evaluate logical IDs during early boot x86/cpu: Ensure that CPU info updates are propagated on UP kprobes/x86: Use copy_from_kernel_nofault() to read from unsafe address x86/pm: Work around false positive kmemleak report in msr_build_context() x86/kexec: Do not update E820 kexec table for setup_data x86/config: Fix warning for 'make ARCH=x86_64 tinyconfig'
| * | x86/boot/64: Move 5-level paging global variable assignments backTom Lendacky2024-03-241-9/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 63bed9660420 ("x86/startup_64: Defer assignment of 5-level paging global variables") moved assignment of 5-level global variables to later in the boot in order to avoid having to use RIP relative addressing in order to set them. However, when running with 5-level paging and SME active (mem_encrypt=on), the variables are needed as part of the page table setup needed to encrypt the kernel (using pgd_none(), p4d_offset(), etc.). Since the variables haven't been set, the page table manipulation is done as if 4-level paging is active, causing the system to crash on boot. While only a subset of the assignments that were moved need to be set early, move all of the assignments back into check_la57_support() so that these assignments aren't spread between two locations. Instead of just reverting the fix, this uses the new RIP_REL_REF() macro when assigning the variables. Fixes: 63bed9660420 ("x86/startup_64: Defer assignment of 5-level paging global variables") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/2ca419f4d0de719926fd82353f6751f717590a86.1711122067.git.thomas.lendacky@amd.com
| * | x86/boot/64: Apply encryption mask to 5-level pagetable updateTom Lendacky2024-03-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When running with 5-level page tables, the kernel mapping PGD entry is updated to point to the P4D table. The assignment uses _PAGE_TABLE_NOENC, which, when SME is active (mem_encrypt=on), results in a page table entry without the encryption mask set, causing the system to crash on boot. Change the assignment to use _PAGE_TABLE instead of _PAGE_TABLE_NOENC so that the encryption mask is set for the PGD entry. Fixes: 533568e06b15 ("x86/boot/64: Use RIP_REL_REF() to access early_top_pgt[]") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/8f20345cda7dbba2cf748b286e1bc00816fe649a.1711122067.git.thomas.lendacky@amd.com
| * | x86/cpu: Add model number for another Intel Arrow Lake mobile processorTony Luck2024-03-241-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | This one is the regular laptop CPU. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240322161725.195614-1-tony.luck@intel.com
| * | x86/fpu: Keep xfd_state in sync with MSR_IA32_XFDAdamos Ttofari2024-03-242-6/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 672365477ae8 ("x86/fpu: Update XFD state where required") and commit 8bf26758ca96 ("x86/fpu: Add XFD state to fpstate") introduced a per CPU variable xfd_state to keep the MSR_IA32_XFD value cached, in order to avoid unnecessary writes to the MSR. On CPU hotplug MSR_IA32_XFD is reset to the init_fpstate.xfd, which wipes out any stale state. But the per CPU cached xfd value is not reset, which brings them out of sync. As a consequence a subsequent xfd_update_state() might fail to update the MSR which in turn can result in XRSTOR raising a #NM in kernel space, which crashes the kernel. To fix this, introduce xfd_set_state() to write xfd_state together with MSR_IA32_XFD, and use it in all places that set MSR_IA32_XFD. Fixes: 672365477ae8 ("x86/fpu: Update XFD state where required") Signed-off-by: Adamos Ttofari <attofari@amazon.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240322230439.456571-1-chang.seok.bae@intel.com Closes: https://lore.kernel.org/lkml/20230511152818.13839-1-attofari@amazon.de
| * | Documentation/x86: Document that resctrl bandwidth control units are MiBTony Luck2024-03-241-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The memory bandwidth software controller uses 2^20 units rather than 10^6. See mbm_bw_count() which computes bandwidth using the "SZ_1M" Linux define for 0x00100000. Update the documentation to use MiB when describing this feature. It's too late to fix the mount option "mba_MBps" as that is now an established user interface. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240322182016.196544-1-tony.luck@intel.com
| * | x86/mpparse: Register APIC address only onceThomas Gleixner2024-03-231-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The APIC address is registered twice. First during the early detection and afterwards when actually scanning the table for APIC IDs. The APIC and topology core warn about the second attempt. Restrict it to the early detection call. Fixes: 81287ad65da5 ("x86/apic: Sanitize APIC address setup") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20240322185305.297774848@linutronix.de
| * | x86/topology: Handle the !APIC case gracefullyThomas Gleixner2024-03-231-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If there is no local APIC enumerated and registered then the topology bitmaps are empty. Therefore, topology_init_possible_cpus() will die with a division by zero exception. Prevent this by registering a fake APIC id to populate the topology bitmap. This also allows to use all topology query interfaces unconditionally. It does not affect the actual APIC code because either the local APIC address was not registered or no local APIC could be detected. Fixes: f1f758a80516 ("x86/topology: Add a mechanism to track topology via APIC IDs") Reported-by: Guenter Roeck <linux@roeck-us.net> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20240322185305.242709302@linutronix.de
| * | x86/topology: Don't evaluate logical IDs during early bootThomas Gleixner2024-03-231-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The local APICs have not yet been enumerated so the logical ID evaluation from the topology bitmaps does not work and would return an error code. Skip the evaluation during the early boot CPUID evaluation and only apply it on the final run. Fixes: 380414be78bf ("x86/cpu/topology: Use topology logical mapping mechanism") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20240322185305.186943142@linutronix.de
| * | x86/cpu: Ensure that CPU info updates are propagated on UPThomas Gleixner2024-03-233-37/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The boot sequence evaluates CPUID information twice: 1) During early boot 2) When finalizing the early setup right before mitigations are selected and alternatives are patched. In both cases the evaluation is stored in boot_cpu_data, but on UP the copying of boot_cpu_data to the per CPU info of the boot CPU happens between #1 and #2. So any update which happens in #2 is never propagated to the per CPU info instance. Consolidate the whole logic and copy boot_cpu_data right before applying alternatives as that's the point where boot_cpu_data is in it's final state and not supposed to change anymore. This also removes the voodoo mb() from smp_prepare_cpus_common() which had absolutely no purpose. Fixes: 71eb4893cfaf ("x86/percpu: Cure per CPU madness on UP") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20240322185305.127642785@linutronix.de
| * | kprobes/x86: Use copy_from_kernel_nofault() to read from unsafe addressMasami Hiramatsu (Google)2024-03-221-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Read from an unsafe address with copy_from_kernel_nofault() in arch_adjust_kprobe_addr() because this function is used before checking the address is in text or not. Syzcaller bot found a bug and reported the case if user specifies inaccessible data area, arch_adjust_kprobe_addr() will cause a kernel panic. [ mingo: Clarified the comment. ] Fixes: cc66bb914578 ("x86/ibt,kprobes: Cure sym+0 equals fentry woes") Reported-by: Qiang Zhang <zzqq0103.hey@gmail.com> Tested-by: Jinghao Jia <jinghao7@illinois.edu> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/171042945004.154897.2221804961882915806.stgit@devnote2
| * | x86/pm: Work around false positive kmemleak report in msr_build_context()Anton Altaparmakov2024-03-221-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since: 7ee18d677989 ("x86/power: Make restore_processor_context() sane") kmemleak reports this issue: unreferenced object 0xf68241e0 (size 32): comm "swapper/0", pid 1, jiffies 4294668610 (age 68.432s) hex dump (first 32 bytes): 00 cc cc cc 29 10 01 c0 00 00 00 00 00 00 00 00 ....)........... 00 42 82 f6 cc cc cc cc cc cc cc cc cc cc cc cc .B.............. backtrace: [<461c1d50>] __kmem_cache_alloc_node+0x106/0x260 [<ea65e13b>] __kmalloc+0x54/0x160 [<c3858cd2>] msr_build_context.constprop.0+0x35/0x100 [<46635aff>] pm_check_save_msr+0x63/0x80 [<6b6bb938>] do_one_initcall+0x41/0x1f0 [<3f3add60>] kernel_init_freeable+0x199/0x1e8 [<3b538fde>] kernel_init+0x1a/0x110 [<938ae2b2>] ret_from_fork+0x1c/0x28 Which is a false positive. Reproducer: - Run rsync of whole kernel tree (multiple times if needed). - start a kmemleak scan - Note this is just an example: a lot of our internal tests hit these. The root cause is similar to the fix in: b0b592cf0836 x86/pm: Fix false positive kmemleak report in msr_build_context() ie. the alignment within the packed struct saved_context which has everything unaligned as there is only "u16 gs;" at start of struct where in the past there were four u16 there thus aligning everything afterwards. The issue is with the fact that Kmemleak only searches for pointers that are aligned (see how pointers are scanned in kmemleak.c) so when the struct members are not aligned it doesn't see them. Testing: We run a lot of tests with our CI, and after applying this fix we do not see any kmemleak issues any more whilst without it we see hundreds of the above report. From a single, simple test run consisting of 416 individual test cases on kernel 5.10 x86 with kmemleak enabled we got 20 failures due to this, which is quite a lot. With this fix applied we get zero kmemleak related failures. Fixes: 7ee18d677989 ("x86/power: Make restore_processor_context() sane") Signed-off-by: Anton Altaparmakov <anton@tuxera.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: "Rafael J. Wysocki" <rafael@kernel.org> Cc: stable@vger.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20240314142656.17699-1-anton@tuxera.com
| * | x86/kexec: Do not update E820 kexec table for setup_dataDave Young2024-03-221-16/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | crashkernel reservation failed on a Thinkpad t440s laptop recently. Actually the memblock reservation succeeded, but later insert_resource() failed. Test steps: kexec load -> /* make sure add crashkernel param eg. crashkernel=160M */ kexec reboot -> dmesg|grep "crashkernel reserved"; crashkernel memory range like below reserved successfully: 0x00000000d0000000 - 0x00000000da000000 But no such "Crash kernel" region in /proc/iomem The background story: Currently the E820 code reserves setup_data regions for both the current kernel and the kexec kernel, and it inserts them into the resources list. Before the kexec kernel reboots nobody passes the old setup_data, and kexec only passes fresh SETUP_EFI/SETUP_IMA/SETUP_RNG_SEED if needed. Thus the old setup data memory is not used at all. Due to old kernel updates the kexec e820 table as well so kexec kernel sees them as E820_TYPE_RESERVED_KERN regions, and later the old setup_data regions are inserted into resources list in the kexec kernel by e820__reserve_resources(). Note, due to no setup_data is passed in for those old regions they are not early reserved (by function early_reserve_memory), and the crashkernel memblock reservation will just treat them as usable memory and it could reserve the crashkernel region which overlaps with the old setup_data regions. And just like the bug I noticed here, kdump insert_resource failed because e820__reserve_resources has added the overlapped chunks in /proc/iomem already. Finally, looking at the code, the old setup_data regions are not used at all as no setup_data is passed in by the kexec boot loader. Although something like SETUP_PCI etc could be needed, kexec should pass the info as new setup_data so that kexec kernel can take care of them. This should be taken care of in other separate patches if needed. Thus drop the useless buggy code here. Signed-off-by: Dave Young <dyoung@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Jiri Bohac <jbohac@suse.cz> Cc: Eric DeVolder <eric.devolder@oracle.com> Cc: Baoquan He <bhe@redhat.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: https://lore.kernel.org/r/Zf0T3HCG-790K-pZ@darkstar.users.ipa.redhat.com
| * | x86/config: Fix warning for 'make ARCH=x86_64 tinyconfig'Masahiro Yamada2024-03-211-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Kconfig emits a warning for the following command: $ make ARCH=x86_64 tinyconfig ... .config:1380:warning: override: UNWINDER_GUESS changes choice state When X86_64=y, the unwinder is exclusively selected from the following three options: - UNWINDER_ORC - UNWINDER_FRAME_POINTER - UNWINDER_GUESS However, arch/x86/configs/tiny.config only specifies the values of the last two. UNWINDER_ORC must be explicitly disabled. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240320154313.612342-1-masahiroy@kernel.org
* | | Merge tag 'sched-urgent-2024-03-24' of ↵Linus Torvalds2024-03-241-0/+3
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler doc clarification from Thomas Gleixner: "A single update for the documentation of the base_slice_ns tunable to clarify that any value which is less than the tick slice has no effect because the scheduler tick is not guaranteed to happen within the set time slice" * tag 'sched-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/doc: Update documentation for base_slice_ns and CONFIG_HZ relation
| * | | sched/doc: Update documentation for base_slice_ns and CONFIG_HZ relationMukesh Kumar Chaurasiya2024-03-211-0/+3
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The tunable base_slice_ns is dependent on CONFIG_HZ (i.e. TICK_NSEC) for any significant performance improvement. The reason being the scheduler tick is not frequent enough to force preemption when base_slice expires in case of: base_slice_ns < TICK_NSEC The below data is of stress-ng: Number of CPU: 1 Stressor threads: 4 Time: 30sec On CONFIG_HZ=1000 | base_slice | avg-run (msec) | context-switches | | ---------- | -------------- | ---------------- | | 3ms | 2.914 | 10342 | | 6ms | 4.857 | 6196 | | 9ms | 6.754 | 4482 | | 12ms | 7.872 | 3802 | | 22ms | 11.294 | 2710 | | 32ms | 13.425 | 2284 | On CONFIG_HZ=100 | base_slice | avg-run (msec) | context-switches | | ---------- | -------------- | ---------------- | | 3ms | 9.144 | 3337 | | 6ms | 9.113 | 3301 | | 9ms | 8.991 | 3315 | | 12ms | 12.935 | 2328 | | 22ms | 16.031 | 1915 | | 32ms | 18.608 | 1622 | base_slice: the value of base_slice in ms avg-run (msec): average time of the stressor threads got on cpu before it got preempted context-switches: number of context switches for the stress-ng process Signed-off-by: Mukesh Kumar Chaurasiya <mchauras@linux.ibm.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20240320173815.927637-2-mchauras@linux.ibm.com
* | | Merge tag 'dma-mapping-6.9-2024-03-24' of ↵Linus Torvalds2024-03-242-12/+42
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping fixes from Christoph Hellwig: "This has a set of swiotlb alignment fixes for sometimes very long standing bugs from Will. We've been discussion them for a while and they should be solid now" * tag 'dma-mapping-6.9-2024-03-24' of git://git.infradead.org/users/hch/dma-mapping: swiotlb: Reinstate page-alignment for mappings >= PAGE_SIZE iommu/dma: Force swiotlb_max_mapping_size on an untrusted device swiotlb: Fix alignment checks when both allocation and DMA masks are present swiotlb: Honour dma_alloc_coherent() alignment in swiotlb_alloc() swiotlb: Enforce page alignment in swiotlb_alloc() swiotlb: Fix double-allocation of slots due to broken alignment handling
| * | | swiotlb: Reinstate page-alignment for mappings >= PAGE_SIZEWill Deacon2024-03-131-7/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For swiotlb allocations >= PAGE_SIZE, the slab search historically adjusted the stride to avoid checking unaligned slots. This had the side-effect of aligning large mapping requests to PAGE_SIZE, but that was broken by 0eee5ae10256 ("swiotlb: fix slot alignment checks"). Since this alignment could be relied upon drivers, reinstate PAGE_SIZE alignment for swiotlb mappings >= PAGE_SIZE. Reported-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Petr Tesarik <petr.tesarik1@huawei-partners.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | | iommu/dma: Force swiotlb_max_mapping_size on an untrusted deviceNicolin Chen2024-03-131-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The swiotlb does not support a mapping size > swiotlb_max_mapping_size(). On the other hand, with a 64KB PAGE_SIZE configuration, it's observed that an NVME device can map a size between 300KB~512KB, which certainly failed the swiotlb mappings, though the default pool of swiotlb has many slots: systemd[1]: Started Journal Service. => nvme 0000:00:01.0: swiotlb buffer is full (sz: 327680 bytes), total 32768 (slots), used 32 (slots) note: journal-offline[392] exited with irqs disabled note: journal-offline[392] exited with preempt_count 1 Call trace: [ 3.099918] swiotlb_tbl_map_single+0x214/0x240 [ 3.099921] iommu_dma_map_page+0x218/0x328 [ 3.099928] dma_map_page_attrs+0x2e8/0x3a0 [ 3.101985] nvme_prep_rq.part.0+0x408/0x878 [nvme] [ 3.102308] nvme_queue_rqs+0xc0/0x300 [nvme] [ 3.102313] blk_mq_flush_plug_list.part.0+0x57c/0x600 [ 3.102321] blk_add_rq_to_plug+0x180/0x2a0 [ 3.102323] blk_mq_submit_bio+0x4c8/0x6b8 [ 3.103463] __submit_bio+0x44/0x220 [ 3.103468] submit_bio_noacct_nocheck+0x2b8/0x360 [ 3.103470] submit_bio_noacct+0x180/0x6c8 [ 3.103471] submit_bio+0x34/0x130 [ 3.103473] ext4_bio_write_folio+0x5a4/0x8c8 [ 3.104766] mpage_submit_folio+0xa0/0x100 [ 3.104769] mpage_map_and_submit_buffers+0x1a4/0x400 [ 3.104771] ext4_do_writepages+0x6a0/0xd78 [ 3.105615] ext4_writepages+0x80/0x118 [ 3.105616] do_writepages+0x90/0x1e8 [ 3.105619] filemap_fdatawrite_wbc+0x94/0xe0 [ 3.105622] __filemap_fdatawrite_range+0x68/0xb8 [ 3.106656] file_write_and_wait_range+0x84/0x120 [ 3.106658] ext4_sync_file+0x7c/0x4c0 [ 3.106660] vfs_fsync_range+0x3c/0xa8 [ 3.106663] do_fsync+0x44/0xc0 Since untrusted devices might go down the swiotlb pathway with dma-iommu, these devices should not map a size larger than swiotlb_max_mapping_size. To fix this bug, add iommu_dma_max_mapping_size() for untrusted devices to take into account swiotlb_max_mapping_size() v.s. iova_rcache_range() from the iommu_dma_opt_mapping_size(). Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers") Link: https://lore.kernel.org/r/ee51a3a5c32cf885b18f6416171802669f4a718a.1707851466.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> [will: Drop redundant is_swiotlb_active(dev) check] Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Acked-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | | swiotlb: Fix alignment checks when both allocation and DMA masks are presentWill Deacon2024-03-131-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Nicolin reports that swiotlb buffer allocations fail for an NVME device behind an IOMMU using 64KiB pages. This is because we end up with a minimum allocation alignment of 64KiB (for the IOMMU to map the buffer safely) but a minimum DMA alignment mask corresponding to a 4KiB NVME page (i.e. preserving the 4KiB page offset from the original allocation). If the original address is not 4KiB-aligned, the allocation will fail because swiotlb_search_pool_area() erroneously compares these unmasked bits with the 64KiB-aligned candidate allocation. Tweak swiotlb_search_pool_area() so that the DMA alignment mask is reduced based on the required alignment of the allocation. Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers") Link: https://lore.kernel.org/r/cover.1707851466.git.nicolinc@nvidia.com Reported-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Christoph Hellwig <hch@lst.de>