From 8037ac08c2bbb3186f83a5a924f52d1048dbaec5 Mon Sep 17 00:00:00 2001 From: "Maciej W. Rozycki" Date: Fri, 9 Aug 2024 14:24:46 +0100 Subject: PCI: Clear the LBMS bit after a link retrain MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The LBMS bit, where implemented, is set by hardware either in response to the completion of retraining caused by writing 1 to the Retrain Link bit or whenever hardware has changed the link speed or width in attempt to correct unreliable link operation. It is never cleared by hardware other than by software writing 1 to the bit position in the Link Status register and we never do such a write. We currently have two places, namely apply_bad_link_workaround() and pcie_failed_link_retrain() in drivers/pci/controller/dwc/pcie-tegra194.c and drivers/pci/quirks.c respectively where we check the state of the LBMS bit and neither is interested in the state of the bit resulting from the completion of retraining, both check for a link fault. And in particular pcie_failed_link_retrain() causes issues consequently, by trying to retrain a link where there's no downstream device anymore and the state of 1 in the LBMS bit has been retained from when there was a device downstream that has since been removed. Clear the LBMS bit then at the conclusion of pcie_retrain_link(), so that we have a single place that controls it and that our code can track link speed or width changes resulting from unreliable link operation. Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures") Link: https://lore.kernel.org/r/alpine.DEB.2.21.2408091133140.61955@angie.orcam.me.uk Reported-by: Matthew W Carlis Link: https://lore.kernel.org/r/20240806000659.30859-1-mattc@purestorage.com/ Link: https://lore.kernel.org/r/20240722193407.23255-1-mattc@purestorage.com/ Signed-off-by: Maciej W. Rozycki Signed-off-by: Bjorn Helgaas Signed-off-by: Krzysztof Wilczyński Cc: # v6.5+ --- drivers/pci/pci.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) (limited to 'drivers/pci/pci.c') diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index e3a49f66982d..5618a8927aa9 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -4717,7 +4717,15 @@ int pcie_retrain_link(struct pci_dev *pdev, bool use_lt) pcie_capability_clear_word(pdev, PCI_EXP_LNKCTL, PCI_EXP_LNKCTL_RL); } - return pcie_wait_for_link_status(pdev, use_lt, !use_lt); + rc = pcie_wait_for_link_status(pdev, use_lt, !use_lt); + + /* + * Clear LBMS after a manual retrain so that the bit can be used + * to track link speed or width changes made by hardware itself + * in attempt to correct unreliable link operation. + */ + pcie_capability_write_word(pdev, PCI_EXP_LNKSTA, PCI_EXP_LNKSTA_LBMS); + return rc; } /** -- cgit v1.2.3 From 59100eb248c0b15585affa546c7f6834b30eb5a4 Mon Sep 17 00:00:00 2001 From: "Maciej W. Rozycki" Date: Fri, 9 Aug 2024 14:25:02 +0100 Subject: PCI: Use an error code with PCIe failed link retraining MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Given how the call place in pcie_wait_for_link_delay() got structured now, and that pcie_retrain_link() returns a potentially useful error code, convert pcie_failed_link_retrain() to return an error code rather than a boolean status, fixing handling at the call site mentioned. Update the other call site accordingly. Fixes: 1abb47390350 ("Merge branch 'pci/enumeration'") Link: https://lore.kernel.org/r/alpine.DEB.2.21.2408091156530.61955@angie.orcam.me.uk Reported-by: Ilpo Järvinen Link: https://lore.kernel.org/r/aa2d1c4e-9961-d54a-00c7-ddf8e858a9b0@linux.intel.com/ Signed-off-by: Maciej W. Rozycki Signed-off-by: Bjorn Helgaas Signed-off-by: Krzysztof Wilczyński Reviewed-by: Ilpo Järvinen Cc: # v6.5+ --- drivers/pci/pci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'drivers/pci/pci.c') diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 5618a8927aa9..d46652760b11 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -1324,7 +1324,7 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout) if (delay > PCI_RESET_WAIT) { if (retrain) { retrain = false; - if (pcie_failed_link_retrain(bridge)) { + if (pcie_failed_link_retrain(bridge) == 0) { delay = 1; continue; } -- cgit v1.2.3