PCI/AER: Report non-fatal errors only to the affected endpoint

[ Upstream commit 86acc790717fb60fb51ea3095084e331d8711c74 ] Previously, if an non-fatal error was reported by an endpoint, we called report_error_detected() for the endpoint, every sibling on the bus, and their descendents. If any of them did not implement the .error_detected() method, do_recovery() failed, leaving all these devices unrecovered. For example, the system described in the bugzilla below has two devices: 0000:74:02.0 [19e5:a230] SAS controller, driver has .error_detected() 0000:74:03.0 [19e5:a235] SATA controller, driver lacks .error_detected() When a device such as 74:02.0 reported a non-fatal error, do_recovery() failed because 74:03.0 lacked an .error_detected() method. But per PCIe r3.1, sec 6.2.2.2.2, such an error does not compromise the Link and does not affect 74:03.0: Non-fatal errors are uncorrectable errors which cause a particular transaction to be unreliable but the Link is otherwise fully functional. Isolating Non-fatal from Fatal errors provides Requester/Receiver logic in a device or system management software the opportunity to recover from the error without resetting the components on the Link and disturbing other transactions in progress. Devices not associated with the transaction in error are not impacted by the error. Report non-fatal errors only to the endpoint that reported them. We really want to check for AER_NONFATAL here, but the current code structure doesn't allow that. Looking for pci_channel_io_normal is the best we can do now. Link: https://bugzilla.kernel.org/show_bug.cgi?id=197055 Fixes: 6c2b374d7485 ("PCI-Express AER implemetation: AER core and aerdriver") Signed-off-by: Gabriele Paoloni <gabriele.paoloni@huawei.com> Signed-off-by: Dongdong Liu <liudongdong3@huawei.com> [bhelgaas: changelog] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
author: Gabriele Paoloni <gabriele.paoloni@huawei.com> 2017-09-28 15:33:05 +0100
committer: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 2017-12-25 14:20:09 +0100
commit: ca6f17da0b134bbc71d143dc5e059f20809c11dd (patch)
tree: d41f1902b2277aa03b27493d392649cea5239d59 /drivers/pci
parent: 6d117334cea2f10bec8023520e50fc3de49631b7 (diff)
download: linux-stable-ca6f17da0b134bbc71d143dc5e059f20809c11dd.tar.gz
linux-stable-ca6f17da0b134bbc71d143dc5e059f20809c11dd.tar.bz2
linux-stable-ca6f17da0b134bbc71d143dc5e059f20809c11dd.zip
1 files changed, 8 insertions, 1 deletions
diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
index b60a325234c5..cca4b4789ac4 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -360,7 +360,14 @@ static pci_ers_result_t broadcast_error_message(struct pci_dev *dev,
 		 * If the error is reported by an end point, we think this
 		 * error is related to the upstream link of the end point.
 		 */
-		pci_walk_bus(dev->bus, cb, &result_data);
+		if (state == pci_channel_io_normal)
+			/*
+			 * the error is non fatal so the bus is ok, just invoke
+			 * the callback for the function that logged the error.
+			 */
+			cb(dev, &result_data);
+		else
+			pci_walk_bus(dev->bus, cb, &result_data);
 	}
 
 	return result_data.result;
author	Gabriele Paoloni <gabriele.paoloni@huawei.com>	2017-09-28 15:33:05 +0100
committer	Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-25 14:20:09 +0100
commit	ca6f17da0b134bbc71d143dc5e059f20809c11dd (patch)
tree	d41f1902b2277aa03b27493d392649cea5239d59 /drivers/pci
parent	6d117334cea2f10bec8023520e50fc3de49631b7 (diff)
download	linux-stable-ca6f17da0b134bbc71d143dc5e059f20809c11dd.tar.gz linux-stable-ca6f17da0b134bbc71d143dc5e059f20809c11dd.tar.bz2 linux-stable-ca6f17da0b134bbc71d143dc5e059f20809c11dd.zip