summaryrefslogtreecommitdiffstats
path: root/arch/powerpc/platforms
diff options
context:
space:
mode:
authorGavin Shan <gwshan@linux.vnet.ibm.com>2014-06-04 17:31:52 +1000
committerBenjamin Herrenschmidt <benh@kernel.crashing.org>2014-06-11 17:04:33 +1000
commit5c7a35e3e25232aef8d7aee484436f8cbe3b9b94 (patch)
tree7bf01ffd5b9e4057089c2c2383fbcf492694fca9 /arch/powerpc/platforms
parent6e0fdf9af216887e0032c19d276889aad41cad00 (diff)
downloadlinux-5c7a35e3e25232aef8d7aee484436f8cbe3b9b94.tar.gz
linux-5c7a35e3e25232aef8d7aee484436f8cbe3b9b94.tar.bz2
linux-5c7a35e3e25232aef8d7aee484436f8cbe3b9b94.zip
powerpc/powernv: Fix killed EEH event
On PowerNV platform, EEH errors are reported by IO accessors or poller driven by interrupt. After the PE is isolated, we won't produce EEH event for the PE. The current implementation has possibility of EEH event lost in this way: The interrupt handler queues one "special" event, which drives the poller. EEH thread doesn't pick the special event yet. IO accessors kicks in, the frozen PE is marked as "isolated" and EEH event is queued to the list. EEH thread runs because of special event and purge all existing EEH events. However, we never produce an other EEH event for the frozen PE. Eventually, the PE is marked as "isolated" and we don't have EEH event to recover it. The patch fixes the issue to keep EEH events for PEs that have been marked as "isolated" with the help of additional "force" help to eeh_remove_event(). Reported-by: Rolf Brudeseth <rolfb@us.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Diffstat (limited to 'arch/powerpc/platforms')
-rw-r--r--arch/powerpc/platforms/powernv/eeh-ioda.c2
1 files changed, 1 insertions, 1 deletions
diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
index 5711f6f1fda6..9c002099f875 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -717,7 +717,7 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
* And we should keep the cached OPAL notifier event sychronized
* between the kernel and firmware.
*/
- eeh_remove_event(NULL);
+ eeh_remove_event(NULL, false);
opal_notifier_update_evt(OPAL_EVENT_PCI_ERROR, 0x0ul);
list_for_each_entry(hose, &hose_list, list_node) {