summaryrefslogtreecommitdiffstats
path: root/drivers/pci/hotplug/pciehp_hpc.c
Commit message (Collapse)AuthorAgeFilesLines
* PCI: pciehp: Fix MSI interrupt raceStuart Hayes2020-03-311-6/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Without this commit, a PCIe hotplug port can stop generating interrupts on hotplug events, so device adds and removals will not be seen: The pciehp interrupt handler pciehp_isr() reads the Slot Status register and then writes back to it to clear the bits that caused the interrupt. If a different interrupt event bit gets set between the read and the write, pciehp_isr() returns without having cleared all of the interrupt event bits. If this happens when the MSI isn't masked (which by default it isn't in handle_edge_irq(), and which it will never be when MSI per-vector masking is not supported), we won't get any more hotplug interrupts from that device. That is expected behavior, according to the PCIe Base Spec r5.0, section 6.7.3.4, "Software Notification of Hot-Plug Events". Because the Presence Detect Changed and Data Link Layer State Changed event bits can both get set at nearly the same time when a device is added or removed, this is more likely to happen than it might seem. The issue was found (and can be reproduced rather easily) by connecting and disconnecting an NVMe storage device on at least one system model where the NVMe devices were being connected to an AMD PCIe port (PCI device 0x1022/0x1483). Fix the issue by modifying pciehp_isr() to loop back and re-read the Slot Status register immediately after writing to it, until it sees that all of the event status bits have been cleared. [lukas: drop loop count limitation, write "events" instead of "status", don't loop back in INTx and poll modes, tweak code comment & commit msg] Link: https://lore.kernel.org/r/78b4ced5072bfe6e369d20e8b47c279b8c7af12e.1582121613.git.lukas@wunner.de Tested-by: Stuart Hayes <stuart.w.hayes@gmail.com> Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Joerg Roedel <jroedel@suse.de>
* PCI: pciehp: Fix indefinite wait on sysfs requestsLukas Wunner2020-03-311-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | David Hoyer reports that powering pciehp slots up or down via sysfs may hang: The call to wait_event() in pciehp_sysfs_enable_slot() and _disable_slot() does not return because ctrl->ist_running remains true. This flag, which was introduced by commit 157c1062fcd8 ("PCI: pciehp: Avoid returning prematurely from sysfs requests"), signifies that the IRQ thread pciehp_ist() is running. It is set to true at the top of pciehp_ist() and reset to false at the end. However there are two additional return statements in pciehp_ist() before which the commit neglected to reset the flag to false and wake up waiters for the flag. That omission opens up the following race when powering up the slot: * pciehp_ist() runs because a PCI_EXP_SLTSTA_PDC event was requested by pciehp_sysfs_enable_slot() * pciehp_ist() turns on slot power via the following call stack: pciehp_handle_presence_or_link_change() -> pciehp_enable_slot() -> __pciehp_enable_slot() -> board_added() -> pciehp_power_on_slot() * after slot power is turned on, the link comes up, resulting in a PCI_EXP_SLTSTA_DLLSC event * the IRQ handler pciehp_isr() stores the event in ctrl->pending_events and returns IRQ_WAKE_THREAD * the IRQ thread is already woken (it's bringing up the slot), but the genirq code remembers to re-run the IRQ thread after it has finished (such that it can deal with the new event) by setting IRQTF_RUNTHREAD via __handle_irq_event_percpu() -> __irq_wake_thread() * the IRQ thread removes PCI_EXP_SLTSTA_DLLSC from ctrl->pending_events via board_added() -> pciehp_check_link_status() in order to deal with presence and link flaps per commit 6c35a1ac3da6 ("PCI: pciehp: Tolerate initially unstable link") * after pciehp_ist() has successfully brought up the slot, it resets ctrl->ist_running to false and wakes up the sysfs requester * the genirq code re-runs pciehp_ist(), which sets ctrl->ist_running to true but then returns with IRQ_NONE because ctrl->pending_events is empty * pciehp_sysfs_enable_slot() is finally woken but notices that ctrl->ist_running is true, hence continues waiting The only way to get the hung task going again is to trigger a hotplug event which brings down the slot, e.g. by yanking out the card. The same race exists when powering down the slot because remove_board() likewise clears link or presence changes in ctrl->pending_events per commit 3943af9d01e9 ("PCI: pciehp: Ignore Link State Changes after powering off a slot") and thereby may cause a re-run of pciehp_ist() which returns with IRQ_NONE without resetting ctrl->ist_running to false. Fix by adding a goto label before the teardown steps at the end of pciehp_ist() and jumping to that label from the two return statements which currently neglect to reset the ctrl->ist_running flag. Fixes: 157c1062fcd8 ("PCI: pciehp: Avoid returning prematurely from sysfs requests") Link: https://lore.kernel.org/r/cca1effa488065cb055120aa01b65719094bdcb5.1584530321.git.lukas@wunner.de Reported-by: David Hoyer <David.Hoyer@netapp.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Cc: stable@vger.kernel.org # v4.19+
* PCI: pciehp: Add DMI table for in-band presence detection disabledStuart Hayes2020-02-201-0/+22
| | | | | | | | | | | | | | | | | Some systems have in-band presence detection disabled for hot-plug PCI slots but do not report this in the slot capabilities 2 (SLTCAP2) register. On these systems, presence detect can become active well after the link is reported to be active, which can cause the slots to be disabled after a device is connected. Add a DMI table to flag these systems as having in-band presence detect disabled. Link: https://lore.kernel.org/r/20191025190047.38130-4-stuart.w.hayes@gmail.com Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Reviewed-by: Lukas Wunner <lukas@wunner.de>
* PCI: pciehp: Wait for PDS if in-band presence is disabledAlexandru Gagniuc2020-02-201-0/+19
| | | | | | | | | | | | | | | | | | | | | When in-band presence detect is disabled, PDS may come up at any time or not at all. PDS being low may indicate that the card is still mating, and we could expect contact bounce to bring down the link as well. It is reasonable to assume that most cards will mate in a hotplug slot in about a second. Thus, when we know PDS only reflects out-of-band presence detect, it's worthwhile to wait the extra second or so to make sure the card is properly mated before loading the driver and to prevent the hotplug code from disabling a device if the presence detect change goes active after the device is enabled. Link: https://lore.kernel.org/r/20191025190047.38130-3-stuart.w.hayes@gmail.com [bhelgaas: use ctrl_info() instead of pci_info()] Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com> Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Reviewed-by: Lukas Wunner <lukas@wunner.de>
* PCI: pciehp: Disable in-band presence detect when possibleAlexandru Gagniuc2020-02-201-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The presence detect state (PDS) is normally a logical OR of in-band and out-of-band (OOB) presence detect. As of PCIe 4.0, there is the option to disable in-band presence so that the PDS bit always reflects the state of the out-of-band presence. The recommendation of the PCIe spec is to disable in-band presence whenever supported (PCIe r5.0, appendix I implementation note): Due to architectural issues, the in-band (Physical-Layer-based) portion of the PD mechanism is deprecated for use with async hot-plug. One issue is that in-band PD as architected does not detect adapter removal during certain LTSSM states, notably the L1 and Disabled States. Another issue is that when both in-band and OOB PD are being used together, the Presence Detect State bit and its associated interrupt mechanism always reflect the logical OR of the inband and OOB PD states, and with some hot-plug hardware configurations, it is important for software to detect and respond to in-band and OOB PD events independently. If OOB PD is being used and the associated DSP supports In-Band PD Disable, it is recommended that the In-Band PD Disable bit be Set, and the Presence Detect State bit and its associated interrupt mechanism be used exclusively for OOB PD. As a substitute for in-band PD with async hot-plug, the reference model uses either the DPC or the DLL Link Active mechanism. Link: https://lore.kernel.org/r/20191025190047.38130-2-stuart.w.hayes@gmail.com [bhelgaas: move PCI_EXP_SLTCAP2 read earlier & print PCI_EXP_SLTCAP2_IBPD value (suggested by Lukas)] Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Reviewed-by: Lukas Wunner <lukas@wunner.de>
* PCI: pciehp: Prevent deadlock on disconnectMika Westerberg2019-11-121-11/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This addresses deadlocks in these common cases in hierarchies containing two switches: - All involved ports are runtime suspended and they are unplugged. This can happen easily if the drivers involved automatically enable runtime PM (xHCI for example does that). - System is suspended (e.g., closing the lid on a laptop) with a dock + something else connected, and the dock is unplugged while suspended. These cases lead to the following deadlock: INFO: task irq/126-pciehp:198 blocked for more than 120 seconds. irq/126-pciehp D 0 198 2 0x80000000 Call Trace: schedule+0x2c/0x80 schedule_timeout+0x246/0x350 wait_for_completion+0xb7/0x140 kthread_stop+0x49/0x110 free_irq+0x32/0x70 pcie_shutdown_notification+0x2f/0x50 pciehp_remove+0x27/0x50 pcie_port_remove_service+0x36/0x50 device_release_driver+0x12/0x20 bus_remove_device+0xec/0x160 device_del+0x13b/0x350 device_unregister+0x1a/0x60 remove_iter+0x1e/0x30 device_for_each_child+0x56/0x90 pcie_port_device_remove+0x22/0x40 pcie_portdrv_remove+0x20/0x60 pci_device_remove+0x3e/0xc0 device_release_driver_internal+0x18c/0x250 device_release_driver+0x12/0x20 pci_stop_bus_device+0x6f/0x90 pci_stop_bus_device+0x31/0x90 pci_stop_and_remove_bus_device+0x12/0x20 pciehp_unconfigure_device+0x88/0x140 pciehp_disable_slot+0x6a/0x110 pciehp_handle_presence_or_link_change+0x263/0x400 pciehp_ist+0x1c9/0x1d0 irq_thread_fn+0x24/0x60 irq_thread+0xeb/0x190 kthread+0x120/0x140 INFO: task irq/190-pciehp:2288 blocked for more than 120 seconds. irq/190-pciehp D 0 2288 2 0x80000000 Call Trace: __schedule+0x2a2/0x880 schedule+0x2c/0x80 schedule_preempt_disabled+0xe/0x10 mutex_lock+0x2c/0x30 pci_lock_rescan_remove+0x15/0x20 pciehp_unconfigure_device+0x4d/0x140 pciehp_disable_slot+0x6a/0x110 pciehp_handle_presence_or_link_change+0x263/0x400 pciehp_ist+0x1c9/0x1d0 irq_thread_fn+0x24/0x60 irq_thread+0xeb/0x190 kthread+0x120/0x140 What happens here is that the whole hierarchy is runtime resumed and the parent PCIe downstream port, which got the hot-remove event, starts removing devices below it, taking pci_lock_rescan_remove() lock. When the child PCIe port is runtime resumed it calls pciehp_check_presence() which ends up calling pciehp_card_present() and pciehp_check_link_active(). Both of these use pcie_capability_read_word(), which notices that the underlying device is already gone and returns PCIBIOS_DEVICE_NOT_FOUND with the capability value set to 0. When pciehp gets this value it thinks that its child device is also hot-removed and schedules its IRQ thread to handle the event. The deadlock happens when the child's IRQ thread runs and tries to acquire pci_lock_rescan_remove() which is already taken by the parent and the parent waits for the child's IRQ thread to finish. Prevent this from happening by checking the return value of pcie_capability_read_word() and if it is PCIBIOS_DEVICE_NOT_FOUND stop performing any hot-removal activities. [bhelgaas: add common scenarios to commit log] Link: https://lore.kernel.org/r/20191029170022.57528-2-mika.westerberg@linux.intel.com Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* PCI: pciehp: Refactor infinite loop in pcie_poll_cmd()Andy Shevchenko2019-11-111-4/+2
| | | | | | | | | | | | | | | | Infinite timeout loops are hard to read. Refactor it to plausible 'do {} while ()'. Note, the supplied timeout can't be negative for current use, though if it's not dividable to 10, we may go below 0, that's why type of the parameter is int. And thus, we may move the check to the loop condition. No functional change intended. Link: https://lore.kernel.org/r/20191108111855.85866-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andrew Murray <andrew.murray@arm.com>
* PCI: pciehp: Avoid returning prematurely from sysfs requestsLukas Wunner2019-10-041-0/+2
| | | | | | | | | | | | | | | | | | | | | | | A sysfs request to enable or disable a PCIe hotplug slot should not return before it has been carried out. That is sought to be achieved by waiting until the controller's "pending_events" have been cleared. However the IRQ thread pciehp_ist() clears the "pending_events" before it acts on them. If pciehp_sysfs_enable_slot() / _disable_slot() happen to check the "pending_events" after they have been cleared but while pciehp_ist() is still running, the functions may return prematurely with an incorrect return value. Fix by introducing an "ist_running" flag which must be false before a sysfs request is allowed to return. Fixes: 32a8cef274fe ("PCI: pciehp: Enable/disable exclusively from IRQ thread") Link: https://lore.kernel.org/linux-pci/1562226638-54134-1-git-send-email-wangxiongfeng2@huawei.com Link: https://lore.kernel.org/r/4174210466e27eb7e2243dd1d801d5f75baaffd8.1565345211.git.lukas@wunner.de Reported-and-tested-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org # v4.19+
* PCI: pciehp: Remove pciehp_green_led_{on,off,blink}()Denis Efremov2019-09-051-36/+0
| | | | | | | | | | | | Remove pciehp_green_led_{on,off,blink}() and use pciehp_set_indicators() instead, since the code is mostly the same. [bhelgaas: drop set_power_indicator() wrapper to reduce the number of interfaces] Link: https://lore.kernel.org/r/20190903111021.1559-5-efremov@linux.com Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
* PCI: pciehp: Remove pciehp_set_attention_status()Denis Efremov2019-09-051-25/+0
| | | | | | | | | | Remove pciehp_set_attention_status() and use pciehp_set_indicators() instead, since the code is mostly the same. Link: https://lore.kernel.org/r/20190903111021.1559-4-efremov@linux.com Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
* PCI: pciehp: Combine adjacent indicator updatesDenis Efremov2019-09-051-2/+2
| | | | | | | | | | | | Combine adjacent updates of power and attention indicators into a single pciehp_set_indicators() call. This sends one command to the hotplug controller instead of two. Link: https://lore.kernel.org/r/20190903111021.1559-3-efremov@linux.com Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Lukas Wunner <lukas@wunner.de>
* PCI: pciehp: Add pciehp_set_indicators() to set both indicatorsDenis Efremov2019-09-051-0/+36
| | | | | | | | | | | | | | | | | | | Add pciehp_set_indicators() to set power and attention indicators with a single register write. This is a minor optimization because we frequently set both indicators and this can do it with a single command. It also reduces the number of interfaces related to the indicators and makes them more discoverable because callers use the PCI_EXP_SLTCTL_ATTN_IND_* and PCI_EXP_SLTCTL_PWR_IND_* definitions directly. [bhelgaas: extend commit log, s/PCI_EXP_SLTCTL_.*_IND_NONE/INDICATOR_NOOP/ so they don't look like things defined by the spec, add function doc, mask commands to make it obvious we only send valid commands (pcie_do_write_cmd() does mask it, but requires more effort to verify)] Link: https://lore.kernel.org/r/20190903111021.1559-2-efremov@linux.com Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* PCI: pciehp: Remove pointless MY_NAME definitionBjorn Helgaas2019-05-091-1/+1
| | | | | | | | | MY_NAME is only used once and offers no benefit, so remove it. Link: https://lore.kernel.org/lkml/20190509141456.223614-11-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
* PCI: pciehp: Remove unused dbg/err/info/warn() wrappersFrederick Lawler2019-05-091-1/+1
| | | | | | | | | | | Replace the last uses of dbg() with the equivalent pr_debug(), then remove unused dbg(), err(), info(), and warn() wrappers. Link: https://lore.kernel.org/lkml/20190509141456.223614-9-helgaas@kernel.org Signed-off-by: Frederick Lawler <fred@fredlawl.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
* PCI: pciehp: Log messages with pci_dev, not pcie_deviceFrederick Lawler2019-05-091-0/+2
| | | | | | | | | | | | | | | | Log messages with pci_dev, not pcie_device. Factor out common message prefixes with dev_fmt(). Example output change: - pciehp 0000:00:06.0:pcie004: Slot(0) Powering on due to button press + pcieport 0000:00:06.0: pciehp: Slot(0) Powering on due to button press Link: https://lore.kernel.org/lkml/20190509141456.223614-8-helgaas@kernel.org Signed-off-by: Frederick Lawler <fred@fredlawl.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
* PCI: pciehp: Remove pciehp_debug usesBjorn Helgaas2019-05-091-8/+5
| | | | | | | | | | | | | We're about to convert pciehp to the dyndbg mechanism, which means we can eventually remove pciehp_debug. Replace uses of pciehp_debug with dbg() and ctrl_dbg(), which check pciehp_debug internally. Link: https://lore.kernel.org/lkml/20190509141456.223614-6-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
* Merge branch 'pci/pm'Bjorn Helgaas2019-03-061-2/+15
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Blacklist Gigabyte X299 Root Port power management to fix Thunderbolt hotplug (Mika Westerberg) - Revert runtime PM suspend/resume callbacks that broke PME on network cable plug (Mika Westerberg) - Disable Data Link State Changed interrupts to prevent wakeup immediately after suspend (Mika Westerberg) * pci/pm: PCI/PME: Fix possible use-after-free on remove PCI/PME: Fix hotplug/sysfs remove deadlock in pcie_pme_remove() PCI: pciehp: Disable Data Link Layer State Changed event on suspend Revert "PCI/PME: Implement runtime PM callbacks" PCI: Blacklist power management of Gigabyte X299 DESIGNARE EX PCIe ports
| * PCI: pciehp: Disable Data Link Layer State Changed event on suspendMika Westerberg2019-02-151-2/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 0e157e528604 ("PCI/PME: Implement runtime PM callbacks") tried to solve an issue where the hierarchy immediately wakes up when it is transitioned into D3cold. However, it turns out to prevent PME propagation on some systems that do not support D3cold. I looked more closely at what might cause the immediate wakeup. It happens when the ACPI power resource of the root port is turned off. The AML code associated with the _OFF() method of the ACPI power resource starts a PCIe L2/L3 Ready transition and waits for it to complete. Right after the L2/L3 Ready transition is started the root port receives a PME from the downstream port. The simplest hierarchy where this happens looks like this: 00:1d.0 PCIe Root Port ^ | v 05:00.0 PCIe switch #1 upstream port 06:01.0 PCIe switch #1 downstream hotplug port ^ | v 08:00.0 PCIe switch #2 upstream port It seems that the PCIe link between the two switches, before PME_Turn_Off/PME_TO_Ack is complete for the whole hierarchy, goes inactive and triggers PME towards the root port bringing it back to D0. The L2/L3 Ready sequence is described in PCIe r4.0 spec sections 5.2 and 5.3.3 but unfortunately they do not state what happens if DLLSCE is enabled during the sequence. Disabling Data Link Layer State Changed event (DLLSCE) seems to prevent the issue and still allows the downstream hotplug port to notice when a device is plugged/unplugged. Link: https://bugzilla.kernel.org/show_bug.cgi?id=202593 Fixes: 0e157e528604 ("PCI/PME: Implement runtime PM callbacks") Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> CC: stable@vger.kernel.org # v4.20+
* | Merge branch 'pci/misc'Bjorn Helgaas2019-03-061-0/+2
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Mark expected switch fall-through (Mathieu Malaterre) - Use of_node_name_eq() for node name comparisons (Rob Herring) - Add ACS and pciehp quirks for HXT SD4800 (Shunyong Yang) - Consolidate Rohm Vendor ID definitions (Andy Shevchenko) - Use u32 (not __u32) for things not exposed to userspace (Logan Gunthorpe) - Fix locking semantics of bus and slot reset interfaces (Alex Williamson) - Update PCIEPORTBUS Kconfig help text (Hou Zhiqiang) * pci/misc: PCI: Update PCIEPORTBUS Kconfig help text PCI: Fix "try" semantics of bus and slot reset PCI: Clean up usage of __u32 type genirq/msi: Clean up usage of __u8/__u16 types PCI: Move Rohm Vendor ID to generic list PCI: pciehp: Add HXT quirk for Command Completed errata PCI: Add ACS quirk for HXT SD4800 PCI: Add HXT vendor ID PCI: Use of_node_name_eq() for node name comparisons PCI: Mark expected switch fall-through
| * | PCI: pciehp: Add HXT quirk for Command Completed errataShunyong Yang2019-02-011-0/+2
| |/ | | | | | | | | | | | | | | | | | | The HXT SD4800 PCI controller does not set the Command Completed bit unless writes to the Slot Command register change "Control" bits. Add SD4800 to the quirk. Signed-off-by: Shunyong Yang <shunyong.yang@hxt-semitech.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: Joey Zheng <yu.zheng@hxt-semitech.com>
* / PCI: pciehp: Assign ctrl->slot_ctrl before writing it to hardwareMika Westerberg2019-01-141-1/+1
|/ | | | | | | | | | | | | | | | | | | | | | | | | | Shameerali reported that running v4.20-rc1 as QEMU guest, the PCIe hotplug port times out during boot: pciehp 0000:00:01.0:pcie004: Timeout on hotplug command 0x03f1 (issued 1016 msec ago) pciehp 0000:00:01.0:pcie004: Timeout on hotplug command 0x03f1 (issued 1024 msec ago) pciehp 0000:00:01.0:pcie004: Failed to check link status pciehp 0000:00:01.0:pcie004: Timeout on hotplug command 0x02f1 (issued 2520 msec ago) The issue was bisected down to commit 720d6a671a6e ("PCI: pciehp: Do not handle events if interrupts are masked") and was further analyzed by the reporter to be caused by the fact that pciehp first updates the hardware and only then cache the ctrl->slot_ctrl in pcie_do_write_cmd(). If the interrupt happens before we cache the value, pciehp_isr() reads value 0 and decides that the interrupt was not meant for it causing the above timeout to trigger. Fix by moving ctrl->slot_ctrl assignment to happen before it is written to the hardware. Fixes: 720d6a671a6e ("PCI: pciehp: Do not handle events if interrupts are masked") Link: https://lore.kernel.org/linux-pci/5FC3163CFD30C246ABAA99954A238FA8387DD344@FRAEML521-MBX.china.huawei.com Reported-by: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* PCI: pciehp: Do not handle events if interrupts are maskedMika Westerberg2018-10-021-2/+4
| | | | | | | | | | | | | PCIe native hotplug shares MSI vector with native PME so the interrupt handler might get called even the hotplug interrupt is masked. In that case we should not handle any events because the interrupt was not meant for us. Modify the PCIe hotplug interrupt handler to check this accordingly and bail out if it finds out that the interrupt was not about hotplug. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lukas Wunner <lukas@wunner.de>
* PCI: pciehp: Disable hotplug interrupt during suspendMika Westerberg2018-10-021-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | When PCIe hotplug port is transitioned into D3hot, the link to the downstream component will go down. If hotplug interrupt generation is enabled when that happens, it will trigger immediately, waking up the system and bringing the link back up. To prevent this, disable hotplug interrupt generation when system suspend is entered. This does not prevent wakeup from low power states according to PCIe 4.0 spec section 6.7.3.4: Software enables a hot-plug event to generate a wakeup event by enabling software notification of the event as described in Section 6.7.3.1. Note that in order for software to disable interrupt generation while keeping wakeup generation enabled, the Hot-Plug Interrupt Enable bit must be cleared. So as long as we have set the slot event mask accordingly, wakeup should work even if slot interrupt is disabled. The port should trigger wake and then send PME to the root port when the PCIe hierarchy is brought back up. Limit this to systems using native PME mechanism to make sure older Apple systems depending on commit e3354628c376 ("PCI: pciehp: Support interrupts sent from D3hot") still continue working. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* PCI: Make link active reporting detection genericKeith Busch2018-10-021-20/+2
| | | | | | | | | | | | | The spec has timing requirements when waiting for a link to become active after a conventional reset. Implement those hard delays when waiting for an active link so pciehp and dpc drivers don't need to duplicate this. For devices that don't support data link layer active reporting, wait the fixed time recommended by the PCIe spec. Signed-off-by: Keith Busch <keith.busch@intel.com> [bhelgaas: changelog] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Sinan Kaya <okaya@kernel.org>
* PCI: hotplug: Embed hotplug_slotLukas Wunner2018-09-181-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the PCI hotplug core and its first user, cpqphp, were introduced in February 2002 with historic commit a8a2069f432c, cpqphp allocated a slot struct for its internal use plus a hotplug_slot struct to be registered with the hotplug core and linked the two with pointers: https://git.kernel.org/tglx/history/c/a8a2069f432c Nowadays, the predominant pattern in the tree is to embed ("subclass") such structures in one another and cast to the containing struct with container_of(). But it wasn't until July 2002 that container_of() was introduced with historic commit ec4f214232cf: https://git.kernel.org/tglx/history/c/ec4f214232cf pnv_php, introduced in 2016, did the right thing and embedded struct hotplug_slot in its internal struct pnv_php_slot, but all other drivers cargo-culted cpqphp's design and linked separate structs with pointers. Embedding structs is preferrable to linking them with pointers because it requires fewer allocations, thereby reducing overhead and simplifying error paths. Casting an embedded struct to the containing struct becomes a cheap subtraction rather than a dereference. And having fewer pointers reduces the risk of them pointing nowhere either accidentally or due to an attack. Convert all drivers to embed struct hotplug_slot in their internal slot struct. The "private" pointer in struct hotplug_slot thereby becomes unused, so drop it. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> # drivers/pci/hotplug/rpa* Acked-by: Sebastian Ott <sebott@linux.ibm.com> # drivers/pci/hotplug/s390* Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> # drivers/platform/x86 Cc: Len Brown <lenb@kernel.org> Cc: Scott Murray <scott@spiteful.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Oliver OHalloran <oliveroh@au1.ibm.com> Cc: Gavin Shan <gwshan@linux.vnet.ibm.com> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Corentin Chary <corentin.chary@gmail.com> Cc: Darren Hart <dvhart@infradead.org>
* PCI: pciehp: Rename controller struct members for clarityLukas Wunner2018-09-181-3/+3
| | | | | | | | | | | | Of the members which were just moved from pciehp's slot struct to the controller struct, rename "lock" to "state_lock" and rename "work" to "button_work" for clarity. Perform the rename separately to the unification of the two structs per Sinan's request. No functional change intended. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Sinan Kaya <okaya@kernel.org>
* PCI: pciehp: Unify controller and slot structsLukas Wunner2018-09-181-79/+35
| | | | | | | | | | | | | | | | | | | | | | | | pciehp was originally introduced together with shpchp in a single commit, c16b4b14d980 ("PCI Hotplug: Add SHPC and PCI Express hot-plug drivers"): https://git.kernel.org/tglx/history/c/c16b4b14d980 shpchp supports up to 31 slots per controller, hence uses separate slot and controller structs. pciehp has a 1:1 relationship between slot and controller and therefore never required this separation. Nevertheless, because much of the code had been copy-pasted between the two drivers, pciehp likewise uses separate structs to this very day. The artificial separation of data structures adds unnecessary complexity and bloat to pciehp and requires constantly chasing pointers at runtime. Simplify the driver by merging struct slot into struct controller. Merge the slot constructor pcie_init_slot() and the destructor pcie_cleanup_slot() into the controller counterparts. No functional change intended. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* PCI: pciehp: Tolerate Presence Detect hardwired to zeroLukas Wunner2018-09-181-6/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The WiGig Bus Extension (WBE) specification allows tunneling PCIe over IEEE 802.11. A product implementing this spec is the wil6210 from Wilocity (now part of Qualcomm Atheros). It integrates a PCIe switch with a wireless network adapter: 00.0-+ [1ae9:0101] Upstream Port +-00.0-+ [1ae9:0200] Downstream Port | +-00.0 [168c:0034] Atheros AR9462 Wireless Network Adapter +-02.0 [1ae9:0201] Downstream Port +-03.0 [1ae9:0201] Downstream Port Wirelessly attached devices presumably appear below the hotplug ports with device ID [1ae9:0201]. Oddly, the Downstream Port [1ae9:0200] leading to the wireless network adapter is likewise Hotplug Capable, but has its Presence Detect State bit hardwired to zero. Even if the Link Active bit is set, Presence Detect is zero, so this cannot be caused by in-band presence detection but only by broken hardware. pciehp assumes an empty slot if Presence Detect State is zero, regardless of Link Active being one. Consequently, up until v4.18 it removes the wireless network adapter in pciehp_resume(). From v4.19 it already does so in pciehp_probe(). Be lenient towards broken hardware and assume the slot is occupied if Link Active is set: Introduce pciehp_card_present_or_link_active() and use it in lieu of pciehp_get_adapter_status() everywhere, except in pciehp_handle_presence_or_link_change() whose log messages depend on which of Presence Detect State or Link Active is set. Remove the Presence Detect State check from __pciehp_enable_slot() because it is only called if either of Presence Detect State or Link Active is set. Caution: There is a possibility that broken hardware exists which has working Presence Detect but hardwires Link Active to one. On such hardware the slot will now incorrectly be considered always occupied. If such hardware is discovered, this commit can be rolled back and a quirk can be added which sets is_hotplug_bridge = 0 for [1ae9:0200]. Link: https://bugzilla.kernel.org/show_bug.cgi?id=200839 Reported-and-tested-by: David Yang <mmyangfl@gmail.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Rajat Jain <rajatja@google.com> Cc: Ashok Raj <ashok.raj@intel.com>
* PCI: pciehp: Drop hotplug_slot_ops wrappersLukas Wunner2018-09-171-2/+6
| | | | | | | | | | | | | pciehp's ->enable_slot, ->disable_slot, ->get_attention_status and ->reset_slot callbacks are currently implemented by wrapper functions that do nothing else but call down to a backend function. The backends are not called from anywhere else, so drop the wrappers and use the backends directly as callbacks, thereby shaving off a few lines of unnecessary code. No functional change intended. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* PCI: pciehp: Drop unnecessary includesLukas Wunner2018-09-171-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Drop the following includes from pciehp source files which no longer use any of the included symbols: * <linux/sched/signal.h> in pciehp.h <linux/signal.h> in pciehp_hpc.c Added by commit de25968cc87c ("fix more missing includes") to accommodate for a call to signal_pending(). The call was removed by commit 262303fe329a ("pciehp: fix wait command completion"). * <linux/interrupt.h> in pciehp_core.c Added by historic commit f308a2dfbe63 ("PCI: add PCI Express Port Bus Driver subsystem") to accommodate for a call to free_irq(): https://git.kernel.org/tglx/history/c/f308a2dfbe63 The call was removed by commit 407f452b05f9 ("pciehp: remove unnecessary free_irq"). * <linux/time.h> in pciehp_core.c and pciehp_hpc.c Added by commit 34d03419f03b ("PCIEHP: Add Electro Mechanical Interlock (EMI) support to the PCIE hotplug driver."), which was reverted by commit bd3d99c17039 ("PCI: Remove untested Electromechanical Interlock (EMI) support in pciehp."). * <linux/module.h> in pciehp_ctrl.c, pciehp_hpc.c and pciehp_pci.c Added by historic commit c16b4b14d980 ("PCI Hotplug: Add SHPC and PCI Express hot-plug drivers"): https://git.kernel.org/tglx/history/c/c16b4b14d980 Module-related symbols were neither used back then in those files, nor are they used today. * <linux/slab.h> in pciehp_ctrl.c Added by commit 5a0e3ad6af86 ("include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h") to accommodate for calls to kmalloc(). The calls were removed by commit 0e94916e6091 ("PCI: pciehp: Handle events synchronously"). * "../pci.h" in pciehp_ctrl.c Added by historic commit 67f4660b72f2 ("PCI: ASPM patch for") to accommodate for usage of the global variable pcie_mch_quirk: https://git.kernel.org/tglx/history/c/67f4660b72f2 The global variable was removed by commit 0ba379ec0fb1 ("PCI: Simplify hotplug mch quirk"). Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* PCI: pciehp: Fix hot-add vs powerfault detection orderKeith Busch2018-09-111-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If both hot-add and power fault were observed in a single interrupt, we handled the hot-add first, then the power fault, in this path: pciehp_ist if (events & (PDC | DLLSC)) pciehp_handle_presence_or_link_change case OFF_STATE: pciehp_enable_slot __pciehp_enable_slot board_added pciehp_power_on_slot ctrl->power_fault_detected = 0 pcie_write_cmd(ctrl, PCI_EXP_SLTCTL_PWR_ON, PCI_EXP_SLTCTL_PCC) pciehp_green_led_on(p_slot) # power LED on pciehp_set_attention_status(p_slot, 0) # attention LED off if ((events & PFD) && !ctrl->power_fault_detected) ctrl->power_fault_detected = 1 pciehp_set_attention_status(1) # attention LED on pciehp_green_led_off(slot) # power LED off This left the attention indicator on (even though the hot-add succeeded) and the power indicator off (even though the slot power was on). Fix this by checking for power faults before checking for new devices. Prior to 0e94916e6091, this was successful because everything was chained through work queues and the order was: INT_PRESENCE_ON -> INT_POWER_FAULT -> ENABLE_REQ The ENABLE_REQ cleared the power fault at the end, but now everything is handled inline with the interrupt thread, such that the work ENABLE_REQ was doing happens before power fault handling now. Fixes: 0e94916e6091 ("PCI: pciehp: Handle events synchronously") Signed-off-by: Keith Busch <keith.busch@intel.com> [bhelgaas: changelog] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lukas Wunner <lukas@wunner.de>
* Merge branch 'pci/virtualization'Bjorn Helgaas2018-08-151-2/+3
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - To avoid bus errors, enable PASID only if entire path supports End-End TLP prefixes (Sinan Kaya) - Unify slot and bus reset functions and remove hotplug knowledge from callers (Sinan Kaya) - Add Function-Level Reset quirks for Intel and Samsung NVMe devices to fix guest reboot issues (Alex Williamson) - Add function 1 DMA alias quirk for Marvell 88SS9183 PCIe SSD Controller (Bjorn Helgaas) * pci/virtualization: PCI: Add function 1 DMA alias quirk for Marvell 88SS9183 PCI: Delay after FLR of Intel DC P3700 NVMe PCI: Disable Samsung SM961/PM961 NVMe before FLR PCI: Export pcie_has_flr() PCI: Rename pci_try_reset_bus() to pci_reset_bus() PCI: Deprecate pci_reset_bus() and pci_reset_slot() functions PCI: Unify try slot and bus reset API PCI: Hide pci_reset_bridge_secondary_bus() from drivers IB/hfi1: Use pci_try_reset_bus() for initiating PCI Secondary Bus Reset PCI: Handle error return from pci_reset_bridge_secondary_bus() PCI/IOV: Tidy pci_sriov_set_totalvfs() PCI: Enable PASID only if entire path supports End-End TLP prefixes # Conflicts: # drivers/pci/hotplug/pciehp_hpc.c
| * PCI: Hide pci_reset_bridge_secondary_bus() from driversSinan Kaya2018-07-191-1/+1
| | | | | | | | | | | | | | | | | | Rename pci_reset_bridge_secondary_bus() to pci_bridge_secondary_bus_reset() and move the declaration from linux/pci.h to drivers/pci.h to be used internally in PCI directory only. Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
| * PCI: Handle error return from pci_reset_bridge_secondary_bus()Sinan Kaya2018-07-191-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | Commit 01fd61c0b9bd ("PCI: Add a return type for pci_reset_bridge_secondary_bus()") added a return value to the function to return if a device is accessible following a reset. Callers are not checking the value. Pass error code up high in the stack if device is not accessible. Fixes: 01fd61c0b9bd ("PCI: Add a return type for pci_reset_bridge_secondary_bus()") Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* | PCI: pciehp: Deduplicate presence check on probe & resumeLukas Wunner2018-07-311-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On driver probe and on resume from system sleep, pciehp checks the Presence Detect State bit in the Slot Status register to bring up an occupied slot or bring down an unoccupied slot. Both code paths are identical, so deduplicate them per Mika's request. On probe, an additional check is performed to disable power of an unoccupied slot. This can e.g. happen if power was enabled by BIOS. It cannot happen once pciehp has taken control, hence is not necessary on resume: The Slot Control register is set to the same value that it had on suspend by pci_restore_state(), so if the slot was occupied, power is enabled and if it wasn't, power is disabled. Should occupancy have changed during the system sleep transition, power is adjusted by bringing up or down the slot per the paragraph above. To allow for deduplication of the presence check, move the power check to pcie_init(). This seems safer anyway, because right now it is performed while interrupts are already enabled, and although I can't think of a scenario where pciehp_power_off_slot() and the IRQ thread collide, it does feel brittle. However this means that pcie_init() may now write to the Slot Control register before the IRQ is requested. If both the CCIE and HPIE bits happen to be set, pcie_wait_cmd() will wait for an interrupt (instead of polling the Command Completed bit) and eventually emit a timeout message. Additionally, if a level-triggered INTx interrupt is used, the user may see a spurious interrupt splat. Avoid by disabling interrupts before disabling power. (Normally the HPIE and CCIE bits should be clear on probe, but conceivably they may already have been set e.g. by BIOS.) Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
* | PCI: pciehp: Resume parent to D0 on config space accessLukas Wunner2018-07-311-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | Ensure accessibility of a hotplug port's config space when accessed via sysfs by resuming its parent to D0. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Yinghai Lu <yinghai@kernel.org>
* | PCI: pciehp: Support interrupts sent from D3hotLukas Wunner2018-07-311-2/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a hotplug port is able to send an interrupt, one would naively assume that it is accessible at that moment. After all, if it wouldn't be accessible, i.e. if its parent is in D3hot and the link to the hotplug port is thus down, how should an interrupt come through? It turns out that assumption is wrong at least for Thunderbolt: Even though its parents are in D3hot, a Thunderbolt hotplug port is able to signal interrupts. Because the port's config space is inaccessible and resuming the parents may sleep, the hard IRQ handler has to defer runtime resuming the parents and reading the Slot Status register to the IRQ thread. If the hotplug port uses a level-triggered INTx interrupt, it needs to be masked until the IRQ thread has cleared the signaled events. For simplicity, this commit also masks edge-triggered MSI/MSI-X interrupts. Note that if the interrupt is shared (which can only happen for INTx), other devices are starved from receiving interrupts until the IRQ thread is scheduled, has runtime resumed the hotplug port's parents and has read and cleared the Slot Status register. That delay is dominated by the 10 ms D3hot->D0 transition time of each parent port. The worst case is a Thunderbolt downstream port at the end of a daisy chain: There may be up to six Thunderbolt controllers in-between it and the root port, each comprising an upstream and downstream port, plus its own upstream port. That's 13 x 10 = 130 ms. Possible mitigations are polling the interrupt while it's disabled or reducing the d3_delay of Thunderbolt ports if possible. Open code masking of the interrupt instead of requesting it with the IRQF_ONESHOT flag to minimize the period during which it is masked. (IRQF_ONESHOT unmasks the IRQ only after the IRQ thread has finished.) PCIe r4.0 sec 6.7.3.4 states that "If wake generation is required by the associated form factor specification, a hotplug capable Downstream Port must support generation of a wakeup event (using the PME mechanism) on hotplug events that occur when the system is in a sleep state or the Port is in device state D1, D2, or D3Hot." This would seem to imply that PME needs to be enabled on the hotplug port when it is runtime suspended. pci_enable_wake() currently doesn't enable PME on bridges, it may be necessary to add an exemption for hotplug bridges there. On "Light Ridge" Thunderbolt controllers, the PME_Status bit is not set when an interrupt occurs while the hotplug port is in D3hot, even if PME is enabled. (I've tested this on a Mac and we hardcode the OSC_PCI_EXPRESS_PME_CONTROL bit to 0 on Macs in negotiate_os_control(), modifying it to 1 didn't change the behavior.) (Side note: Section 6.7.3.4 also states that "PME and Hot-Plug Event interrupts (when both are implemented) always share the same MSI or MSI-X vector". That would only seem to apply to Root Ports, however the section never mentions Root Ports, only Downstream Ports. This is explained in the definition of "Downstream Port" in the "Terms and Acronyms" section of the PCIe Base Spec: "The Ports on a Switch that are not the Upstream Port are Downstream Ports. All Ports on a Root Complex are Downstream Ports.") Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Yinghai Lu <yinghai@kernel.org>
* | PCI: pciehp: Clear spurious events earlier on resumeLukas Wunner2018-07-311-11/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Thunderbolt hotplug ports that were occupied before system sleep resume with their downstream link in "off" state. Only after the Thunderbolt controller has reestablished the PCIe tunnels does the link go up. As a result, a spurious Presence Detect Changed and/or Data Link Layer State Changed event occurs. The events are not immediately acted upon because tunnel reestablishment happens in the ->resume_noirq phase, when interrupts are still disabled. Also, notification of events may initially be disabled in the Slot Control register when coming out of system sleep and is reenabled in the ->resume_noirq phase through: pci_pm_resume_noirq() pci_pm_default_resume_early() pci_restore_state() pci_restore_pcie_state() It is not guaranteed that the events are acted upon at all: PCIe r4.0, sec 6.7.3.4 says that "a port may optionally send an MSI when there are hot-plug events that occur while interrupt generation is disabled, and interrupt generation is subsequently enabled." Note the "optionally". If an MSI is sent, pciehp will gratuitously turn the slot off and back on once the ->resume_early phase has commenced. If an MSI is not sent, the extant, unacknowledged events in the Slot Status register will prevent future notification of presence or link changes. Commit 13c65840feab ("PCI: pciehp: Clear Presence Detect and Data Link Layer Status Changed on resume") fixed the latter by clearing the events in the ->resume phase. Move this to the ->resume_noirq phase to also fix the gratuitous disable/enablement of the slot. The commit further restored the Slot Control register in the ->resume phase, but that's dispensable because as shown above it's already been done in the ->resume_noirq phase. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
* | PCI: pciehp: Avoid slot access during resetLukas Wunner2018-07-311-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ->reset_slot callback introduced by commits: 2e35afaefe64 ("PCI: pciehp: Add reset_slot() method") and 06a8d89af551 ("PCI: pciehp: Disable link notification across slot reset") disables notification of Presence Detect Changed and Data Link Layer State Changed events for the duration of a secondary bus reset. However a bus reset not only triggers these events, but may also clear the Presence Detect State bit in the Slot Status register and the Data Link Layer Link Active bit in the Link Status register momentarily. According to Sinan Kaya: "I know for a fact that bus reset clears the Data Link Layer Active bit as soon as link goes down. It gets set again following link up. Presence detect depends on the HW implementation. QDT root ports don't change presence detect for instance since nobody actually removed the card. If an implementation supports in-band presence detect, the answer is yes. As soon as the link goes down, presence detect bit will get cleared until recovery." https://lkml.kernel.org/r/42e72f83-3b24-f7ef-e5bc-290fae99259a@codeaurora.org In-band presence detect is also covered in Table 4-15 in PCIe r4.0, sec 4.2.6. pciehp should therefore ensure that any parts of the driver that access those bits do not run concurrently to a bus reset. The only precaution the commits took to that effect was to halt interrupt polling. They made no effort to drain the slot workqueue, cancel an outstanding Attention Button work, or block slot enable/disable requests via sysfs and in the ->probe hook. Now that pciehp is converted to enable/disable the slot exclusively from the IRQ thread, the only places accessing the two above-mentioned bits are the IRQ thread and the ->probe hook. Add locking to serialize them with a bus reset. This obviates the need to halt interrupt polling. Do not add locking to the ->get_adapter_status sysfs callback to afford users unfettered access to that bit. Use an rw_semaphore in lieu of a regular mutex to allow parallel execution of the non-reset code paths accessing the critical bits, i.e. the IRQ thread and the ->probe hook. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Rajat Jain <rajatja@google.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Sinan Kaya <okaya@kernel.org>
* | PCI: pciehp: Always enable occupied slot on probeLukas Wunner2018-07-231-7/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Per PCIe r4.0, sec 6.7.3.4, a "port may optionally send an MSI when there are hot-plug events that occur while interrupt generation is disabled, and interrupt generation is subsequently enabled." On probe, we currently clear all event bits in the Slot Status register with the notable exception of the Presence Detect Changed bit. Thereby we seek to receive an interrupt for an already occupied slot once event notification is enabled. But because the interrupt is optional, users may have to specify the pciehp_force parameter on the command line, which is inconvenient. Moreover, now that pciehp's event handling has become resilient to missed events, a Presence Detect Changed interrupt for a slot which is powered on is interpreted as removal of the card. If the slot has already been brought up by the BIOS, receiving such an interrupt on probe causes the slot to be powered off and immediately back on, which is likewise undesirable. Avoid both issues by making the behavior of pciehp_force the default and clearing the Presence Detect Changed bit on probe. Note that the stated purpose of pciehp_force per the MODULE_PARM_DESC ("Force pciehp, even if OSHP is missing") seems nonsensical because the OSHP control method is only relevant for SHCP slots according to the PCI Firmware specification r3.0, sec 4.8. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
* | PCI: pciehp: Become resilient to missed eventsLukas Wunner2018-07-231-8/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A hotplug port's Slot Status register does not count how often each type of event occurred, it only records the fact *that* an event has occurred. Previously pciehp queued a work item for each event. But if it missed an event, e.g. removal of a card in-between two back-to-back insertions, it queued up the wrong work item or no work item at all. Commit fad214b0aa72 ("PCI: pciehp: Process all hotplug events before looking for new ones") sought to improve the situation by shrinking the window during which events may be missed. But Stefan Roese reports unbalanced Card present and Link Up events, suggesting that we're still missing events if they occur very rapidly. Bjorn Helgaas responds that he considers pciehp's event handling "baroque" and calls for its simplification and rationalization: https://lkml.kernel.org/r/20180202192045.GA53759@bhelgaas-glaptop.roam.corp.google.com It gets worse once a hotplug port is runtime suspended: The port can signal an interrupt while it and its parents are in D3hot, i.e. while it is inaccessible. By the time we've runtime resumed all parents to D0 and read the port's Slot Status register, we may have missed an arbitrary number of events. Event handling therefore needs to be reworked to become resilient to missed events. Assume that a Presence Detect Changed event has occurred. Consider the following truth table: - Slot is in OFF_STATE and is currently empty. => Do nothing. (The event is trailing a Link Down or we've missed an insertion and subsequent removal.) - Slot is in OFF_STATE and is currently occupied. => Turn the slot on. - Slot is in ON_STATE and is currently empty. => Turn the slot off. - Slot is in ON_STATE and is currently occupied. => Turn the slot off, (Be cautious and assume the card in then back on. the slot isn't the same as before.) This leads to the following simple algorithm: 1 If the slot is in ON_STATE, turn it off unconditionally. 2 If the slot is currently occupied, turn it on. Because those actions are now carried out synchronously, rather than by scheduled work items, pciehp reacts to the *current* situation and missed events no longer matter. Data Link Layer State Changed events can be handled identically to Presence Detect Changed events. Note that in the above truth table, a Link Up trailing a Card present event didn't have to be accounted for: It is filtered out by pciehp_check_link_status(). As for Attention Button Pressed events, PCIe r4.0, sec 6.7.1.5 says: "Once the Power Indicator begins blinking, a 5-second abort interval exists during which a second depression of the Attention Button cancels the operation." In other words, the user can only expect the system to react to a button press after it starts blinking. Missed button presses that occur in-between are irrelevant. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Stefan Roese <sr@denx.de> Cc: Mayurkumar Patel <mayurkumar.patel@intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
* | PCI: pciehp: Tolerate initially unstable linkLukas Wunner2018-07-231-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a device is hotplugged, Presence Detect and Link Up events often do not occur simultaneously, but with a lag of a few milliseconds. Only the first event received is relevant, the other one can be disregarded. Moreover, Stefan Roese reports that on certain platforms, Link State and Presence Detect may flap for up to 100 ms before stabilizing, suggesting that such events should be disregarded for at least this long: https://lkml.kernel.org/r/20180130084121.18653-1-sr@denx.de On slot enablement, pciehp_check_link_status() waits for 100 ms per PCIe r4.0, sec 6.7.3.3, then probes the hotplugged device's vendor register for up to 1 second. If this succeeds, the link is definitely up, so ignore any Presence Detect or Link State events that occurred up to this point. pciehp_check_link_status() then checks the Link Training bit in the Link Status register. This is the final opportunity to detect inaccessibility of the device and abort slot enablement. Any link or presence change that occurs afterwards will cause the slot to be disabled again immediately after attempting to enable it. The astute reviewer may appreciate that achieving this behavior would be more complicated had pciehp not just been converted to enable/disable the slot exclusively from the IRQ thread: When the slot is enabled via sysfs, each link or presence flap would otherwise cause the IRQ thread to run and it would have to sense that those events are belonging to a concurrent slot enablement operation and disregard them. It would be much more difficult than this mere 3 line change. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Stefan Roese <sr@denx.de>
* | PCI: pciehp: Drop enable/disable lockLukas Wunner2018-07-231-1/+0
| | | | | | | | | | | | | | | | | | Previously slot enablement and disablement could happen concurrently. But now it's under the exclusive control of the IRQ thread, rendering the locking obsolete. Drop it. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* | PCI: pciehp: Enable/disable exclusively from IRQ threadLukas Wunner2018-07-231-3/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Besides the IRQ thread, there are several other places in the driver which enable or disable the slot: - pciehp_probe() enables the slot if it's occupied and the pciehp_force module parameter is used. - pciehp_resume() enables or disables the slot after system sleep. - pciehp_queue_pushbutton_work() enables or disables the slot after the 5 second delay following an Attention Button press. - pciehp_sysfs_enable_slot() and pciehp_sysfs_disable_slot() enable or disable the slot on sysfs write. This requires locking and complicates pciehp's state machine. A simplification can be achieved by enabling and disabling the slot exclusively from the IRQ thread. Amend the functions listed above to request slot enable/disablement from the IRQ thread by either synthesizing a Presence Detect Changed event or, in the case of a disable user request (via sysfs or an Attention Button press), submitting a newly introduced force disable request. The latter is needed because the slot shall be forced off despite being occupied. For this force disable request, avoid colliding with Slot Status register bits by using a bit number greater than 16. For synchronous execution of requests (on sysfs write), wait for the request to finish and retrieve the result. There can only ever be one sysfs write in flight due to the locking in kernfs_fop_write(), hence there is no risk of returning the result of a different sysfs request to user space. The POWERON_STATE and POWEROFF_STATE is now no longer entered by the above-listed functions, but solely by the IRQ thread when it begins a power transition. Afterwards, it moves to STATIC_STATE. The same applies to canceling the Attention Button work, it likewise becomes an IRQ thread only operation. An immediate consequence is that the POWERON_STATE and POWEROFF_STATE is never observed by the IRQ thread itself, only by functions called in a different context, such as pciehp_sysfs_enable_slot(). So remove handling of these states from pciehp_handle_button_press() and pciehp_handle_link_change() which are exclusively called from the IRQ thread. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* | PCI: pciehp: Track enable/disable statusLukas Wunner2018-07-231-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | handle_button_press_event() currently determines whether the slot has been turned on or off by looking at the Power Controller Control bit in the Slot Control register. This assumes that an attention button implies presence of a power controller even though that's not mandated by the spec. Moreover the Power Controller Control bit is unreliable when a power fault occurs (PCIe r4.0, sec 6.7.1.8). This issue has existed since the driver was introduced in 2004. Fix by replacing STATIC_STATE with ON_STATE and OFF_STATE and tracking whether the slot has been turned on or off. This is also a required ingredient to make pciehp resilient to missed events, which is the object of an upcoming commit. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* | PCI: pciehp: Drop slot workqueueLukas Wunner2018-07-231-8/+1
| | | | | | | | | | | | | | | | | | | | | | | | Previously the slot workqueue was used to handle events and enable or disable the slot. That's no longer the case as those tasks are done synchronously in the IRQ thread. The slot workqueue is thus merely used to handle a button press after the 5 second delay and only one such work item may be in flight at any given time. A separate workqueue isn't necessary for this simple task, so use the system workqueue instead. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* | PCI: pciehp: Handle events synchronouslyLukas Wunner2018-07-231-19/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Up until now, pciehp's IRQ handler schedules a work item for each event, which in turn schedules a work item to enable or disable the slot. This double indirection was necessary because sleeping wasn't allowed in the IRQ handler. However it is now that pciehp has been converted to threaded IRQ handling and polling, so handle events synchronously in pciehp_ist() and remove the work item infrastructure (with the exception of work items to handle a button press after the 5 second delay). For link or presence change events, move the register read to determine the current link or presence state behind acquisition of the slot lock to prevent it from becoming stale while the lock is contended. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* | PCI: pciehp: Convert to threaded pollingLukas Wunner2018-07-231-35/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | We've just converted pciehp to threaded IRQ handling, but still cannot sleep in pciehp_ist() because the function is also called in poll mode, which runs in softirq context (from a timer). Convert poll mode to a kthread so that pciehp_ist() always runs in task context. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Thomas Gleixner <tglx@linutronix.de>
* | PCI: pciehp: Convert to threaded IRQLukas Wunner2018-07-231-32/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pciehp's IRQ handler queues up a work item for each event signaled by the hardware. A more modern alternative is to let a long running kthread service the events. The IRQ handler's sole job is then to check whether the IRQ originated from the device in question, acknowledge its receipt to the hardware to quiesce the interrupt and wake up the kthread. One benefit is reduced latency to handle the IRQ, which is a necessity for realtime environments. Another benefit is that we can make pciehp simpler and more robust by handling events synchronously in process context, rather than asynchronously by queueing up work items. pciehp's usage of work items is a historic artifact, it predates the introduction of threaded IRQ handlers by two years. (The former was introduced in 2007 with commit 5d386e1ac402 ("pciehp: Event handling rework"), the latter in 2009 with commit 3aa551c9b4c4 ("genirq: add threaded interrupt handler support").) Convert pciehp to threaded IRQ handling by retrieving the pending events in pciehp_isr(), saving them for later consumption by the thread handler pciehp_ist() and clearing them in the Slot Status register. By clearing the Slot Status (and thereby acknowledging the events) in pciehp_isr(), we can avoid requesting the IRQ with IRQF_ONESHOT, which would have the unpleasant side effect of starving devices sharing the IRQ until pciehp_ist() has finished. pciehp_isr() does not count how many times each event occurred, but merely records the fact *that* an event occurred. If the same event occurs a second time before pciehp_ist() is woken, that second event will not be recorded separately, which is problematic according to commit fad214b0aa72 ("PCI: pciehp: Process all hotplug events before looking for new ones") because we may miss removal of a card in-between two back-to-back insertions. We're about to make pciehp_ist() resilient to missed events. The present commit regresses the driver's behavior temporarily in order to separate the changes into reviewable chunks. This doesn't affect regular slow-motion hotplug, only plug-unplug-plug operations that happen in a timespan shorter than wakeup of the IRQ thread. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Mayurkumar Patel <mayurkumar.patel@intel.com> Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
* | PCI: pciehp: Fix unprotected list iteration in IRQ handlerLukas Wunner2018-07-231-10/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit b440bde74f04 ("PCI: Add pci_ignore_hotplug() to ignore hotplug events for a device") iterates over the devices on a hotplug port's subordinate bus in pciehp's IRQ handler without acquiring pci_bus_sem. It is thus possible for a user to cause a crash by concurrently manipulating the device list, e.g. by disabling slot power via sysfs on a different CPU or by initiating a remove/rescan via sysfs. This can't be fixed by acquiring pci_bus_sem because it may sleep. The simplest fix is to avoid the list iteration altogether and just check the ignore_hotplug flag on the port itself. This works because pci_ignore_hotplug() sets the flag both on the device as well as on its parent bridge. We do lose the ability to print the name of the device blocking hotplug in the debug message, but that's probably bearable. Fixes: b440bde74f04 ("PCI: Add pci_ignore_hotplug() to ignore hotplug events for a device") Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org