summaryrefslogtreecommitdiffstats
path: root/arch/powerpc/kvm/book3s_xics.h
Commit message (Collapse)AuthorAgeFilesLines
* KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controllerBenjamin Herrenschmidt2017-04-271-0/+7
| | | | | | | | | | | | | | | | This patch makes KVM capable of using the XIVE interrupt controller to provide the standard PAPR "XICS" style hypercalls. It is necessary for proper operations when the host uses XIVE natively. This has been lightly tested on an actual system, including PCI pass-through with a TG3 device. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Cleanup pr_xxx(), unsplit pr_xxx() strings, etc., fix build failures by adding KVM_XIVE which depends on KVM_XICS and XIVE, and adding empty stubs for the kvm_xive_xxx() routines, fixup subject, integrate fixes from Paul for building PR=y HV=n] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* KVM: PPC: Book 3S: XICS: Implement ICS P/Q statesLi Zhong2017-01-271-1/+4
| | | | | | | | | | | | | | | | | | | | | | | This patch implements P(Presented)/Q(Queued) states for ICS irqs. When the interrupt is presented, set P. Present if P was not set. If P is already set, don't present again, set Q. When the interrupt is EOI'ed, move Q into P (and clear Q). If it is set, re-present. The asserted flag used by LSI is also incorporated into the P bit. When the irq state is saved, P/Q bits are also saved, they need some qemu modifications to be recognized and passed around to be restored. KVM_XICS_PENDING bit set and saved should also indicate KVM_XICS_PRESENTED bit set and saved. But it is possible some old code doesn't have/recognize the P bit, so when we restore, we set P for PENDING bit, too. The idea and much of the code come from Ben. Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
* KVM: PPC: Book 3S: XICS cleanup: remove XICS_RM_REJECTLi Zhong2017-01-271-2/+0
| | | | | | | | | | | Commit b0221556dbd3 ("KVM: PPC: Book3S HV: Move virtual mode ICP functions to real-mode") removed the setting of the XICS_RM_REJECT flag. And since that commit, nothing else sets the flag any more, so we can remove the flag and the remaining code that handles it, including the counter that counts how many times it get set. Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
* KVM: PPC: Book3S HV: Set server for passed-through interruptsPaul Mackerras2016-09-121-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a guest has a PCI pass-through device with an interrupt, it will direct the interrupt to a particular guest VCPU. In fact the physical interrupt might arrive on any CPU, and then get delivered to the target VCPU in the emulated XICS (guest interrupt controller), and eventually delivered to the target VCPU. Now that we have code to handle device interrupts in real mode without exiting to the host kernel, there is an advantage to having the device interrupt arrive on the same sub(core) as the target VCPU is running on. In this situation, the interrupt can be delivered to the target VCPU without any exit to the host kernel (using a hypervisor doorbell interrupt between threads if necessary). This patch aims to get passed-through device interrupts arriving on the correct core by setting the interrupt server in the real hardware XICS for the interrupt to the first thread in the (sub)core where its target VCPU is running. We do this in the real-mode H_EOI code because the H_EOI handler already needs to look at the emulated ICS state for the interrupt (whereas the H_XIRR handler doesn't), and we know we are running in the target VCPU context at that point. We set the server CPU in hardware using an OPAL call, regardless of what the IRQ affinity mask for the interrupt says, and without updating the affinity mask. This amounts to saying that when an interrupt is passed through to a guest, as a matter of policy we allow the guest's affinity for the interrupt to override the host's. This is inspired by an earlier patch from Suresh Warrier, although none of this code came from that earlier patch. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
* KVM: PPC: Book3S HV: Re-enable XICS fast path for irqfd-generated interruptsPaul Mackerras2016-05-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit c9a5eccac1ab ("kvm/eventfd: add arch-specific set_irq", 2015-10-16) added the possibility for architecture-specific code to handle the generation of virtual interrupts in atomic context where possible, without having to schedule a work function. Since we can easily generate virtual interrupts on XICS without having to do anything worse than take a spinlock, we define a kvm_arch_set_irq_inatomic() for XICS. We also remove kvm_set_msi() since it is not used any more. The one slightly tricky thing is that with the new interface, we don't get told whether the interrupt is an MSI (or other edge sensitive interrupt) vs. level-sensitive. The difference as far as interrupt generation is concerned is that for LSIs we have to set the asserted flag so it will continue to fire until it is explicitly cleared. In fact the XICS code gets told which interrupts are LSIs by userspace when it configures the interrupt via the KVM_DEV_XICS_GRP_SOURCES attribute group on the XICS device. To store this information, we add a new "lsi" field to struct ics_irq_state. With that we can also do a better job of returning accurate values when reading the attribute group. Signed-off-by: Paul Mackerras <paulus@samba.org>
* KVM: PPC: Book3S HV: Add ICP real mode countersSuresh Warrier2015-04-211-0/+5
| | | | | | | | | | | | | | | | Add two counters to count how often we generate real-mode ICS resend and reject events. The counters provide some performance statistics that could be used in the future to consider if the real mode functions need further optimizing. The counters are displayed as part of IPC and ICP state provided by /sys/debug/kernel/powerpc/kvm* for each VM. Also added two counters that count (approximately) how many times we don't find an ICP or ICS we're looking for. These are not currently exposed through sysfs, but can be useful when debugging crashes. Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
* KVM: PPC: Book3S HV: Convert ICS mutex lock to spin lockSuresh Warrier2015-04-211-1/+1
| | | | | | | | | | | | | | | | Replaces the ICS mutex lock with a spin lock since we will be porting these routines to real mode. Note that we need to disable interrupts before we take the lock in anticipation of the fact that on the guest side, we are running in the context of a hard irq and interrupts are disabled (EE bit off) when the lock is acquired. Again, because we will be acquiring the lock in hypervisor real mode, we need to use an arch_spinlock_t instead of a normal spinlock here as we want to avoid running any lockdep code (which may not be safe to execute in real mode). Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
* KVM: PPC: Book3S HV: Add guest->host real mode completion countersSuresh E. Warrier2015-04-211-0/+6
| | | | | | | | | | | | | | | | | | Add counters to track number of times we switch from guest real mode to host virtual mode during an interrupt-related hyper call because the hypercall requires actions that cannot be completed in real mode. This will help when making optimizations that reduce guest-host transitions. It is safe to use an ordinary increment rather than an atomic operation because there is one ICP per virtual CPU and kvmppc_xics_rm_complete() only works on the ICP for the current VCPU. The counters are displayed as part of IPC and ICP state provided by /sys/debug/kernel/powerpc/kvm* for each VM. Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
* KVM: PPC: Book3S HV: Fix inaccuracies in ICP emulation for H_IPISuresh E. Warrier2014-12-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes some inaccuracies in the state machine for the virtualized ICP when implementing the H_IPI hcall (Set_MFFR and related states): 1. The old code wipes out any pending interrupts when the new MFRR is more favored than the CPPR but less favored than a pending interrupt (by always modifying xisr and the pending_pri). This can cause us to lose a pending external interrupt. The correct code here is to only modify the pending_pri and xisr in the ICP if the MFRR is equal to or more favored than the current pending pri (since in this case, it is guaranteed that that there cannot be a pending external interrupt). The code changes are required in both kvmppc_rm_h_ipi and kvmppc_h_ipi. 2. Again, in both kvmppc_rm_h_ipi and kvmppc_h_ipi, there is a check for whether MFRR is being made less favored AND further if new MFFR is also less favored than the current CPPR, we check for any resends pending in the ICP. These checks look like they are designed to cover the case where if the MFRR is being made less favored, we opportunistically trigger a resend of any interrupts that had been previously rejected. Although, this is not a state described by PAPR, this is an action we actually need to do especially if the CPPR is already at 0xFF. Because in this case, the resend bit will stay on until another ICP state change which may be a long time coming and the interrupt stays pending until then. The current code which checks for MFRR < CPPR is broken when CPPR is 0xFF since it will not get triggered in that case. Ideally, we would want to do a resend only if prio(pending_interrupt) < mfrr && prio(pending_interrupt) < cppr where pending interrupt is the one that was rejected. But we don't have the priority of the pending interrupt state saved, so we simply trigger a resend whenever the MFRR is made less favored. 3. In kvmppc_rm_h_ipi, where we save state to pass resends to the virtual mode, we also need to save the ICP whose need_resend we reset since this does not need to be my ICP (vcpu->arch.icp) as is incorrectly assumed by the current code. A new field rm_resend_icp is added to the kvmppc_icp structure for this purpose. Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
* KVM: PPC: Enable IRQFD support for the XICS interrupt controllerPaul Mackerras2014-08-051-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | This makes it possible to use IRQFDs on platforms that use the XICS interrupt controller. To do this we implement kvm_irq_map_gsi() and kvm_irq_map_chip_pin() in book3s_xics.c, so as to provide a 1-1 mapping between global interrupt numbers and XICS interrupt source numbers. For now, all interrupts are mapped as "IRQCHIP" interrupts, and no MSI support is provided. This means that kvm_set_irq can now get called with level == 0 or 1 as well as the powerpc-specific values KVM_INTERRUPT_SET, KVM_INTERRUPT_UNSET and KVM_INTERRUPT_SET_LEVEL. We change ics_deliver_irq() to accept all those values, and remove its report_status argument, as it is always false, given that we don't support KVM_IRQ_LINE_STATUS. This also adds support for interrupt ack notifiers to the XICS code so that the IRQFD resampler functionality can be supported. Signed-off-by: Paul Mackerras <paulus@samba.org> Tested-by: Eric Auger <eric.auger@linaro.org> Tested-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: PPC: Book3S: Add API for in-kernel XICS emulationPaul Mackerras2013-05-021-0/+1
| | | | | | | | | | | | | | | | | | | This adds the API for userspace to instantiate an XICS device in a VM and connect VCPUs to it. The API consists of a new device type for the KVM_CREATE_DEVICE ioctl, a new capability KVM_CAP_IRQ_XICS, which functions similarly to KVM_CAP_IRQ_MPIC, and the KVM_IRQ_LINE ioctl, which is used to assert and deassert interrupt inputs of the XICS. The XICS device has one attribute group, KVM_DEV_XICS_GRP_SOURCES. Each attribute within this group corresponds to the state of one interrupt source. The attribute number is the same as the interrupt source number. This does not support irq routing or irqfd yet. Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
* KVM: PPC: Book3S: Add support for ibm,int-on/off RTAS callsPaul Mackerras2013-04-261-1/+1
| | | | | | | | | | | | | | This adds support for the ibm,int-on and ibm,int-off RTAS calls to the in-kernel XICS emulation and corrects the handling of the saved priority by the ibm,set-xive RTAS call. With this, ibm,int-off sets the specified interrupt's priority in its saved_priority field and sets the priority to 0xff (the least favoured value). ibm,int-on restores the saved_priority to the priority field, and ibm,set-xive sets both the priority and the saved_priority to the specified priority value. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
* KVM: PPC: Book3S HV: Add support for real mode ICP in XICS emulationBenjamin Herrenschmidt2013-04-261-0/+16
| | | | | | | | | | | This adds an implementation of the XICS hypercalls in real mode for HV KVM, which allows us to avoid exiting the guest MMU context on all threads for a variety of operations such as fetching a pending interrupt, EOI of messages, IPIs, etc. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
* KVM: PPC: Book3S: Add kernel emulation for the XICS interrupt controllerBenjamin Herrenschmidt2013-04-261-0/+113
This adds in-kernel emulation of the XICS (eXternal Interrupt Controller Specification) interrupt controller specified by PAPR, for both HV and PR KVM guests. The XICS emulation supports up to 1048560 interrupt sources. Interrupt source numbers below 16 are reserved; 0 is used to mean no interrupt and 2 is used for IPIs. Internally these are represented in blocks of 1024, called ICS (interrupt controller source) entities, but that is not visible to userspace. Each vcpu gets one ICP (interrupt controller presentation) entity, used to store the per-vcpu state such as vcpu priority, pending interrupt state, IPI request, etc. This does not include any API or any way to connect vcpus to their ICP state; that will be added in later patches. This is based on an initial implementation by Michael Ellerman <michael@ellerman.id.au> reworked by Benjamin Herrenschmidt and Paul Mackerras. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org> [agraf: fix typo, add dependency on !KVM_MPIC] Signed-off-by: Alexander Graf <agraf@suse.de>