From c1fe190c06723322f2dfac31d3b982c581e434ef Mon Sep 17 00:00:00 2001 From: Michael Neuling Date: Mon, 1 Apr 2019 17:03:12 +1100 Subject: powerpc: Add force enable of DAWR on P9 option This adds a flag so that the DAWR can be enabled on P9 via: echo Y > /sys/kernel/debug/powerpc/dawr_enable_dangerous The DAWR was previously force disabled on POWER9 in: 9654153158 powerpc: Disable DAWR in the base POWER9 CPU features Also see Documentation/powerpc/DAWR-POWER9.txt This is a dangerous setting, USE AT YOUR OWN RISK. Some users may not care about a bad user crashing their box (ie. single user/desktop systems) and really want the DAWR. This allows them to force enable DAWR. This flag can also be used to disable DAWR access. Once this is cleared, all DAWR access should be cleared immediately and your machine once again safe from crashing. Userspace may get confused by toggling this. If DAWR is force enabled/disabled between getting the number of breakpoints (via PTRACE_GETHWDBGINFO) and setting the breakpoint, userspace will get an inconsistent view of what's available. Similarly for guests. For the DAWR to be enabled in a KVM guest, the DAWR needs to be force enabled in the host AND the guest. For this reason, this won't work on POWERVM as it doesn't allow the HCALL to work. Writes of 'Y' to the dawr_enable_dangerous file will fail if the hypervisor doesn't support writing the DAWR. To double check the DAWR is working, run this kernel selftest: tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c Any errors/failures/skips mean something is wrong. Signed-off-by: Michael Neuling Signed-off-by: Michael Ellerman --- Documentation/powerpc/DAWR-POWER9.txt | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+) (limited to 'Documentation') diff --git a/Documentation/powerpc/DAWR-POWER9.txt b/Documentation/powerpc/DAWR-POWER9.txt index 2feaa6619658..bdec03650941 100644 --- a/Documentation/powerpc/DAWR-POWER9.txt +++ b/Documentation/powerpc/DAWR-POWER9.txt @@ -56,3 +56,35 @@ POWER9. Loads and stores to the watchpoint locations will not be trapped in GDB. The watchpoint is remembered, so if the guest is migrated back to the POWER8 host, it will start working again. +Force enabling the DAWR +============================= +Kernels (since ~v5.2) have an option to force enable the DAWR via: + + echo Y > /sys/kernel/debug/powerpc/dawr_enable_dangerous + +This enables the DAWR even on POWER9. + +This is a dangerous setting, USE AT YOUR OWN RISK. + +Some users may not care about a bad user crashing their box +(ie. single user/desktop systems) and really want the DAWR. This +allows them to force enable DAWR. + +This flag can also be used to disable DAWR access. Once this is +cleared, all DAWR access should be cleared immediately and your +machine once again safe from crashing. + +Userspace may get confused by toggling this. If DAWR is force +enabled/disabled between getting the number of breakpoints (via +PTRACE_GETHWDBGINFO) and setting the breakpoint, userspace will get an +inconsistent view of what's available. Similarly for guests. + +For the DAWR to be enabled in a KVM guest, the DAWR needs to be force +enabled in the host AND the guest. For this reason, this won't work on +POWERVM as it doesn't allow the HCALL to work. Writes of 'Y' to the +dawr_enable_dangerous file will fail if the hypervisor doesn't support +writing the DAWR. + +To double check the DAWR is working, run this kernel selftest: + tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c +Any errors/failures/skips mean something is wrong. -- cgit v1.2.3 From 90c73795afa24890bd2ae4f3b359de04b4147d37 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= Date: Thu, 18 Apr 2019 12:39:27 +0200 Subject: KVM: PPC: Book3S HV: Add a new KVM device for the XIVE native exploitation mode MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This is the basic framework for the new KVM device supporting the XIVE native exploitation mode. The user interface exposes a new KVM device to be created by QEMU, only available when running on a L0 hypervisor. Support for nested guests is not available yet. The XIVE device reuses the device structure of the XICS-on-XIVE device as they have a lot in common. That could possibly change in the future if the need arise. Signed-off-by: Cédric Le Goater Reviewed-by: David Gibson Signed-off-by: Paul Mackerras --- Documentation/virtual/kvm/devices/xive.txt | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) create mode 100644 Documentation/virtual/kvm/devices/xive.txt (limited to 'Documentation') diff --git a/Documentation/virtual/kvm/devices/xive.txt b/Documentation/virtual/kvm/devices/xive.txt new file mode 100644 index 000000000000..fdbd2ff92a88 --- /dev/null +++ b/Documentation/virtual/kvm/devices/xive.txt @@ -0,0 +1,19 @@ +POWER9 eXternal Interrupt Virtualization Engine (XIVE Gen1) +========================================================== + +Device types supported: + KVM_DEV_TYPE_XIVE POWER9 XIVE Interrupt Controller generation 1 + +This device acts as a VM interrupt controller. It provides the KVM +interface to configure the interrupt sources of a VM in the underlying +POWER9 XIVE interrupt controller. + +Only one XIVE instance may be instantiated. A guest XIVE device +requires a POWER9 host and the guest OS should have support for the +XIVE native exploitation interrupt mode. If not, it should run using +the legacy interrupt mode, referred as XICS (POWER7/8). + +* Groups: + + 1. KVM_DEV_XIVE_GRP_CTRL + Provides global controls on the device -- cgit v1.2.3 From eacc56bb9de3e6830ddc169553772cd6de59ee4c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= Date: Thu, 18 Apr 2019 12:39:28 +0200 Subject: KVM: PPC: Book3S HV: XIVE: Introduce a new capability KVM_CAP_PPC_IRQ_XIVE MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The user interface exposes a new capability KVM_CAP_PPC_IRQ_XIVE to let QEMU connect the vCPU presenters to the XIVE KVM device if required. The capability is not advertised for now as the full support for the XIVE native exploitation mode is not yet available. When this is case, the capability will be advertised on PowerNV Hypervisors only. Nested guests (pseries KVM Hypervisor) are not supported. Internally, the interface to the new KVM device is protected with a new interrupt mode: KVMPPC_IRQ_XIVE. Signed-off-by: Cédric Le Goater Reviewed-by: David Gibson Signed-off-by: Paul Mackerras --- Documentation/virtual/kvm/api.txt | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'Documentation') diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 67068c47c591..e38eb17b7be6 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -4504,6 +4504,15 @@ struct kvm_sync_regs { struct kvm_vcpu_events events; }; +6.75 KVM_CAP_PPC_IRQ_XIVE + +Architectures: ppc +Target: vcpu +Parameters: args[0] is the XIVE device fd + args[1] is the XIVE CPU number (server ID) for this vcpu + +This capability connects the vcpu to an in-kernel XIVE device. + 7. Capabilities that can be enabled on VMs ------------------------------------------ -- cgit v1.2.3 From 4131f83c3d64e591014dad14c7f8070c538b9422 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= Date: Thu, 18 Apr 2019 12:39:29 +0200 Subject: KVM: PPC: Book3S HV: XIVE: add a control to initialize a source MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The XIVE KVM device maintains a list of interrupt sources for the VM which are allocated in the pool of generic interrupts (IPIs) of the main XIVE IC controller. These are used for the CPU IPIs as well as for virtual device interrupts. The IRQ number space is defined by QEMU. The XIVE device reuses the source structures of the XICS-on-XIVE device for the source blocks (2-level tree) and for the source interrupts. Under XIVE native, the source interrupt caches mostly configuration information and is less used than under the XICS-on-XIVE device in which hcalls are still necessary at run-time. When a source is initialized in KVM, an IPI interrupt source is simply allocated at the OPAL level and then MASKED. KVM only needs to know about its type: LSI or MSI. Signed-off-by: Cédric Le Goater Reviewed-by: David Gibson Signed-off-by: Paul Mackerras --- Documentation/virtual/kvm/devices/xive.txt | 15 +++++++++++++++ 1 file changed, 15 insertions(+) (limited to 'Documentation') diff --git a/Documentation/virtual/kvm/devices/xive.txt b/Documentation/virtual/kvm/devices/xive.txt index fdbd2ff92a88..cd8bfc37b72e 100644 --- a/Documentation/virtual/kvm/devices/xive.txt +++ b/Documentation/virtual/kvm/devices/xive.txt @@ -17,3 +17,18 @@ the legacy interrupt mode, referred as XICS (POWER7/8). 1. KVM_DEV_XIVE_GRP_CTRL Provides global controls on the device + + 2. KVM_DEV_XIVE_GRP_SOURCE (write only) + Initializes a new source in the XIVE device and mask it. + Attributes: + Interrupt source number (64-bit) + The kvm_device_attr.addr points to a __u64 value: + bits: | 63 .... 2 | 1 | 0 + values: | unused | level | type + - type: 0:MSI 1:LSI + - level: assertion level in case of an LSI. + Errors: + -E2BIG: Interrupt source number is out of range + -ENOMEM: Could not create a new source block + -EFAULT: Invalid user pointer for attr->addr. + -ENXIO: Could not allocate underlying HW interrupt -- cgit v1.2.3 From e8676ce50e224d507946b1c535bc13584e6b49ff Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= Date: Thu, 18 Apr 2019 12:39:30 +0200 Subject: KVM: PPC: Book3S HV: XIVE: Add a control to configure a source MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This control will be used by the H_INT_SET_SOURCE_CONFIG hcall from QEMU to configure the target of a source and also to restore the configuration of a source when migrating the VM. The XIVE source interrupt structure is extended with the value of the Effective Interrupt Source Number. The EISN is the interrupt number pushed in the event queue that the guest OS will use to dispatch events internally. Caching the EISN value in KVM eases the test when checking if a reconfiguration is indeed needed. Signed-off-by: Cédric Le Goater Reviewed-by: David Gibson Signed-off-by: Paul Mackerras --- Documentation/virtual/kvm/devices/xive.txt | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) (limited to 'Documentation') diff --git a/Documentation/virtual/kvm/devices/xive.txt b/Documentation/virtual/kvm/devices/xive.txt index cd8bfc37b72e..33c64b2cdbe8 100644 --- a/Documentation/virtual/kvm/devices/xive.txt +++ b/Documentation/virtual/kvm/devices/xive.txt @@ -32,3 +32,24 @@ the legacy interrupt mode, referred as XICS (POWER7/8). -ENOMEM: Could not create a new source block -EFAULT: Invalid user pointer for attr->addr. -ENXIO: Could not allocate underlying HW interrupt + + 3. KVM_DEV_XIVE_GRP_SOURCE_CONFIG (write only) + Configures source targeting + Attributes: + Interrupt source number (64-bit) + The kvm_device_attr.addr points to a __u64 value: + bits: | 63 .... 33 | 32 | 31 .. 3 | 2 .. 0 + values: | eisn | mask | server | priority + - priority: 0-7 interrupt priority level + - server: CPU number chosen to handle the interrupt + - mask: mask flag (unused) + - eisn: Effective Interrupt Source Number + Errors: + -ENOENT: Unknown source number + -EINVAL: Not initialized source number + -EINVAL: Invalid priority + -EINVAL: Invalid CPU number. + -EFAULT: Invalid user pointer for attr->addr. + -ENXIO: CPU event queues not configured or configuration of the + underlying HW interrupt failed + -EBUSY: No CPU available to serve interrupt -- cgit v1.2.3 From 13ce3297c5766b9541b6a7a255794c5168a7ae1a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= Date: Thu, 18 Apr 2019 12:39:31 +0200 Subject: KVM: PPC: Book3S HV: XIVE: Add controls for the EQ configuration MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit These controls will be used by the H_INT_SET_QUEUE_CONFIG and H_INT_GET_QUEUE_CONFIG hcalls from QEMU to configure the underlying Event Queue in the XIVE IC. They will also be used to restore the configuration of the XIVE EQs and to capture the internal run-time state of the EQs. Both 'get' and 'set' rely on an OPAL call to access the EQ toggle bit and EQ index which are updated by the XIVE IC when event notifications are enqueued in the EQ. The value of the guest physical address of the event queue is saved in the XIVE internal xive_q structure for later use. That is when migration needs to mark the EQ pages dirty to capture a consistent memory state of the VM. To be noted that H_INT_SET_QUEUE_CONFIG does not require the extra OPAL call setting the EQ toggle bit and EQ index to configure the EQ, but restoring the EQ state will. Signed-off-by: Cédric Le Goater Reviewed-by: David Gibson Signed-off-by: Paul Mackerras --- Documentation/virtual/kvm/devices/xive.txt | 34 ++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) (limited to 'Documentation') diff --git a/Documentation/virtual/kvm/devices/xive.txt b/Documentation/virtual/kvm/devices/xive.txt index 33c64b2cdbe8..cc13bfd5cf53 100644 --- a/Documentation/virtual/kvm/devices/xive.txt +++ b/Documentation/virtual/kvm/devices/xive.txt @@ -53,3 +53,37 @@ the legacy interrupt mode, referred as XICS (POWER7/8). -ENXIO: CPU event queues not configured or configuration of the underlying HW interrupt failed -EBUSY: No CPU available to serve interrupt + + 4. KVM_DEV_XIVE_GRP_EQ_CONFIG (read-write) + Configures an event queue of a CPU + Attributes: + EQ descriptor identifier (64-bit) + The EQ descriptor identifier is a tuple (server, priority) : + bits: | 63 .... 32 | 31 .. 3 | 2 .. 0 + values: | unused | server | priority + The kvm_device_attr.addr points to : + struct kvm_ppc_xive_eq { + __u32 flags; + __u32 qshift; + __u64 qaddr; + __u32 qtoggle; + __u32 qindex; + __u8 pad[40]; + }; + - flags: queue flags + KVM_XIVE_EQ_ALWAYS_NOTIFY (required) + forces notification without using the coalescing mechanism + provided by the XIVE END ESBs. + - qshift: queue size (power of 2) + - qaddr: real address of queue + - qtoggle: current queue toggle bit + - qindex: current queue index + - pad: reserved for future use + Errors: + -ENOENT: Invalid CPU number + -EINVAL: Invalid priority + -EINVAL: Invalid flags + -EINVAL: Invalid queue size + -EINVAL: Invalid queue address + -EFAULT: Invalid user pointer for attr->addr. + -EIO: Configuration of the underlying HW failed -- cgit v1.2.3 From 5ca806474859a0e94584b3a63f9509a25758408e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= Date: Thu, 18 Apr 2019 12:39:32 +0200 Subject: KVM: PPC: Book3S HV: XIVE: Add a global reset control MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This control is to be used by the H_INT_RESET hcall from QEMU. Its purpose is to clear all configuration of the sources and EQs. This is necessary in case of a kexec (for a kdump kernel for instance) to make sure that no remaining configuration is left from the previous boot setup so that the new kernel can start safely from a clean state. The queue 7 is ignored when the XIVE device is configured to run in single escalation mode. Prio 7 is used by escalations. The XIVE VP is kept enabled as the vCPU is still active and connected to the XIVE device. Signed-off-by: Cédric Le Goater Reviewed-by: David Gibson Signed-off-by: Paul Mackerras --- Documentation/virtual/kvm/devices/xive.txt | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'Documentation') diff --git a/Documentation/virtual/kvm/devices/xive.txt b/Documentation/virtual/kvm/devices/xive.txt index cc13bfd5cf53..429cbc4cf960 100644 --- a/Documentation/virtual/kvm/devices/xive.txt +++ b/Documentation/virtual/kvm/devices/xive.txt @@ -17,6 +17,11 @@ the legacy interrupt mode, referred as XICS (POWER7/8). 1. KVM_DEV_XIVE_GRP_CTRL Provides global controls on the device + Attributes: + 1.1 KVM_DEV_XIVE_RESET (write only) + Resets the interrupt controller configuration for sources and event + queues. To be used by kexec and kdump. + Errors: none 2. KVM_DEV_XIVE_GRP_SOURCE (write only) Initializes a new source in the XIVE device and mask it. -- cgit v1.2.3 From 7b46b6169ab80f8f415a0ca2ea4aa7f1afdcc4f3 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= Date: Thu, 18 Apr 2019 12:39:33 +0200 Subject: KVM: PPC: Book3S HV: XIVE: Add a control to sync the sources MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This control will be used by the H_INT_SYNC hcall from QEMU to flush event notifications on the XIVE IC owning the source. Signed-off-by: Cédric Le Goater Reviewed-by: David Gibson Signed-off-by: Paul Mackerras --- Documentation/virtual/kvm/devices/xive.txt | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'Documentation') diff --git a/Documentation/virtual/kvm/devices/xive.txt b/Documentation/virtual/kvm/devices/xive.txt index 429cbc4cf960..1e7f19d7594b 100644 --- a/Documentation/virtual/kvm/devices/xive.txt +++ b/Documentation/virtual/kvm/devices/xive.txt @@ -92,3 +92,11 @@ the legacy interrupt mode, referred as XICS (POWER7/8). -EINVAL: Invalid queue address -EFAULT: Invalid user pointer for attr->addr. -EIO: Configuration of the underlying HW failed + + 5. KVM_DEV_XIVE_GRP_SOURCE_SYNC (write only) + Synchronize the source to flush event notifications + Attributes: + Interrupt source number (64-bit) + Errors: + -ENOENT: Unknown source number + -EINVAL: Not initialized source number -- cgit v1.2.3 From e6714bd1671da9d8dfb5332075df251b746fd0fd Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= Date: Thu, 18 Apr 2019 12:39:34 +0200 Subject: KVM: PPC: Book3S HV: XIVE: Add a control to dirty the XIVE EQ pages MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When migration of a VM is initiated, a first copy of the RAM is transferred to the destination before the VM is stopped, but there is no guarantee that the EQ pages in which the event notifications are queued have not been modified. To make sure migration will capture a consistent memory state, the XIVE device should perform a XIVE quiesce sequence to stop the flow of event notifications and stabilize the EQs. This is the purpose of the KVM_DEV_XIVE_EQ_SYNC control which will also marks the EQ pages dirty to force their transfer. Signed-off-by: Cédric Le Goater Reviewed-by: David Gibson Signed-off-by: Paul Mackerras --- Documentation/virtual/kvm/devices/xive.txt | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) (limited to 'Documentation') diff --git a/Documentation/virtual/kvm/devices/xive.txt b/Documentation/virtual/kvm/devices/xive.txt index 1e7f19d7594b..7ffd4c7be7b5 100644 --- a/Documentation/virtual/kvm/devices/xive.txt +++ b/Documentation/virtual/kvm/devices/xive.txt @@ -23,6 +23,12 @@ the legacy interrupt mode, referred as XICS (POWER7/8). queues. To be used by kexec and kdump. Errors: none + 1.2 KVM_DEV_XIVE_EQ_SYNC (write only) + Sync all the sources and queues and mark the EQ pages dirty. This + to make sure that a consistent memory state is captured when + migrating the VM. + Errors: none + 2. KVM_DEV_XIVE_GRP_SOURCE (write only) Initializes a new source in the XIVE device and mask it. Attributes: @@ -100,3 +106,26 @@ the legacy interrupt mode, referred as XICS (POWER7/8). Errors: -ENOENT: Unknown source number -EINVAL: Not initialized source number + +* Migration: + + Saving the state of a VM using the XIVE native exploitation mode + should follow a specific sequence. When the VM is stopped : + + 1. Mask all sources (PQ=01) to stop the flow of events. + + 2. Sync the XIVE device with the KVM control KVM_DEV_XIVE_EQ_SYNC to + flush any in-flight event notification and to stabilize the EQs. At + this stage, the EQ pages are marked dirty to make sure they are + transferred in the migration sequence. + + 3. Capture the state of the source targeting, the EQs configuration + and the state of thread interrupt context registers. + + Restore is similar : + + 1. Restore the EQ configuration. As targeting depends on it. + 2. Restore targeting + 3. Restore the thread interrupt contexts + 4. Restore the source states + 5. Let the vCPU run -- cgit v1.2.3 From e4945b9da52b36052b7c509ca31c5ead1d165b24 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= Date: Thu, 18 Apr 2019 12:39:35 +0200 Subject: KVM: PPC: Book3S HV: XIVE: Add get/set accessors for the VP XIVE state MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The state of the thread interrupt management registers needs to be collected for migration. These registers are cached under the 'xive_saved_state.w01' field of the VCPU when the VPCU context is pulled from the HW thread. An OPAL call retrieves the backup of the IPB register in the underlying XIVE NVT structure and merges it in the KVM state. Signed-off-by: Cédric Le Goater Reviewed-by: David Gibson Signed-off-by: Paul Mackerras --- Documentation/virtual/kvm/api.txt | 1 + Documentation/virtual/kvm/devices/xive.txt | 17 +++++++++++++++++ 2 files changed, 18 insertions(+) (limited to 'Documentation') diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index e38eb17b7be6..5b505520a616 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1985,6 +1985,7 @@ registers, find a list below: PPC | KVM_REG_PPC_TLB3PS | 32 PPC | KVM_REG_PPC_EPTCFG | 32 PPC | KVM_REG_PPC_ICP_STATE | 64 + PPC | KVM_REG_PPC_VP_STATE | 128 PPC | KVM_REG_PPC_TB_OFFSET | 64 PPC | KVM_REG_PPC_SPMC1 | 32 PPC | KVM_REG_PPC_SPMC2 | 32 diff --git a/Documentation/virtual/kvm/devices/xive.txt b/Documentation/virtual/kvm/devices/xive.txt index 7ffd4c7be7b5..525d1eebcf34 100644 --- a/Documentation/virtual/kvm/devices/xive.txt +++ b/Documentation/virtual/kvm/devices/xive.txt @@ -107,6 +107,23 @@ the legacy interrupt mode, referred as XICS (POWER7/8). -ENOENT: Unknown source number -EINVAL: Not initialized source number +* VCPU state + + The XIVE IC maintains VP interrupt state in an internal structure + called the NVT. When a VP is not dispatched on a HW processor + thread, this structure can be updated by HW if the VP is the target + of an event notification. + + It is important for migration to capture the cached IPB from the NVT + as it synthesizes the priorities of the pending interrupts. We + capture a bit more to report debug information. + + KVM_REG_PPC_VP_STATE (2 * 64bits) + bits: | 63 .... 32 | 31 .... 0 | + values: | TIMA word0 | TIMA word1 | + bits: | 127 .......... 64 | + values: | unused | + * Migration: Saving the state of a VM using the XIVE native exploitation mode -- cgit v1.2.3 From 39e9af3de5ca936098bc80ebe14401426673c208 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= Date: Thu, 18 Apr 2019 12:39:37 +0200 Subject: KVM: PPC: Book3S HV: XIVE: Add a TIMA mapping MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Each thread has an associated Thread Interrupt Management context composed of a set of registers. These registers let the thread handle priority management and interrupt acknowledgment. The most important are : - Interrupt Pending Buffer (IPB) - Current Processor Priority (CPPR) - Notification Source Register (NSR) They are exposed to software in four different pages each proposing a view with a different privilege. The first page is for the physical thread context and the second for the hypervisor. Only the third (operating system) and the fourth (user level) are exposed the guest. A custom VM fault handler will populate the VMA with the appropriate pages, which should only be the OS page for now. Signed-off-by: Cédric Le Goater Reviewed-by: David Gibson Signed-off-by: Paul Mackerras --- Documentation/virtual/kvm/devices/xive.txt | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) (limited to 'Documentation') diff --git a/Documentation/virtual/kvm/devices/xive.txt b/Documentation/virtual/kvm/devices/xive.txt index 525d1eebcf34..0cd7847ec38a 100644 --- a/Documentation/virtual/kvm/devices/xive.txt +++ b/Documentation/virtual/kvm/devices/xive.txt @@ -13,6 +13,29 @@ requires a POWER9 host and the guest OS should have support for the XIVE native exploitation interrupt mode. If not, it should run using the legacy interrupt mode, referred as XICS (POWER7/8). +* Device Mappings + + The KVM device exposes different MMIO ranges of the XIVE HW which + are required for interrupt management. These are exposed to the + guest in VMAs populated with a custom VM fault handler. + + 1. Thread Interrupt Management Area (TIMA) + + Each thread has an associated Thread Interrupt Management context + composed of a set of registers. These registers let the thread + handle priority management and interrupt acknowledgment. The most + important are : + + - Interrupt Pending Buffer (IPB) + - Current Processor Priority (CPPR) + - Notification Source Register (NSR) + + They are exposed to software in four different pages each proposing + a view with a different privilege. The first page is for the + physical thread context and the second for the hypervisor. Only the + third (operating system) and the fourth (user level) are exposed the + guest. + * Groups: 1. KVM_DEV_XIVE_GRP_CTRL -- cgit v1.2.3 From 6520ca64cde71b75dae54f3fcb33517a93d82486 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= Date: Thu, 18 Apr 2019 12:39:38 +0200 Subject: KVM: PPC: Book3S HV: XIVE: Add a mapping for the source ESB pages MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Each source is associated with an Event State Buffer (ESB) with a even/odd pair of pages which provides commands to manage the source: to trigger, to EOI, to turn off the source for instance. The custom VM fault handler will deduce the guest IRQ number from the offset of the fault, and the ESB page of the associated XIVE interrupt will be inserted into the VMA using the internal structure caching information on the interrupts. Signed-off-by: Cédric Le Goater Reviewed-by: David Gibson Signed-off-by: Paul Mackerras --- Documentation/virtual/kvm/devices/xive.txt | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'Documentation') diff --git a/Documentation/virtual/kvm/devices/xive.txt b/Documentation/virtual/kvm/devices/xive.txt index 0cd7847ec38a..69ee62d3d4dc 100644 --- a/Documentation/virtual/kvm/devices/xive.txt +++ b/Documentation/virtual/kvm/devices/xive.txt @@ -36,6 +36,13 @@ the legacy interrupt mode, referred as XICS (POWER7/8). third (operating system) and the fourth (user level) are exposed the guest. + 2. Event State Buffer (ESB) + + Each source is associated with an Event State Buffer (ESB) with + either a pair of even/odd pair of pages which provides commands to + manage the source: to trigger, to EOI, to turn off the source for + instance. + * Groups: 1. KVM_DEV_XIVE_GRP_CTRL -- cgit v1.2.3 From 232b984b7d55e68971962f07f1dd1d1eb1be52e0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= Date: Thu, 18 Apr 2019 12:39:39 +0200 Subject: KVM: PPC: Book3S HV: XIVE: Add passthrough support MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The KVM XICS-over-XIVE device and the proposed KVM XIVE native device implement an IRQ space for the guest using the generic IPI interrupts of the XIVE IC controller. These interrupts are allocated at the OPAL level and "mapped" into the guest IRQ number space in the range 0-0x1FFF. Interrupt management is performed in the XIVE way: using loads and stores on the addresses of the XIVE IPI interrupt ESB pages. Both KVM devices share the same internal structure caching information on the interrupts, among which the xive_irq_data struct containing the addresses of the IPI ESB pages and an extra one in case of pass-through. The later contains the addresses of the ESB pages of the underlying HW controller interrupts, PHB4 in all cases for now. A guest, when running in the XICS legacy interrupt mode, lets the KVM XICS-over-XIVE device "handle" interrupt management, that is to perform the loads and stores on the addresses of the ESB pages of the guest interrupts. However, when running in XIVE native exploitation mode, the KVM XIVE native device exposes the interrupt ESB pages to the guest and lets the guest perform directly the loads and stores. The VMA exposing the ESB pages make use of a custom VM fault handler which role is to populate the VMA with appropriate pages. When a fault occurs, the guest IRQ number is deduced from the offset, and the ESB pages of associated XIVE IPI interrupt are inserted in the VMA (using the internal structure caching information on the interrupts). Supporting device passthrough in the guest running in XIVE native exploitation mode adds some extra refinements because the ESB pages of a different HW controller (PHB4) need to be exposed to the guest along with the initial IPI ESB pages of the XIVE IC controller. But the overall mechanic is the same. When the device HW irqs are mapped into or unmapped from the guest IRQ number space, the passthru_irq helpers, kvmppc_xive_set_mapped() and kvmppc_xive_clr_mapped(), are called to record or clear the passthrough interrupt information and to perform the switch. The approach taken by this patch is to clear the ESB pages of the guest IRQ number being mapped and let the VM fault handler repopulate. The handler will insert the ESB page corresponding to the HW interrupt of the device being passed-through or the initial IPI ESB page if the device is being removed. Signed-off-by: Cédric Le Goater Reviewed-by: David Gibson Signed-off-by: Paul Mackerras --- Documentation/virtual/kvm/devices/xive.txt | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) (limited to 'Documentation') diff --git a/Documentation/virtual/kvm/devices/xive.txt b/Documentation/virtual/kvm/devices/xive.txt index 69ee62d3d4dc..9a24a4525253 100644 --- a/Documentation/virtual/kvm/devices/xive.txt +++ b/Documentation/virtual/kvm/devices/xive.txt @@ -43,6 +43,25 @@ the legacy interrupt mode, referred as XICS (POWER7/8). manage the source: to trigger, to EOI, to turn off the source for instance. + 3. Device pass-through + + When a device is passed-through into the guest, the source + interrupts are from a different HW controller (PHB4) and the ESB + pages exposed to the guest should accommadate this change. + + The passthru_irq helpers, kvmppc_xive_set_mapped() and + kvmppc_xive_clr_mapped() are called when the device HW irqs are + mapped into or unmapped from the guest IRQ number space. The KVM + device extends these helpers to clear the ESB pages of the guest IRQ + number being mapped and then lets the VM fault handler repopulate. + The handler will insert the ESB page corresponding to the HW + interrupt of the device being passed-through or the initial IPI ESB + page if the device has being removed. + + The ESB remapping is fully transparent to the guest and the OS + device driver. All handling is done within VFIO and the above + helpers in KVM-PPC. + * Groups: 1. KVM_DEV_XIVE_GRP_CTRL -- cgit v1.2.3