summaryrefslogtreecommitdiffstats
path: root/Documentation/virt/hyperv/vmbus.rst
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/virt/hyperv/vmbus.rst')
-rw-r--r--Documentation/virt/hyperv/vmbus.rst143
1 files changed, 83 insertions, 60 deletions
diff --git a/Documentation/virt/hyperv/vmbus.rst b/Documentation/virt/hyperv/vmbus.rst
index d2012d9022c5..1dcef6a7fda3 100644
--- a/Documentation/virt/hyperv/vmbus.rst
+++ b/Documentation/virt/hyperv/vmbus.rst
@@ -1,8 +1,8 @@
.. SPDX-License-Identifier: GPL-2.0
-VMbus
+VMBus
=====
-VMbus is a software construct provided by Hyper-V to guest VMs. It
+VMBus is a software construct provided by Hyper-V to guest VMs. It
consists of a control path and common facilities used by synthetic
devices that Hyper-V presents to guest VMs. The control path is
used to offer synthetic devices to the guest VM and, in some cases,
@@ -12,9 +12,9 @@ and the synthetic device implementation that is part of Hyper-V, and
signaling primitives to allow Hyper-V and the guest to interrupt
each other.
-VMbus is modeled in Linux as a bus, with the expected /sys/bus/vmbus
-entry in a running Linux guest. The VMbus driver (drivers/hv/vmbus_drv.c)
-establishes the VMbus control path with the Hyper-V host, then
+VMBus is modeled in Linux as a bus, with the expected /sys/bus/vmbus
+entry in a running Linux guest. The VMBus driver (drivers/hv/vmbus_drv.c)
+establishes the VMBus control path with the Hyper-V host, then
registers itself as a Linux bus driver. It implements the standard
bus functions for adding and removing devices to/from the bus.
@@ -49,9 +49,9 @@ synthetic NIC is referred to as "netvsc" and the Linux driver for
the synthetic SCSI controller is "storvsc". These drivers contain
functions with names like "storvsc_connect_to_vsp".
-VMbus channels
+VMBus channels
--------------
-An instance of a synthetic device uses VMbus channels to communicate
+An instance of a synthetic device uses VMBus channels to communicate
between the VSP and the VSC. Channels are bi-directional and used
for passing messages. Most synthetic devices use a single channel,
but the synthetic SCSI controller and synthetic NIC may use multiple
@@ -73,7 +73,7 @@ write indices and some control flags, followed by the memory for the
actual ring. The size of the ring is determined by the VSC in the
guest and is specific to each synthetic device. The list of GPAs
making up the ring is communicated to the Hyper-V host over the
-VMbus control path as a GPA Descriptor List (GPADL). See function
+VMBus control path as a GPA Descriptor List (GPADL). See function
vmbus_establish_gpadl().
Each ring buffer is mapped into contiguous Linux kernel virtual
@@ -102,10 +102,10 @@ resources. For Windows Server 2019 and later, this limit is
approximately 1280 Mbytes. For versions prior to Windows Server
2019, the limit is approximately 384 Mbytes.
-VMbus messages
---------------
-All VMbus messages have a standard header that includes the message
-length, the offset of the message payload, some flags, and a
+VMBus channel messages
+----------------------
+All messages sent in a VMBus channel have a standard header that includes
+the message length, the offset of the message payload, some flags, and a
transactionID. The portion of the message after the header is
unique to each VSP/VSC pair.
@@ -137,7 +137,7 @@ control message contains a list of GPAs that describe the data
buffer. For example, the storvsc driver uses this approach to
specify the data buffers to/from which disk I/O is done.
-Three functions exist to send VMbus messages:
+Three functions exist to send VMBus channel messages:
1. vmbus_sendpacket(): Control-only messages and messages with
embedded data -- no GPAs
@@ -154,20 +154,51 @@ Historically, Linux guests have trusted Hyper-V to send well-formed
and valid messages, and Linux drivers for synthetic devices did not
fully validate messages. With the introduction of processor
technologies that fully encrypt guest memory and that allow the
-guest to not trust the hypervisor (AMD SNP-SEV, Intel TDX), trusting
+guest to not trust the hypervisor (AMD SEV-SNP, Intel TDX), trusting
the Hyper-V host is no longer a valid assumption. The drivers for
-VMbus synthetic devices are being updated to fully validate any
+VMBus synthetic devices are being updated to fully validate any
values read from memory that is shared with Hyper-V, which includes
-messages from VMbus devices. To facilitate such validation,
+messages from VMBus devices. To facilitate such validation,
messages read by the guest from the "in" ring buffer are copied to a
temporary buffer that is not shared with Hyper-V. Validation is
performed in this temporary buffer without the risk of Hyper-V
maliciously modifying the message after it is validated but before
it is used.
-VMbus interrupts
+Synthetic Interrupt Controller (synic)
+--------------------------------------
+Hyper-V provides each guest CPU with a synthetic interrupt controller
+that is used by VMBus for host-guest communication. While each synic
+defines 16 synthetic interrupts (SINT), Linux uses only one of the 16
+(VMBUS_MESSAGE_SINT). All interrupts related to communication between
+the Hyper-V host and a guest CPU use that SINT.
+
+The SINT is mapped to a single per-CPU architectural interrupt (i.e,
+an 8-bit x86/x64 interrupt vector, or an arm64 PPI INTID). Because
+each CPU in the guest has a synic and may receive VMBus interrupts,
+they are best modeled in Linux as per-CPU interrupts. This model works
+well on arm64 where a single per-CPU Linux IRQ is allocated for
+VMBUS_MESSAGE_SINT. This IRQ appears in /proc/interrupts as an IRQ labelled
+"Hyper-V VMbus". Since x86/x64 lacks support for per-CPU IRQs, an x86
+interrupt vector is statically allocated (HYPERVISOR_CALLBACK_VECTOR)
+across all CPUs and explicitly coded to call vmbus_isr(). In this case,
+there's no Linux IRQ, and the interrupts are visible in aggregate in
+/proc/interrupts on the "HYP" line.
+
+The synic provides the means to demultiplex the architectural interrupt into
+one or more logical interrupts and route the logical interrupt to the proper
+VMBus handler in Linux. This demultiplexing is done by vmbus_isr() and
+related functions that access synic data structures.
+
+The synic is not modeled in Linux as an irq chip or irq domain,
+and the demultiplexed logical interrupts are not Linux IRQs. As such,
+they don't appear in /proc/interrupts or /proc/irq. The CPU
+affinity for one of these logical interrupts is controlled via an
+entry under /sys/bus/vmbus as described below.
+
+VMBus interrupts
----------------
-VMbus provides a mechanism for the guest to interrupt the host when
+VMBus provides a mechanism for the guest to interrupt the host when
the guest has queued new messages in a ring buffer. The host
expects that the guest will send an interrupt only when an "out"
ring buffer transitions from empty to non-empty. If the guest sends
@@ -176,63 +207,55 @@ unnecessary. If a guest sends an excessive number of unnecessary
interrupts, the host may throttle that guest by suspending its
execution for a few seconds to prevent a denial-of-service attack.
-Similarly, the host will interrupt the guest when it sends a new
-message on the VMbus control path, or when a VMbus channel "in" ring
-buffer transitions from empty to non-empty. Each CPU in the guest
-may receive VMbus interrupts, so they are best modeled as per-CPU
-interrupts in Linux. This model works well on arm64 where a single
-per-CPU IRQ is allocated for VMbus. Since x86/x64 lacks support for
-per-CPU IRQs, an x86 interrupt vector is statically allocated (see
-HYPERVISOR_CALLBACK_VECTOR) across all CPUs and explicitly coded to
-call the VMbus interrupt service routine. These interrupts are
-visible in /proc/interrupts on the "HYP" line.
-
-The guest CPU that a VMbus channel will interrupt is selected by the
+Similarly, the host will interrupt the guest via the synic when
+it sends a new message on the VMBus control path, or when a VMBus
+channel "in" ring buffer transitions from empty to non-empty due to
+the host inserting a new VMBus channel message. The control message stream
+and each VMBus channel "in" ring buffer are separate logical interrupts
+that are demultiplexed by vmbus_isr(). It demultiplexes by first checking
+for channel interrupts by calling vmbus_chan_sched(), which looks at a synic
+bitmap to determine which channels have pending interrupts on this CPU.
+If multiple channels have pending interrupts for this CPU, they are
+processed sequentially. When all channel interrupts have been processed,
+vmbus_isr() checks for and processes any messages received on the VMBus
+control path.
+
+The guest CPU that a VMBus channel will interrupt is selected by the
guest when the channel is created, and the host is informed of that
-selection. VMbus devices are broadly grouped into two categories:
+selection. VMBus devices are broadly grouped into two categories:
-1. "Slow" devices that need only one VMbus channel. The devices
+1. "Slow" devices that need only one VMBus channel. The devices
(such as keyboard, mouse, heartbeat, and timesync) generate
- relatively few interrupts. Their VMbus channels are all
+ relatively few interrupts. Their VMBus channels are all
assigned to interrupt the VMBUS_CONNECT_CPU, which is always
CPU 0.
-2. "High speed" devices that may use multiple VMbus channels for
+2. "High speed" devices that may use multiple VMBus channels for
higher parallelism and performance. These devices include the
- synthetic SCSI controller and synthetic NIC. Their VMbus
+ synthetic SCSI controller and synthetic NIC. Their VMBus
channels interrupts are assigned to CPUs that are spread out
among the available CPUs in the VM so that interrupts on
multiple channels can be processed in parallel.
-The assignment of VMbus channel interrupts to CPUs is done in the
+The assignment of VMBus channel interrupts to CPUs is done in the
function init_vp_index(). This assignment is done outside of the
normal Linux interrupt affinity mechanism, so the interrupts are
neither "unmanaged" nor "managed" interrupts.
-The CPU that a VMbus channel will interrupt can be seen in
+The CPU that a VMBus channel will interrupt can be seen in
/sys/bus/vmbus/devices/<deviceGUID>/ channels/<channelRelID>/cpu.
When running on later versions of Hyper-V, the CPU can be changed
-by writing a new value to this sysfs entry. Because the interrupt
-assignment is done outside of the normal Linux affinity mechanism,
-there are no entries in /proc/irq corresponding to individual
-VMbus channel interrupts.
+by writing a new value to this sysfs entry. Because VMBus channel
+interrupts are not Linux IRQs, there are no entries in /proc/interrupts
+or /proc/irq corresponding to individual VMBus channel interrupts.
An online CPU in a Linux guest may not be taken offline if it has
-VMbus channel interrupts assigned to it. Any such channel
+VMBus channel interrupts assigned to it. Any such channel
interrupts must first be manually reassigned to another CPU as
described above. When no channel interrupts are assigned to the
CPU, it can be taken offline.
-When a guest CPU receives a VMbus interrupt from the host, the
-function vmbus_isr() handles the interrupt. It first checks for
-channel interrupts by calling vmbus_chan_sched(), which looks at a
-bitmap setup by the host to determine which channels have pending
-interrupts on this CPU. If multiple channels have pending
-interrupts for this CPU, they are processed sequentially. When all
-channel interrupts have been processed, vmbus_isr() checks for and
-processes any message received on the VMbus control path.
-
-The VMbus channel interrupt handling code is designed to work
+The VMBus channel interrupt handling code is designed to work
correctly even if an interrupt is received on a CPU other than the
CPU assigned to the channel. Specifically, the code does not use
CPU-based exclusion for correctness. In normal operation, Hyper-V
@@ -242,23 +265,23 @@ when Hyper-V will make the transition. The code must work correctly
even if there is a time lag before Hyper-V starts interrupting the
new CPU. See comments in target_cpu_store().
-VMbus device creation/deletion
+VMBus device creation/deletion
------------------------------
Hyper-V and the Linux guest have a separate message-passing path
that is used for synthetic device creation and deletion. This
-path does not use a VMbus channel. See vmbus_post_msg() and
+path does not use a VMBus channel. See vmbus_post_msg() and
vmbus_on_msg_dpc().
The first step is for the guest to connect to the generic
-Hyper-V VMbus mechanism. As part of establishing this connection,
-the guest and Hyper-V agree on a VMbus protocol version they will
+Hyper-V VMBus mechanism. As part of establishing this connection,
+the guest and Hyper-V agree on a VMBus protocol version they will
use. This negotiation allows newer Linux kernels to run on older
Hyper-V versions, and vice versa.
The guest then tells Hyper-V to "send offers". Hyper-V sends an
offer message to the guest for each synthetic device that the VM
-is configured to have. Each VMbus device type has a fixed GUID
-known as the "class ID", and each VMbus device instance is also
+is configured to have. Each VMBus device type has a fixed GUID
+known as the "class ID", and each VMBus device instance is also
identified by a GUID. The offer message from Hyper-V contains
both GUIDs to uniquely (within the VM) identify the device.
There is one offer message for each device instance, so a VM with
@@ -275,7 +298,7 @@ type based on the class ID, and invokes the correct driver to set up
the device. Driver/device matching is performed using the standard
Linux mechanism.
-The device driver probe function opens the primary VMbus channel to
+The device driver probe function opens the primary VMBus channel to
the corresponding VSP. It allocates guest memory for the channel
ring buffers and shares the ring buffer with the Hyper-V host by
giving the host a list of GPAs for the ring buffer memory. See
@@ -285,7 +308,7 @@ Once the ring buffer is set up, the device driver and VSP exchange
setup messages via the primary channel. These messages may include
negotiating the device protocol version to be used between the Linux
VSC and the VSP on the Hyper-V host. The setup messages may also
-include creating additional VMbus channels, which are somewhat
+include creating additional VMBus channels, which are somewhat
mis-named as "sub-channels" since they are functionally
equivalent to the primary channel once they are created.