summaryrefslogtreecommitdiffstats
path: root/Documentation/s390/vfio-ccw.rst
diff options
context:
space:
mode:
authorCosta Shulyupin <costa.shul@redhat.com>2023-07-18 07:55:02 +0300
committerHeiko Carstens <hca@linux.ibm.com>2023-07-24 12:12:24 +0200
commit37002bc6b6039e1491140869c6801e0a2deee43e (patch)
treebaeb304521b33d4f36bfecb1a03afec2c7af9d93 /Documentation/s390/vfio-ccw.rst
parente3123dfb5373939d65ac2b874189a773d37ac7f5 (diff)
downloadlinux-37002bc6b6039e1491140869c6801e0a2deee43e.tar.gz
linux-37002bc6b6039e1491140869c6801e0a2deee43e.tar.bz2
linux-37002bc6b6039e1491140869c6801e0a2deee43e.zip
docs: move s390 under arch
and fix all in-tree references. Architecture-specific documentation is being moved into Documentation/arch/ as a way of cleaning up the top-level documentation directory and making the docs hierarchy more closely match the source hierarchy. Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Acked-by: Jonathan Corbet <corbet@lwn.net> Acked-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20230718045550.495428-1-costa.shul@redhat.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Diffstat (limited to 'Documentation/s390/vfio-ccw.rst')
-rw-r--r--Documentation/s390/vfio-ccw.rst445
1 files changed, 0 insertions, 445 deletions
diff --git a/Documentation/s390/vfio-ccw.rst b/Documentation/s390/vfio-ccw.rst
deleted file mode 100644
index 37026fa18179..000000000000
--- a/Documentation/s390/vfio-ccw.rst
+++ /dev/null
@@ -1,445 +0,0 @@
-==================================
-vfio-ccw: the basic infrastructure
-==================================
-
-Introduction
-------------
-
-Here we describe the vfio support for I/O subchannel devices for
-Linux/s390. Motivation for vfio-ccw is to passthrough subchannels to a
-virtual machine, while vfio is the means.
-
-Different than other hardware architectures, s390 has defined a unified
-I/O access method, which is so called Channel I/O. It has its own access
-patterns:
-
-- Channel programs run asynchronously on a separate (co)processor.
-- The channel subsystem will access any memory designated by the caller
- in the channel program directly, i.e. there is no iommu involved.
-
-Thus when we introduce vfio support for these devices, we realize it
-with a mediated device (mdev) implementation. The vfio mdev will be
-added to an iommu group, so as to make itself able to be managed by the
-vfio framework. And we add read/write callbacks for special vfio I/O
-regions to pass the channel programs from the mdev to its parent device
-(the real I/O subchannel device) to do further address translation and
-to perform I/O instructions.
-
-This document does not intend to explain the s390 I/O architecture in
-every detail. More information/reference could be found here:
-
-- A good start to know Channel I/O in general:
- https://en.wikipedia.org/wiki/Channel_I/O
-- s390 architecture:
- s390 Principles of Operation manual (IBM Form. No. SA22-7832)
-- The existing QEMU code which implements a simple emulated channel
- subsystem could also be a good reference. It makes it easier to follow
- the flow.
- qemu/hw/s390x/css.c
-
-For vfio mediated device framework:
-- Documentation/driver-api/vfio-mediated-device.rst
-
-Motivation of vfio-ccw
-----------------------
-
-Typically, a guest virtualized via QEMU/KVM on s390 only sees
-paravirtualized virtio devices via the "Virtio Over Channel I/O
-(virtio-ccw)" transport. This makes virtio devices discoverable via
-standard operating system algorithms for handling channel devices.
-
-However this is not enough. On s390 for the majority of devices, which
-use the standard Channel I/O based mechanism, we also need to provide
-the functionality of passing through them to a QEMU virtual machine.
-This includes devices that don't have a virtio counterpart (e.g. tape
-drives) or that have specific characteristics which guests want to
-exploit.
-
-For passing a device to a guest, we want to use the same interface as
-everybody else, namely vfio. We implement this vfio support for channel
-devices via the vfio mediated device framework and the subchannel device
-driver "vfio_ccw".
-
-Access patterns of CCW devices
-------------------------------
-
-s390 architecture has implemented a so called channel subsystem, that
-provides a unified view of the devices physically attached to the
-systems. Though the s390 hardware platform knows about a huge variety of
-different peripheral attachments like disk devices (aka. DASDs), tapes,
-communication controllers, etc. They can all be accessed by a well
-defined access method and they are presenting I/O completion a unified
-way: I/O interruptions.
-
-All I/O requires the use of channel command words (CCWs). A CCW is an
-instruction to a specialized I/O channel processor. A channel program is
-a sequence of CCWs which are executed by the I/O channel subsystem. To
-issue a channel program to the channel subsystem, it is required to
-build an operation request block (ORB), which can be used to point out
-the format of the CCW and other control information to the system. The
-operating system signals the I/O channel subsystem to begin executing
-the channel program with a SSCH (start sub-channel) instruction. The
-central processor is then free to proceed with non-I/O instructions
-until interrupted. The I/O completion result is received by the
-interrupt handler in the form of interrupt response block (IRB).
-
-Back to vfio-ccw, in short:
-
-- ORBs and channel programs are built in guest kernel (with guest
- physical addresses).
-- ORBs and channel programs are passed to the host kernel.
-- Host kernel translates the guest physical addresses to real addresses
- and starts the I/O with issuing a privileged Channel I/O instruction
- (e.g SSCH).
-- channel programs run asynchronously on a separate processor.
-- I/O completion will be signaled to the host with I/O interruptions.
- And it will be copied as IRB to user space to pass it back to the
- guest.
-
-Physical vfio ccw device and its child mdev
--------------------------------------------
-
-As mentioned above, we realize vfio-ccw with a mdev implementation.
-
-Channel I/O does not have IOMMU hardware support, so the physical
-vfio-ccw device does not have an IOMMU level translation or isolation.
-
-Subchannel I/O instructions are all privileged instructions. When
-handling the I/O instruction interception, vfio-ccw has the software
-policing and translation how the channel program is programmed before
-it gets sent to hardware.
-
-Within this implementation, we have two drivers for two types of
-devices:
-
-- The vfio_ccw driver for the physical subchannel device.
- This is an I/O subchannel driver for the real subchannel device. It
- realizes a group of callbacks and registers to the mdev framework as a
- parent (physical) device. As a consequence, mdev provides vfio_ccw a
- generic interface (sysfs) to create mdev devices. A vfio mdev could be
- created by vfio_ccw then and added to the mediated bus. It is the vfio
- device that added to an IOMMU group and a vfio group.
- vfio_ccw also provides an I/O region to accept channel program
- request from user space and store I/O interrupt result for user
- space to retrieve. To notify user space an I/O completion, it offers
- an interface to setup an eventfd fd for asynchronous signaling.
-
-- The vfio_mdev driver for the mediated vfio ccw device.
- This is provided by the mdev framework. It is a vfio device driver for
- the mdev that created by vfio_ccw.
- It realizes a group of vfio device driver callbacks, adds itself to a
- vfio group, and registers itself to the mdev framework as a mdev
- driver.
- It uses a vfio iommu backend that uses the existing map and unmap
- ioctls, but rather than programming them into an IOMMU for a device,
- it simply stores the translations for use by later requests. This
- means that a device programmed in a VM with guest physical addresses
- can have the vfio kernel convert that address to process virtual
- address, pin the page and program the hardware with the host physical
- address in one step.
- For a mdev, the vfio iommu backend will not pin the pages during the
- VFIO_IOMMU_MAP_DMA ioctl. Mdev framework will only maintain a database
- of the iova<->vaddr mappings in this operation. And they export a
- vfio_pin_pages and a vfio_unpin_pages interfaces from the vfio iommu
- backend for the physical devices to pin and unpin pages by demand.
-
-Below is a high Level block diagram::
-
- +-------------+
- | |
- | +---------+ | mdev_register_driver() +--------------+
- | | Mdev | +<-----------------------+ |
- | | bus | | | vfio_mdev.ko |
- | | driver | +----------------------->+ |<-> VFIO user
- | +---------+ | probe()/remove() +--------------+ APIs
- | |
- | MDEV CORE |
- | MODULE |
- | mdev.ko |
- | +---------+ | mdev_register_parent() +--------------+
- | |Physical | +<-----------------------+ |
- | | device | | | vfio_ccw.ko |<-> subchannel
- | |interface| +----------------------->+ | device
- | +---------+ | callback +--------------+
- +-------------+
-
-The process of how these work together.
-
-1. vfio_ccw.ko drives the physical I/O subchannel, and registers the
- physical device (with callbacks) to mdev framework.
- When vfio_ccw probing the subchannel device, it registers device
- pointer and callbacks to the mdev framework. Mdev related file nodes
- under the device node in sysfs would be created for the subchannel
- device, namely 'mdev_create', 'mdev_destroy' and
- 'mdev_supported_types'.
-2. Create a mediated vfio ccw device.
- Use the 'mdev_create' sysfs file, we need to manually create one (and
- only one for our case) mediated device.
-3. vfio_mdev.ko drives the mediated ccw device.
- vfio_mdev is also the vfio device driver. It will probe the mdev and
- add it to an iommu_group and a vfio_group. Then we could pass through
- the mdev to a guest.
-
-
-VFIO-CCW Regions
-----------------
-
-The vfio-ccw driver exposes MMIO regions to accept requests from and return
-results to userspace.
-
-vfio-ccw I/O region
--------------------
-
-An I/O region is used to accept channel program request from user
-space and store I/O interrupt result for user space to retrieve. The
-definition of the region is::
-
- struct ccw_io_region {
- #define ORB_AREA_SIZE 12
- __u8 orb_area[ORB_AREA_SIZE];
- #define SCSW_AREA_SIZE 12
- __u8 scsw_area[SCSW_AREA_SIZE];
- #define IRB_AREA_SIZE 96
- __u8 irb_area[IRB_AREA_SIZE];
- __u32 ret_code;
- } __packed;
-
-This region is always available.
-
-While starting an I/O request, orb_area should be filled with the
-guest ORB, and scsw_area should be filled with the SCSW of the Virtual
-Subchannel.
-
-irb_area stores the I/O result.
-
-ret_code stores a return code for each access of the region. The following
-values may occur:
-
-``0``
- The operation was successful.
-
-``-EOPNOTSUPP``
- The ORB specified transport mode or the
- SCSW specified a function other than the start function.
-
-``-EIO``
- A request was issued while the device was not in a state ready to accept
- requests, or an internal error occurred.
-
-``-EBUSY``
- The subchannel was status pending or busy, or a request is already active.
-
-``-EAGAIN``
- A request was being processed, and the caller should retry.
-
-``-EACCES``
- The channel path(s) used for the I/O were found to be not operational.
-
-``-ENODEV``
- The device was found to be not operational.
-
-``-EINVAL``
- The orb specified a chain longer than 255 ccws, or an internal error
- occurred.
-
-
-vfio-ccw cmd region
--------------------
-
-The vfio-ccw cmd region is used to accept asynchronous instructions
-from userspace::
-
- #define VFIO_CCW_ASYNC_CMD_HSCH (1 << 0)
- #define VFIO_CCW_ASYNC_CMD_CSCH (1 << 1)
- struct ccw_cmd_region {
- __u32 command;
- __u32 ret_code;
- } __packed;
-
-This region is exposed via region type VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD.
-
-Currently, CLEAR SUBCHANNEL and HALT SUBCHANNEL use this region.
-
-command specifies the command to be issued; ret_code stores a return code
-for each access of the region. The following values may occur:
-
-``0``
- The operation was successful.
-
-``-ENODEV``
- The device was found to be not operational.
-
-``-EINVAL``
- A command other than halt or clear was specified.
-
-``-EIO``
- A request was issued while the device was not in a state ready to accept
- requests.
-
-``-EAGAIN``
- A request was being processed, and the caller should retry.
-
-``-EBUSY``
- The subchannel was status pending or busy while processing a halt request.
-
-vfio-ccw schib region
----------------------
-
-The vfio-ccw schib region is used to return Subchannel-Information
-Block (SCHIB) data to userspace::
-
- struct ccw_schib_region {
- #define SCHIB_AREA_SIZE 52
- __u8 schib_area[SCHIB_AREA_SIZE];
- } __packed;
-
-This region is exposed via region type VFIO_REGION_SUBTYPE_CCW_SCHIB.
-
-Reading this region triggers a STORE SUBCHANNEL to be issued to the
-associated hardware.
-
-vfio-ccw crw region
----------------------
-
-The vfio-ccw crw region is used to return Channel Report Word (CRW)
-data to userspace::
-
- struct ccw_crw_region {
- __u32 crw;
- __u32 pad;
- } __packed;
-
-This region is exposed via region type VFIO_REGION_SUBTYPE_CCW_CRW.
-
-Reading this region returns a CRW if one that is relevant for this
-subchannel (e.g. one reporting changes in channel path state) is
-pending, or all zeroes if not. If multiple CRWs are pending (including
-possibly chained CRWs), reading this region again will return the next
-one, until no more CRWs are pending and zeroes are returned. This is
-similar to how STORE CHANNEL REPORT WORD works.
-
-vfio-ccw operation details
---------------------------
-
-vfio-ccw follows what vfio-pci did on the s390 platform and uses
-vfio-iommu-type1 as the vfio iommu backend.
-
-* CCW translation APIs
- A group of APIs (start with `cp_`) to do CCW translation. The CCWs
- passed in by a user space program are organized with their guest
- physical memory addresses. These APIs will copy the CCWs into kernel
- space, and assemble a runnable kernel channel program by updating the
- guest physical addresses with their corresponding host physical addresses.
- Note that we have to use IDALs even for direct-access CCWs, as the
- referenced memory can be located anywhere, including above 2G.
-
-* vfio_ccw device driver
- This driver utilizes the CCW translation APIs and introduces
- vfio_ccw, which is the driver for the I/O subchannel devices you want
- to pass through.
- vfio_ccw implements the following vfio ioctls::
-
- VFIO_DEVICE_GET_INFO
- VFIO_DEVICE_GET_IRQ_INFO
- VFIO_DEVICE_GET_REGION_INFO
- VFIO_DEVICE_RESET
- VFIO_DEVICE_SET_IRQS
-
- This provides an I/O region, so that the user space program can pass a
- channel program to the kernel, to do further CCW translation before
- issuing them to a real device.
- This also provides the SET_IRQ ioctl to setup an event notifier to
- notify the user space program the I/O completion in an asynchronous
- way.
-
-The use of vfio-ccw is not limited to QEMU, while QEMU is definitely a
-good example to get understand how these patches work. Here is a little
-bit more detail how an I/O request triggered by the QEMU guest will be
-handled (without error handling).
-
-Explanation:
-
-- Q1-Q7: QEMU side process.
-- K1-K5: Kernel side process.
-
-Q1.
- Get I/O region info during initialization.
-
-Q2.
- Setup event notifier and handler to handle I/O completion.
-
-... ...
-
-Q3.
- Intercept a ssch instruction.
-Q4.
- Write the guest channel program and ORB to the I/O region.
-
- K1.
- Copy from guest to kernel.
- K2.
- Translate the guest channel program to a host kernel space
- channel program, which becomes runnable for a real device.
- K3.
- With the necessary information contained in the orb passed in
- by QEMU, issue the ccwchain to the device.
- K4.
- Return the ssch CC code.
-Q5.
- Return the CC code to the guest.
-
-... ...
-
- K5.
- Interrupt handler gets the I/O result and write the result to
- the I/O region.
- K6.
- Signal QEMU to retrieve the result.
-
-Q6.
- Get the signal and event handler reads out the result from the I/O
- region.
-Q7.
- Update the irb for the guest.
-
-Limitations
------------
-
-The current vfio-ccw implementation focuses on supporting basic commands
-needed to implement block device functionality (read/write) of DASD/ECKD
-device only. Some commands may need special handling in the future, for
-example, anything related to path grouping.
-
-DASD is a kind of storage device. While ECKD is a data recording format.
-More information for DASD and ECKD could be found here:
-https://en.wikipedia.org/wiki/Direct-access_storage_device
-https://en.wikipedia.org/wiki/Count_key_data
-
-Together with the corresponding work in QEMU, we can bring the passed
-through DASD/ECKD device online in a guest now and use it as a block
-device.
-
-The current code allows the guest to start channel programs via
-START SUBCHANNEL, and to issue HALT SUBCHANNEL, CLEAR SUBCHANNEL,
-and STORE SUBCHANNEL.
-
-Currently all channel programs are prefetched, regardless of the
-p-bit setting in the ORB. As a result, self modifying channel
-programs are not supported. For this reason, IPL has to be handled as
-a special case by a userspace/guest program; this has been implemented
-in QEMU's s390-ccw bios as of QEMU 4.1.
-
-vfio-ccw supports classic (command mode) channel I/O only. Transport
-mode (HPF) is not supported.
-
-QDIO subchannels are currently not supported. Classic devices other than
-DASD/ECKD might work, but have not been tested.
-
-Reference
----------
-1. ESA/s390 Principles of Operation manual (IBM Form. No. SA22-7832)
-2. ESA/390 Common I/O Device Commands manual (IBM Form. No. SA22-7204)
-3. https://en.wikipedia.org/wiki/Channel_I/O
-4. Documentation/s390/cds.rst
-5. Documentation/driver-api/vfio.rst
-6. Documentation/driver-api/vfio-mediated-device.rst