diff options
author | Costa Shulyupin <costa.shul@redhat.com> | 2023-07-18 07:55:02 +0300 |
---|---|---|
committer | Heiko Carstens <hca@linux.ibm.com> | 2023-07-24 12:12:24 +0200 |
commit | 37002bc6b6039e1491140869c6801e0a2deee43e (patch) | |
tree | baeb304521b33d4f36bfecb1a03afec2c7af9d93 /Documentation/s390/vfio-ccw.rst | |
parent | e3123dfb5373939d65ac2b874189a773d37ac7f5 (diff) | |
download | linux-37002bc6b6039e1491140869c6801e0a2deee43e.tar.gz linux-37002bc6b6039e1491140869c6801e0a2deee43e.tar.bz2 linux-37002bc6b6039e1491140869c6801e0a2deee43e.zip |
docs: move s390 under arch
and fix all in-tree references.
Architecture-specific documentation is being moved into Documentation/arch/
as a way of cleaning up the top-level documentation directory and making
the docs hierarchy more closely match the source hierarchy.
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Acked-by: Jonathan Corbet <corbet@lwn.net>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20230718045550.495428-1-costa.shul@redhat.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Diffstat (limited to 'Documentation/s390/vfio-ccw.rst')
-rw-r--r-- | Documentation/s390/vfio-ccw.rst | 445 |
1 files changed, 0 insertions, 445 deletions
diff --git a/Documentation/s390/vfio-ccw.rst b/Documentation/s390/vfio-ccw.rst deleted file mode 100644 index 37026fa18179..000000000000 --- a/Documentation/s390/vfio-ccw.rst +++ /dev/null @@ -1,445 +0,0 @@ -================================== -vfio-ccw: the basic infrastructure -================================== - -Introduction ------------- - -Here we describe the vfio support for I/O subchannel devices for -Linux/s390. Motivation for vfio-ccw is to passthrough subchannels to a -virtual machine, while vfio is the means. - -Different than other hardware architectures, s390 has defined a unified -I/O access method, which is so called Channel I/O. It has its own access -patterns: - -- Channel programs run asynchronously on a separate (co)processor. -- The channel subsystem will access any memory designated by the caller - in the channel program directly, i.e. there is no iommu involved. - -Thus when we introduce vfio support for these devices, we realize it -with a mediated device (mdev) implementation. The vfio mdev will be -added to an iommu group, so as to make itself able to be managed by the -vfio framework. And we add read/write callbacks for special vfio I/O -regions to pass the channel programs from the mdev to its parent device -(the real I/O subchannel device) to do further address translation and -to perform I/O instructions. - -This document does not intend to explain the s390 I/O architecture in -every detail. More information/reference could be found here: - -- A good start to know Channel I/O in general: - https://en.wikipedia.org/wiki/Channel_I/O -- s390 architecture: - s390 Principles of Operation manual (IBM Form. No. SA22-7832) -- The existing QEMU code which implements a simple emulated channel - subsystem could also be a good reference. It makes it easier to follow - the flow. - qemu/hw/s390x/css.c - -For vfio mediated device framework: -- Documentation/driver-api/vfio-mediated-device.rst - -Motivation of vfio-ccw ----------------------- - -Typically, a guest virtualized via QEMU/KVM on s390 only sees -paravirtualized virtio devices via the "Virtio Over Channel I/O -(virtio-ccw)" transport. This makes virtio devices discoverable via -standard operating system algorithms for handling channel devices. - -However this is not enough. On s390 for the majority of devices, which -use the standard Channel I/O based mechanism, we also need to provide -the functionality of passing through them to a QEMU virtual machine. -This includes devices that don't have a virtio counterpart (e.g. tape -drives) or that have specific characteristics which guests want to -exploit. - -For passing a device to a guest, we want to use the same interface as -everybody else, namely vfio. We implement this vfio support for channel -devices via the vfio mediated device framework and the subchannel device -driver "vfio_ccw". - -Access patterns of CCW devices ------------------------------- - -s390 architecture has implemented a so called channel subsystem, that -provides a unified view of the devices physically attached to the -systems. Though the s390 hardware platform knows about a huge variety of -different peripheral attachments like disk devices (aka. DASDs), tapes, -communication controllers, etc. They can all be accessed by a well -defined access method and they are presenting I/O completion a unified -way: I/O interruptions. - -All I/O requires the use of channel command words (CCWs). A CCW is an -instruction to a specialized I/O channel processor. A channel program is -a sequence of CCWs which are executed by the I/O channel subsystem. To -issue a channel program to the channel subsystem, it is required to -build an operation request block (ORB), which can be used to point out -the format of the CCW and other control information to the system. The -operating system signals the I/O channel subsystem to begin executing -the channel program with a SSCH (start sub-channel) instruction. The -central processor is then free to proceed with non-I/O instructions -until interrupted. The I/O completion result is received by the -interrupt handler in the form of interrupt response block (IRB). - -Back to vfio-ccw, in short: - -- ORBs and channel programs are built in guest kernel (with guest - physical addresses). -- ORBs and channel programs are passed to the host kernel. -- Host kernel translates the guest physical addresses to real addresses - and starts the I/O with issuing a privileged Channel I/O instruction - (e.g SSCH). -- channel programs run asynchronously on a separate processor. -- I/O completion will be signaled to the host with I/O interruptions. - And it will be copied as IRB to user space to pass it back to the - guest. - -Physical vfio ccw device and its child mdev -------------------------------------------- - -As mentioned above, we realize vfio-ccw with a mdev implementation. - -Channel I/O does not have IOMMU hardware support, so the physical -vfio-ccw device does not have an IOMMU level translation or isolation. - -Subchannel I/O instructions are all privileged instructions. When -handling the I/O instruction interception, vfio-ccw has the software -policing and translation how the channel program is programmed before -it gets sent to hardware. - -Within this implementation, we have two drivers for two types of -devices: - -- The vfio_ccw driver for the physical subchannel device. - This is an I/O subchannel driver for the real subchannel device. It - realizes a group of callbacks and registers to the mdev framework as a - parent (physical) device. As a consequence, mdev provides vfio_ccw a - generic interface (sysfs) to create mdev devices. A vfio mdev could be - created by vfio_ccw then and added to the mediated bus. It is the vfio - device that added to an IOMMU group and a vfio group. - vfio_ccw also provides an I/O region to accept channel program - request from user space and store I/O interrupt result for user - space to retrieve. To notify user space an I/O completion, it offers - an interface to setup an eventfd fd for asynchronous signaling. - -- The vfio_mdev driver for the mediated vfio ccw device. - This is provided by the mdev framework. It is a vfio device driver for - the mdev that created by vfio_ccw. - It realizes a group of vfio device driver callbacks, adds itself to a - vfio group, and registers itself to the mdev framework as a mdev - driver. - It uses a vfio iommu backend that uses the existing map and unmap - ioctls, but rather than programming them into an IOMMU for a device, - it simply stores the translations for use by later requests. This - means that a device programmed in a VM with guest physical addresses - can have the vfio kernel convert that address to process virtual - address, pin the page and program the hardware with the host physical - address in one step. - For a mdev, the vfio iommu backend will not pin the pages during the - VFIO_IOMMU_MAP_DMA ioctl. Mdev framework will only maintain a database - of the iova<->vaddr mappings in this operation. And they export a - vfio_pin_pages and a vfio_unpin_pages interfaces from the vfio iommu - backend for the physical devices to pin and unpin pages by demand. - -Below is a high Level block diagram:: - - +-------------+ - | | - | +---------+ | mdev_register_driver() +--------------+ - | | Mdev | +<-----------------------+ | - | | bus | | | vfio_mdev.ko | - | | driver | +----------------------->+ |<-> VFIO user - | +---------+ | probe()/remove() +--------------+ APIs - | | - | MDEV CORE | - | MODULE | - | mdev.ko | - | +---------+ | mdev_register_parent() +--------------+ - | |Physical | +<-----------------------+ | - | | device | | | vfio_ccw.ko |<-> subchannel - | |interface| +----------------------->+ | device - | +---------+ | callback +--------------+ - +-------------+ - -The process of how these work together. - -1. vfio_ccw.ko drives the physical I/O subchannel, and registers the - physical device (with callbacks) to mdev framework. - When vfio_ccw probing the subchannel device, it registers device - pointer and callbacks to the mdev framework. Mdev related file nodes - under the device node in sysfs would be created for the subchannel - device, namely 'mdev_create', 'mdev_destroy' and - 'mdev_supported_types'. -2. Create a mediated vfio ccw device. - Use the 'mdev_create' sysfs file, we need to manually create one (and - only one for our case) mediated device. -3. vfio_mdev.ko drives the mediated ccw device. - vfio_mdev is also the vfio device driver. It will probe the mdev and - add it to an iommu_group and a vfio_group. Then we could pass through - the mdev to a guest. - - -VFIO-CCW Regions ----------------- - -The vfio-ccw driver exposes MMIO regions to accept requests from and return -results to userspace. - -vfio-ccw I/O region -------------------- - -An I/O region is used to accept channel program request from user -space and store I/O interrupt result for user space to retrieve. The -definition of the region is:: - - struct ccw_io_region { - #define ORB_AREA_SIZE 12 - __u8 orb_area[ORB_AREA_SIZE]; - #define SCSW_AREA_SIZE 12 - __u8 scsw_area[SCSW_AREA_SIZE]; - #define IRB_AREA_SIZE 96 - __u8 irb_area[IRB_AREA_SIZE]; - __u32 ret_code; - } __packed; - -This region is always available. - -While starting an I/O request, orb_area should be filled with the -guest ORB, and scsw_area should be filled with the SCSW of the Virtual -Subchannel. - -irb_area stores the I/O result. - -ret_code stores a return code for each access of the region. The following -values may occur: - -``0`` - The operation was successful. - -``-EOPNOTSUPP`` - The ORB specified transport mode or the - SCSW specified a function other than the start function. - -``-EIO`` - A request was issued while the device was not in a state ready to accept - requests, or an internal error occurred. - -``-EBUSY`` - The subchannel was status pending or busy, or a request is already active. - -``-EAGAIN`` - A request was being processed, and the caller should retry. - -``-EACCES`` - The channel path(s) used for the I/O were found to be not operational. - -``-ENODEV`` - The device was found to be not operational. - -``-EINVAL`` - The orb specified a chain longer than 255 ccws, or an internal error - occurred. - - -vfio-ccw cmd region -------------------- - -The vfio-ccw cmd region is used to accept asynchronous instructions -from userspace:: - - #define VFIO_CCW_ASYNC_CMD_HSCH (1 << 0) - #define VFIO_CCW_ASYNC_CMD_CSCH (1 << 1) - struct ccw_cmd_region { - __u32 command; - __u32 ret_code; - } __packed; - -This region is exposed via region type VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD. - -Currently, CLEAR SUBCHANNEL and HALT SUBCHANNEL use this region. - -command specifies the command to be issued; ret_code stores a return code -for each access of the region. The following values may occur: - -``0`` - The operation was successful. - -``-ENODEV`` - The device was found to be not operational. - -``-EINVAL`` - A command other than halt or clear was specified. - -``-EIO`` - A request was issued while the device was not in a state ready to accept - requests. - -``-EAGAIN`` - A request was being processed, and the caller should retry. - -``-EBUSY`` - The subchannel was status pending or busy while processing a halt request. - -vfio-ccw schib region ---------------------- - -The vfio-ccw schib region is used to return Subchannel-Information -Block (SCHIB) data to userspace:: - - struct ccw_schib_region { - #define SCHIB_AREA_SIZE 52 - __u8 schib_area[SCHIB_AREA_SIZE]; - } __packed; - -This region is exposed via region type VFIO_REGION_SUBTYPE_CCW_SCHIB. - -Reading this region triggers a STORE SUBCHANNEL to be issued to the -associated hardware. - -vfio-ccw crw region ---------------------- - -The vfio-ccw crw region is used to return Channel Report Word (CRW) -data to userspace:: - - struct ccw_crw_region { - __u32 crw; - __u32 pad; - } __packed; - -This region is exposed via region type VFIO_REGION_SUBTYPE_CCW_CRW. - -Reading this region returns a CRW if one that is relevant for this -subchannel (e.g. one reporting changes in channel path state) is -pending, or all zeroes if not. If multiple CRWs are pending (including -possibly chained CRWs), reading this region again will return the next -one, until no more CRWs are pending and zeroes are returned. This is -similar to how STORE CHANNEL REPORT WORD works. - -vfio-ccw operation details --------------------------- - -vfio-ccw follows what vfio-pci did on the s390 platform and uses -vfio-iommu-type1 as the vfio iommu backend. - -* CCW translation APIs - A group of APIs (start with `cp_`) to do CCW translation. The CCWs - passed in by a user space program are organized with their guest - physical memory addresses. These APIs will copy the CCWs into kernel - space, and assemble a runnable kernel channel program by updating the - guest physical addresses with their corresponding host physical addresses. - Note that we have to use IDALs even for direct-access CCWs, as the - referenced memory can be located anywhere, including above 2G. - -* vfio_ccw device driver - This driver utilizes the CCW translation APIs and introduces - vfio_ccw, which is the driver for the I/O subchannel devices you want - to pass through. - vfio_ccw implements the following vfio ioctls:: - - VFIO_DEVICE_GET_INFO - VFIO_DEVICE_GET_IRQ_INFO - VFIO_DEVICE_GET_REGION_INFO - VFIO_DEVICE_RESET - VFIO_DEVICE_SET_IRQS - - This provides an I/O region, so that the user space program can pass a - channel program to the kernel, to do further CCW translation before - issuing them to a real device. - This also provides the SET_IRQ ioctl to setup an event notifier to - notify the user space program the I/O completion in an asynchronous - way. - -The use of vfio-ccw is not limited to QEMU, while QEMU is definitely a -good example to get understand how these patches work. Here is a little -bit more detail how an I/O request triggered by the QEMU guest will be -handled (without error handling). - -Explanation: - -- Q1-Q7: QEMU side process. -- K1-K5: Kernel side process. - -Q1. - Get I/O region info during initialization. - -Q2. - Setup event notifier and handler to handle I/O completion. - -... ... - -Q3. - Intercept a ssch instruction. -Q4. - Write the guest channel program and ORB to the I/O region. - - K1. - Copy from guest to kernel. - K2. - Translate the guest channel program to a host kernel space - channel program, which becomes runnable for a real device. - K3. - With the necessary information contained in the orb passed in - by QEMU, issue the ccwchain to the device. - K4. - Return the ssch CC code. -Q5. - Return the CC code to the guest. - -... ... - - K5. - Interrupt handler gets the I/O result and write the result to - the I/O region. - K6. - Signal QEMU to retrieve the result. - -Q6. - Get the signal and event handler reads out the result from the I/O - region. -Q7. - Update the irb for the guest. - -Limitations ------------ - -The current vfio-ccw implementation focuses on supporting basic commands -needed to implement block device functionality (read/write) of DASD/ECKD -device only. Some commands may need special handling in the future, for -example, anything related to path grouping. - -DASD is a kind of storage device. While ECKD is a data recording format. -More information for DASD and ECKD could be found here: -https://en.wikipedia.org/wiki/Direct-access_storage_device -https://en.wikipedia.org/wiki/Count_key_data - -Together with the corresponding work in QEMU, we can bring the passed -through DASD/ECKD device online in a guest now and use it as a block -device. - -The current code allows the guest to start channel programs via -START SUBCHANNEL, and to issue HALT SUBCHANNEL, CLEAR SUBCHANNEL, -and STORE SUBCHANNEL. - -Currently all channel programs are prefetched, regardless of the -p-bit setting in the ORB. As a result, self modifying channel -programs are not supported. For this reason, IPL has to be handled as -a special case by a userspace/guest program; this has been implemented -in QEMU's s390-ccw bios as of QEMU 4.1. - -vfio-ccw supports classic (command mode) channel I/O only. Transport -mode (HPF) is not supported. - -QDIO subchannels are currently not supported. Classic devices other than -DASD/ECKD might work, but have not been tested. - -Reference ---------- -1. ESA/s390 Principles of Operation manual (IBM Form. No. SA22-7832) -2. ESA/390 Common I/O Device Commands manual (IBM Form. No. SA22-7204) -3. https://en.wikipedia.org/wiki/Channel_I/O -4. Documentation/s390/cds.rst -5. Documentation/driver-api/vfio.rst -6. Documentation/driver-api/vfio-mediated-device.rst |