summaryrefslogtreecommitdiffstats
path: root/drivers/vfio/group.c
Commit message (Collapse)AuthorAgeFilesLines
* vfio-iommufd: Split bind/attach into two stepsYi Liu2023-07-251-4/+13
| | | | | | | | | | | | | | | | This aligns the bind/attach logic with the coming vfio device cdev support. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Terrence Xu <terrence.xu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Yanting Jiang <yanting.jiang@intel.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20230718135551.6592-12-yi.l.liu@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* vfio-iommufd: Move noiommu compat validation out of vfio_iommufd_bind()Yi Liu2023-07-251-0/+13
| | | | | | | | | | | | | | | | | This moves the noiommu compat validation logic into vfio_df_group_open(). This is more consistent with what will be done in vfio device cdev path. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Terrence Xu <terrence.xu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Yanting Jiang <yanting.jiang@intel.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20230718135551.6592-11-yi.l.liu@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* vfio: Make vfio_df_open() single open for device cdev pathYi Liu2023-07-251-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | VFIO group has historically allowed multi-open of the device FD. This was made secure because the "open" was executed via an ioctl to the group FD which is itself only single open. However, no known use of multiple device FDs today. It is kind of a strange thing to do because new device FDs can naturally be created via dup(). When we implement the new device uAPI (only used in cdev path) there is no natural way to allow the device itself from being multi-opened in a secure manner. Without the group FD we cannot prove the security context of the opener. Thus, when moving to the new uAPI we block the ability of opening a device multiple times. Given old group path still allows it we store a vfio_group pointer in struct vfio_device_file to differentiate. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Terrence Xu <terrence.xu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Yanting Jiang <yanting.jiang@intel.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20230718135551.6592-10-yi.l.liu@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* vfio: Add cdev_device_open_cnt to vfio_groupYi Liu2023-07-251-0/+33
| | | | | | | | | | | | | | | | | | | | | This is for counting the devices that are opened via the cdev path. This count is increased and decreased by the cdev path. The group path checks it to achieve exclusion with the cdev path. With this, only one path (group path or cdev path) will claim DMA ownership. This avoids scenarios in which devices within the same group may be opened via different paths. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Terrence Xu <terrence.xu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Yanting Jiang <yanting.jiang@intel.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20230718135551.6592-9-yi.l.liu@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* vfio: Block device access via device fd until device is openedYi Liu2023-07-251-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow the vfio_device file to be in a state where the device FD is opened but the device cannot be used by userspace (i.e. its .open_device() hasn't been called). This inbetween state is not used when the device FD is spawned from the group FD, however when we create the device FD directly by opening a cdev it will be opened in the blocked state. The reason for the inbetween state is that userspace only gets a FD but doesn't gain access permission until binding the FD to an iommufd. So in the blocked state, only the bind operation is allowed. Completing bind will allow user to further access the device. This is implemented by adding a flag in struct vfio_device_file to mark the blocked state and using a simple smp_load_acquire() to obtain the flag value and serialize all the device setup with the thread accessing this device. Following this lockless scheme, it can safely handle the device FD unbound->bound but it cannot handle bound->unbound. To allow this we'd need to add a lock on all the vfio ioctls which seems costly. So once device FD is bound, it remains bound until the FD is closed. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Terrence Xu <terrence.xu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Yanting Jiang <yanting.jiang@intel.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20230718135551.6592-8-yi.l.liu@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* vfio: Pass struct vfio_device_file * to vfio_device_open/close()Yi Liu2023-07-251-6/+14
| | | | | | | | | | | | | | | | | | This avoids passing too much parameters in multiple functions. Per the input parameter change, rename the function to be vfio_df_open/close(). Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Terrence Xu <terrence.xu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Yanting Jiang <yanting.jiang@intel.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20230718135551.6592-7-yi.l.liu@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* vfio: Refine vfio file kAPIs for KVMYi Liu2023-07-251-36/+17
| | | | | | | | | | | | | | | | | | | | | This prepares for making the below kAPIs to accept both group file and device file instead of only vfio group file. bool vfio_file_enforced_coherent(struct file *file); void vfio_file_set_kvm(struct file *file, struct kvm *kvm); Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Terrence Xu <terrence.xu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Yanting Jiang <yanting.jiang@intel.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20230718135551.6592-3-yi.l.liu@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* vfio: Allocate per device file structureYi Liu2023-07-251-2/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This is preparation for adding vfio device cdev support. vfio device cdev requires: 1) A per device file memory to store the kvm pointer set by KVM. It will be propagated to vfio_device:kvm after the device cdev file is bound to an iommufd. 2) A mechanism to block device access through device cdev fd before it is bound to an iommufd. To address the above requirements, this adds a per device file structure named vfio_device_file. For now, it's only a wrapper of struct vfio_device pointer. Other fields will be added to this per file structure in future commits. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Terrence Xu <terrence.xu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Yanting Jiang <yanting.jiang@intel.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20230718135551.6592-2-yi.l.liu@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* driver core: class: remove module * from class_create()Greg Kroah-Hartman2023-03-171-1/+1
| | | | | | | | | | | | The module pointer in class_create() never actually did anything, and it shouldn't have been requred to be set as a parameter even if it did something. So just remove it and fix up all callers of the function in the kernel tree at the same time. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Link: https://lore.kernel.org/r/20230313181843.1207845-4-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Merge tag 'vfio-v6.3-rc1' of https://github.com/awilliam/linux-vfioLinus Torvalds2023-02-251-8/+38
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull VFIO updates from Alex Williamson: - Remove redundant resource check in vfio-platform (Angus Chen) - Use GFP_KERNEL_ACCOUNT for persistent userspace allocations, allowing removal of arbitrary kernel limits in favor of cgroup control (Yishai Hadas) - mdev tidy-ups, including removing the module-only build restriction for sample drivers, Kconfig changes to select mdev support, documentation movement to keep sample driver usage instructions with sample drivers rather than with API docs, remove references to out-of-tree drivers in docs (Christoph Hellwig) - Fix collateral breakages from mdev Kconfig changes (Arnd Bergmann) - Make mlx5 migration support match device support, improve source and target flows to improve pre-copy support and reduce downtime (Yishai Hadas) - Convert additional mdev sysfs case to use sysfs_emit() (Bo Liu) - Resolve copy-paste error in mdev mbochs sample driver Kconfig (Ye Xingchen) - Avoid propagating missing reset error in vfio-platform if reset requirement is relaxed by module option (Tomasz Duszynski) - Range size fixes in mlx5 variant driver for missed last byte and stricter range calculation (Yishai Hadas) - Fixes to suspended vaddr support and locked_vm accounting, excluding mdev configurations from the former due to potential to indefinitely block kernel threads, fix underflow and restore locked_vm on new mm (Steve Sistare) - Update outdated vfio documentation due to new IOMMUFD interfaces in recent kernels (Yi Liu) - Resolve deadlock between group_lock and kvm_lock, finally (Matthew Rosato) - Fix NULL pointer in group initialization error path with IOMMUFD (Yan Zhao) * tag 'vfio-v6.3-rc1' of https://github.com/awilliam/linux-vfio: (32 commits) vfio: Fix NULL pointer dereference caused by uninitialized group->iommufd docs: vfio: Update vfio.rst per latest interfaces vfio: Update the kdoc for vfio_device_ops vfio/mlx5: Fix range size calculation upon tracker creation vfio: no need to pass kvm pointer during device open vfio: fix deadlock between group lock and kvm lock vfio: revert "iommu driver notify callback" vfio/type1: revert "implement notify callback" vfio/type1: revert "block on invalid vaddr" vfio/type1: restore locked_vm vfio/type1: track locked_vm per dma vfio/type1: prevent underflow of locked_vm via exec() vfio/type1: exclude mdevs from VFIO_UPDATE_VADDR vfio: platform: ignore missing reset if disabled at module init vfio/mlx5: Improve the target side flow to reduce downtime vfio/mlx5: Improve the source side flow upon pre_copy vfio/mlx5: Check whether VF is migratable samples: fix the prompt about SAMPLE_VFIO_MDEV_MBOCHS vfio/mdev: Use sysfs_emit() to instead of sprintf() vfio-mdev: add back CONFIG_VFIO dependency ...
| * vfio: Fix NULL pointer dereference caused by uninitialized group->iommufdYan Zhao2023-02-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | group->iommufd is not initialized for the iommufd_ctx_put() [20018.331541] BUG: kernel NULL pointer dereference, address: 0000000000000000 [20018.377508] RIP: 0010:iommufd_ctx_put+0x5/0x10 [iommufd] ... [20018.476483] Call Trace: [20018.479214] <TASK> [20018.481555] vfio_group_fops_unl_ioctl+0x506/0x690 [vfio] [20018.487586] __x64_sys_ioctl+0x6a/0xb0 [20018.491773] ? trace_hardirqs_on+0xc5/0xe0 [20018.496347] do_syscall_64+0x67/0x90 [20018.500340] entry_SYSCALL_64_after_hwframe+0x4b/0xb5 Fixes: 9eefba8002c2 ("vfio: Move vfio group specific code into group.c") Cc: stable@vger.kernel.org Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20230222074938.13681-1-yan.y.zhao@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * vfio: no need to pass kvm pointer during device openMatthew Rosato2023-02-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Nothing uses this value during vfio_device_open anymore so it's safe to remove it. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20230203215027.151988-3-mjrosato@linux.ibm.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * vfio: fix deadlock between group lock and kvm lockMatthew Rosato2023-02-091-7/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After 51cdc8bc120e, we have another deadlock scenario between the kvm->lock and the vfio group_lock with two different codepaths acquiring the locks in different order. Specifically in vfio_open_device, vfio holds the vfio group_lock when issuing device->ops->open_device but some drivers (like vfio-ap) need to acquire kvm->lock during their open_device routine; Meanwhile, kvm_vfio_release will acquire the kvm->lock first before calling vfio_file_set_kvm which will acquire the vfio group_lock. To resolve this, let's remove the need for the vfio group_lock from the kvm_vfio_release codepath. This is done by introducing a new spinlock to protect modifications to the vfio group kvm pointer, and acquiring a kvm ref from within vfio while holding this spinlock, with the reference held until the last close for the device in question. Fixes: 51cdc8bc120e ("kvm/vfio: Fix potential deadlock on vfio group_lock") Reported-by: Anthony Krowiak <akrowiak@linux.ibm.com> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20230203215027.151988-2-mjrosato@linux.ibm.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* | vfio: Support VFIO_NOIOMMU with iommufdJason Gunthorpe2023-02-031-2/+5
|/ | | | | | | | | | | | | | | | | | Add a small amount of emulation to vfio_compat to accept the SET_IOMMU to VFIO_NOIOMMU_IOMMU and have vfio just ignore iommufd if it is working on a no-iommu enabled device. Move the enable_unsafe_noiommu_mode module out of container.c into vfio_main.c so that it is always available even if VFIO_CONTAINER=n. This passes Alex's mini-test: https://github.com/awilliam/tests/blob/master/vfio-noiommu-pci-device-open.c Link: https://lore.kernel.org/r/0-v3-480cd64a16f7+1ad0-iommufd_noiommu_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* Merge tag 'driver-core-6.2-rc1' of ↵Linus Torvalds2022-12-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the set of driver core and kernfs changes for 6.2-rc1. The "big" change in here is the addition of a new macro, container_of_const() that will preserve the "const-ness" of a pointer passed into it. The "problem" of the current container_of() macro is that if you pass in a "const *", out of it can comes a non-const pointer unless you specifically ask for it. For many usages, we want to preserve the "const" attribute by using the same call. For a specific example, this series changes the kobj_to_dev() macro to use it, allowing it to be used no matter what the const value is. This prevents every subsystem from having to declare 2 different individual macros (i.e. kobj_const_to_dev() and kobj_to_dev()) and having the compiler enforce the const value at build time, which having 2 macros would not do either. The driver for all of this have been discussions with the Rust kernel developers as to how to properly mark driver core, and kobject, objects as being "non-mutable". The changes to the kobject and driver core in this pull request are the result of that, as there are lots of paths where kobjects and device pointers are not modified at all, so marking them as "const" allows the compiler to enforce this. So, a nice side affect of the Rust development effort has been already to clean up the driver core code to be more obvious about object rules. All of this has been bike-shedded in quite a lot of detail on lkml with different names and implementations resulting in the tiny version we have in here, much better than my original proposal. Lots of subsystem maintainers have acked the changes as well. Other than this change, included in here are smaller stuff like: - kernfs fixes and updates to handle lock contention better - vmlinux.lds.h fixes and updates - sysfs and debugfs documentation updates - device property updates All of these have been in the linux-next tree for quite a while with no problems" * tag 'driver-core-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (58 commits) device property: Fix documentation for fwnode_get_next_parent() firmware_loader: fix up to_fw_sysfs() to preserve const usb.h: take advantage of container_of_const() device.h: move kobj_to_dev() to use container_of_const() container_of: add container_of_const() that preserves const-ness of the pointer driver core: fix up missed drivers/s390/char/hmcdrv_dev.c class.devnode() conversion. driver core: fix up missed scsi/cxlflash class.devnode() conversion. driver core: fix up some missing class.devnode() conversions. driver core: make struct class.devnode() take a const * driver core: make struct class.dev_uevent() take a const * cacheinfo: Remove of_node_put() for fw_token device property: Add a blank line in Kconfig of tests device property: Rename goto label to be more precise device property: Move PROPERTY_ENTRY_BOOL() a bit down device property: Get rid of __PROPERTY_ENTRY_ARRAY_EL*SIZE*() kernfs: fix all kernel-doc warnings and multiple typos driver core: pass a const * into of_device_uevent() kobject: kset_uevent_ops: make name() callback take a const * kobject: kset_uevent_ops: make filter() callback take a const * kobject: make kobject_namespace take a const * ...
* vfio: Move vfio group specific code into group.cYi Liu2022-12-051-0/+877
This prepares for compiling out vfio group after vfio device cdev is added. No vfio_group decode code should be in vfio_main.c, and neither device->group reference should be in vfio_main.c. No functional change is intended. Link: https://lore.kernel.org/r/20221201145535.589687-11-yi.l.liu@intel.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Yu He <yu.he@intel.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>