summaryrefslogtreecommitdiffstats
path: root/drivers/vhost
Commit message (Collapse)AuthorAgeFilesLines
* vhost_vdpa: fix the crash in unmap a large memoryCindy Lu2022-12-281-20/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While testing in vIOMMU, sometimes Guest will unmap very large memory, which will cause the crash. To fix this, add a new function vhost_vdpa_general_unmap(). This function will only unmap the memory that saved in iotlb. Call Trace: [ 647.820144] ------------[ cut here ]------------ [ 647.820848] kernel BUG at drivers/iommu/intel/iommu.c:1174! [ 647.821486] invalid opcode: 0000 [#1] PREEMPT SMP PTI [ 647.822082] CPU: 10 PID: 1181 Comm: qemu-system-x86 Not tainted 6.0.0-rc1home_lulu_2452_lulu7_vhost+ #62 [ 647.823139] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-29-g6a62e0cb0dfe-prebuilt.qem4 [ 647.824365] RIP: 0010:domain_unmap+0x48/0x110 [ 647.825424] Code: 48 89 fb 8d 4c f6 1e 39 c1 0f 4f c8 83 e9 0c 83 f9 3f 7f 18 48 89 e8 48 d3 e8 48 85 c0 75 59 [ 647.828064] RSP: 0018:ffffae5340c0bbf0 EFLAGS: 00010202 [ 647.828973] RAX: 0000000000000001 RBX: ffff921793d10540 RCX: 000000000000001b [ 647.830083] RDX: 00000000080000ff RSI: 0000000000000001 RDI: ffff921793d10540 [ 647.831214] RBP: 0000000007fc0100 R08: ffffae5340c0bcd0 R09: 0000000000000003 [ 647.832388] R10: 0000007fc0100000 R11: 0000000000100000 R12: 00000000080000ff [ 647.833668] R13: ffffae5340c0bcd0 R14: ffff921793d10590 R15: 0000008000100000 [ 647.834782] FS: 00007f772ec90640(0000) GS:ffff921ce7a80000(0000) knlGS:0000000000000000 [ 647.836004] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 647.836990] CR2: 00007f02c27a3a20 CR3: 0000000101b0c006 CR4: 0000000000372ee0 [ 647.838107] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 647.839283] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 647.840666] Call Trace: [ 647.841437] <TASK> [ 647.842107] intel_iommu_unmap_pages+0x93/0x140 [ 647.843112] __iommu_unmap+0x91/0x1b0 [ 647.844003] iommu_unmap+0x6a/0x95 [ 647.844885] vhost_vdpa_unmap+0x1de/0x1f0 [vhost_vdpa] [ 647.845985] vhost_vdpa_process_iotlb_msg+0xf0/0x90b [vhost_vdpa] [ 647.847235] ? _raw_spin_unlock+0x15/0x30 [ 647.848181] ? _copy_from_iter+0x8c/0x580 [ 647.849137] vhost_chr_write_iter+0xb3/0x430 [vhost] [ 647.850126] vfs_write+0x1e4/0x3a0 [ 647.850897] ksys_write+0x53/0xd0 [ 647.851688] do_syscall_64+0x3a/0x90 [ 647.852508] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 647.853457] RIP: 0033:0x7f7734ef9f4f [ 647.854408] Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 29 76 f8 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c8 [ 647.857217] RSP: 002b:00007f772ec8f040 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 [ 647.858486] RAX: ffffffffffffffda RBX: 00000000fef00000 RCX: 00007f7734ef9f4f [ 647.859713] RDX: 0000000000000048 RSI: 00007f772ec8f090 RDI: 0000000000000010 [ 647.860942] RBP: 00007f772ec8f1a0 R08: 0000000000000000 R09: 0000000000000000 [ 647.862206] R10: 0000000000000001 R11: 0000000000000293 R12: 0000000000000010 [ 647.863446] R13: 0000000000000002 R14: 0000000000000000 R15: ffffffff01100000 [ 647.864692] </TASK> [ 647.865458] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs v] [ 647.874688] ---[ end trace 0000000000000000 ]--- Cc: stable@vger.kernel.org Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend") Signed-off-by: Cindy Lu <lulu@redhat.com> Message-Id: <20221219073331.556140-1-lulu@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* vhost-vdpa: fix an iotlb memory leakStefano Garzarella2022-12-281-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before commit 3d5698793897 ("vhost-vdpa: introduce asid based IOTLB") we called vhost_vdpa_iotlb_unmap(v, iotlb, 0ULL, 0ULL - 1) during release to free all the resources allocated when processing user IOTLB messages through vhost_vdpa_process_iotlb_update(). That commit changed the handling of IOTLB a bit, and we accidentally removed some code called during the release. We partially fixed this with commit 037d4305569a ("vhost-vdpa: call vhost_vdpa_cleanup during the release") but a potential memory leak is still there as showed by kmemleak if the application does not send VHOST_IOTLB_INVALIDATE or crashes: unreferenced object 0xffff888007fbaa30 (size 16): comm "blkio-bench", pid 914, jiffies 4294993521 (age 885.500s) hex dump (first 16 bytes): 40 73 41 07 80 88 ff ff 00 00 00 00 00 00 00 00 @sA............. backtrace: [<0000000087736d2a>] kmem_cache_alloc_trace+0x142/0x1c0 [<0000000060740f50>] vhost_vdpa_process_iotlb_msg+0x68c/0x901 [vhost_vdpa] [<0000000083e8e205>] vhost_chr_write_iter+0xc0/0x4a0 [vhost] [<000000008f2f414a>] vhost_vdpa_chr_write_iter+0x18/0x20 [vhost_vdpa] [<00000000de1cd4a0>] vfs_write+0x216/0x4b0 [<00000000a2850200>] ksys_write+0x71/0xf0 [<00000000de8e720b>] __x64_sys_write+0x19/0x20 [<0000000018b12cbb>] do_syscall_64+0x3f/0x90 [<00000000986ec465>] entry_SYSCALL_64_after_hwframe+0x63/0xcd Let's fix this calling vhost_vdpa_iotlb_unmap() on the whole range in vhost_vdpa_remove_as(). We move that call before vhost_dev_cleanup() since we need a valid v->vdev.mm in vhost_vdpa_pa_unmap(). vhost_iotlb_reset() call can be removed, since vhost_vdpa_iotlb_unmap() on the whole range removes all the entries. The kmemleak log reported was observed with a vDPA device that has `use_va` set to true (e.g. VDUSE). This patch has been tested with both types of devices. Fixes: 037d4305569a ("vhost-vdpa: call vhost_vdpa_cleanup during the release") Fixes: 3d5698793897 ("vhost-vdpa: introduce asid based IOTLB") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20221109154213.146789-1-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
* vhost: fix range used in translate_desc()Stefano Garzarella2022-12-281-2/+2
| | | | | | | | | | | | | | | | | | | vhost_iotlb_itree_first() requires `start` and `last` parameters to search for a mapping that overlaps the range. In translate_desc() we cyclically call vhost_iotlb_itree_first(), incrementing `addr` by the amount already translated, so rightly we move the `start` parameter passed to vhost_iotlb_itree_first(), but we should hold the `last` parameter constant. Let's fix it by saving the `last` parameter value before incrementing `addr` in the loop. Fixes: a9709d6874d5 ("vhost: convert pre sorted vhost memory array to interval tree") Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20221109102503.18816-3-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* vringh: fix range used in iotlb_translate()Stefano Garzarella2022-12-281-3/+2
| | | | | | | | | | | | | | | | | | | vhost_iotlb_itree_first() requires `start` and `last` parameters to search for a mapping that overlaps the range. In iotlb_translate() we cyclically call vhost_iotlb_itree_first(), incrementing `addr` by the amount already translated, so rightly we move the `start` parameter passed to vhost_iotlb_itree_first(), but we should hold the `last` parameter constant. Let's fix it by saving the `last` parameter value before incrementing `addr` in the loop. Fixes: 9ad9c49cfe97 ("vringh: IOTLB support") Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20221109102503.18816-2-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* vhost/vsock: Fix error handling in vhost_vsock_init()Yuan Can2022-12-281-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A problem about modprobe vhost_vsock failed is triggered with the following log given: modprobe: ERROR: could not insert 'vhost_vsock': Device or resource busy The reason is that vhost_vsock_init() returns misc_register() directly without checking its return value, if misc_register() failed, it returns without calling vsock_core_unregister() on vhost_transport, resulting the vhost_vsock can never be installed later. A simple call graph is shown as below: vhost_vsock_init() vsock_core_register() # register vhost_transport misc_register() device_create_with_groups() device_create_groups_vargs() dev = kzalloc(...) # OOM happened # return without unregister vhost_transport Fix by calling vsock_core_unregister() when misc_register() returns error. Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko") Signed-off-by: Yuan Can <yuancan@huawei.com> Message-Id: <20221108101705.45981-1-yuancan@huawei.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
* use less confusing names for iov_iter direction initializersAl Viro2022-11-255-15/+15
| | | | | | | | | | | | | READ/WRITE proved to be actively confusing - the meanings are "data destination, as used with read(2)" and "data source, as used with write(2)", but people keep interpreting those as "we read data from it" and "we write data to it", i.e. exactly the wrong way. Call them ITER_DEST and ITER_SOURCE - at least that is harder to misinterpret... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* [vhost] fix 'direction' argument of iov_iter_{init,bvec}()Al Viro2022-11-252-5/+5
| | | | | | READ means "data destination", WRITE - "data source". Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2022-10-101-2/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull virtio updates from Michael Tsirkin: - 9k mtu perf improvements - vdpa feature provisioning - virtio blk SECURE ERASE support - fixes and cleanups all over the place * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_pci: don't try to use intxif pin is zero vDPA: conditionally read MTU and MAC in dev cfg space vDPA: fix spars cast warning in vdpa_dev_net_mq_config_fill vDPA: check virtio device features to detect MQ vDPA: check VIRTIO_NET_F_RSS for max_virtqueue_paris's presence vDPA: only report driver features if FEATURES_OK is set vDPA: allow userspace to query features of a vDPA device virtio_blk: add SECURE ERASE command support vp_vdpa: support feature provisioning vdpa_sim_net: support feature provisioning vdpa: device feature provisioning virtio-net: use mtu size as buffer length for big packets virtio-net: introduce and use helper function for guest gso support checks virtio: drop vp_legacy_set_queue_size virtio_ring: make vring_alloc_queue_packed prettier virtio_ring: split: Operators use unified style vhost: add __init/__exit annotations to module init/exit funcs
| * vhost: add __init/__exit annotations to module init/exit funcsXiu Jianfeng2022-10-071-2/+2
| | | | | | | | | | | | | | | | Add missing __init/__exit annotations to module init/exit funcs. Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Message-Id: <20220917083803.21521-1-xiujianfeng@huawei.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2022-10-031-1/+1
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge in the left-over fixes before the net-next pull-request. Conflicts: drivers/net/ethernet/mediatek/mtk_ppe.c ae3ed15da588 ("net: ethernet: mtk_eth_soc: fix state in __mtk_foe_entry_clear") 9d8cb4c096ab ("net: ethernet: mtk_eth_soc: add foe_entry_size to mtk_eth_soc") https://lore.kernel.org/all/6cb6893b-4921-a068-4c30-1109795110bb@tessares.net/ kernel/bpf/helpers.c 8addbfc7b308 ("bpf: Gate dynptr API behind CAP_BPF") 5679ff2f138f ("bpf: Move bpf_loop and bpf_for_each_map_elem under CAP_BPF") 8a67f2de9b1d ("bpf: expose bpf_strtol and bpf_strtoul to all program types") https://lore.kernel.org/all/20221003201957.13149-1-daniel@iogearbox.net/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | vhost/vsock: Use kvmalloc/kvfree for larger packets.Junichi Uekawa2022-09-291-1/+1
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When copying a large file over sftp over vsock, data size is usually 32kB, and kmalloc seems to fail to try to allocate 32 32kB regions. vhost-5837: page allocation failure: order:4, mode:0x24040c0 Call Trace: [<ffffffffb6a0df64>] dump_stack+0x97/0xdb [<ffffffffb68d6aed>] warn_alloc_failed+0x10f/0x138 [<ffffffffb68d868a>] ? __alloc_pages_direct_compact+0x38/0xc8 [<ffffffffb664619f>] __alloc_pages_nodemask+0x84c/0x90d [<ffffffffb6646e56>] alloc_kmem_pages+0x17/0x19 [<ffffffffb6653a26>] kmalloc_order_trace+0x2b/0xdb [<ffffffffb66682f3>] __kmalloc+0x177/0x1f7 [<ffffffffb66e0d94>] ? copy_from_iter+0x8d/0x31d [<ffffffffc0689ab7>] vhost_vsock_handle_tx_kick+0x1fa/0x301 [vhost_vsock] [<ffffffffc06828d9>] vhost_worker+0xf7/0x157 [vhost] [<ffffffffb683ddce>] kthread+0xfd/0x105 [<ffffffffc06827e2>] ? vhost_dev_set_owner+0x22e/0x22e [vhost] [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3 [<ffffffffb6eb332e>] ret_from_fork+0x4e/0x80 [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3 Work around by doing kvmalloc instead. Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko") Signed-off-by: Junichi Uekawa <uekawa@chromium.org> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20220928064538.667678-1-uekawa@chromium.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* / vhost/net: use struct ubuf_info_msgzcPavel Begunkov2022-09-281-7/+8
|/ | | | | | | | struct ubuf_info will be changed, use ubuf_info_msgzc instead. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2022-08-123-48/+153
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull virtio updates from Michael Tsirkin: - A huge patchset supporting vq resize using the new vq reset capability - Features, fixes, and cleanups all over the place * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (88 commits) vdpa/mlx5: Fix possible uninitialized return value vdpa_sim_blk: add support for discard and write-zeroes vdpa_sim_blk: add support for VIRTIO_BLK_T_FLUSH vdpa_sim_blk: make vdpasim_blk_check_range usable by other requests vdpa_sim_blk: check if sector is 0 for commands other than read or write vdpa_sim: Implement suspend vdpa op vhost-vdpa: uAPI to suspend the device vhost-vdpa: introduce SUSPEND backend feature bit vdpa: Add suspend operation virtio-blk: Avoid use-after-free on suspend/resume virtio_vdpa: support the arg sizes of find_vqs() vhost-vdpa: Call ida_simple_remove() when failed vDPA: fix 'cast to restricted le16' warnings in vdpa.c vDPA: !FEATURES_OK should not block querying device config space vDPA/ifcvf: support userspace to query features and MQ of a management device vDPA/ifcvf: get_config_size should return a value no greater than dev implementation vhost scsi: Allow user to control num virtqueues vhost-scsi: Fix max number of virtqueues vdpa/mlx5: Support different address spaces for control and data vdpa/mlx5: Implement susupend virtqueue callback ...
| * vhost-vdpa: uAPI to suspend the deviceEugenio Pérez2022-08-111-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ioctl adds support for suspending the device from userspace. This is a must before getting virtqueue indexes (base) for live migration, since the device could modify them after userland gets them. There are individual ways to perform that action for some devices (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there was no way to perform it for any vhost device (and, in particular, vhost-vdpa). After a successful return of the ioctl call the device must not process more virtqueue descriptors. The device can answer to read or writes of config fields as if it were not suspended. In particular, writing to "queue_enable" with a value of 1 will not make the device start processing buffers of the virtqueue. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20220810171512.2343333-4-eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * vhost-vdpa: introduce SUSPEND backend feature bitEugenio Pérez2022-08-111-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | Userland knows if it can suspend the device or not by checking this feature bit. It's only offered if the vdpa driver backend implements the suspend() operation callback, and to offer it or userland to ack it if the backend does not offer that callback is an error. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20220810171512.2343333-3-eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * vhost-vdpa: Call ida_simple_remove() when failedBo Liu2022-08-111-0/+1
| | | | | | | | | | | | | | | | | | In function vhost_vdpa_probe(), when code execution fails, we should call ida_simple_remove() to free ida. Signed-off-by: Bo Liu <liubo03@inspur.com> Message-Id: <20220805091254.20026-1-liubo03@inspur.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * vhost scsi: Allow user to control num virtqueuesMike Christie2022-08-111-24/+61
| | | | | | | | | | | | | | | | | | | | | | | | We are currently hard coded to always create 128 IO virtqueues, so this adds a modparam to control it. For large systems where we are ok with using memory for virtqueues it allows us to add up to 1024. This limit was just selected becuase that's qemu's limit. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20220708030525.5065-3-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
| * vhost-scsi: Fix max number of virtqueuesMike Christie2022-08-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Qemu takes it's num_queues limit then adds the fixed queues (control and event) to the total it will request from the kernel. So when a user requests 128 (or qemu does it's num_queues calculation based on vCPUS and other system limits), we hit errors due to userspace trying to setup 130 queues when vhost-scsi has a hard coded limit of 128. This has vhost-scsi adjust it's max so we can do a total of 130 virtqueues (128 IO and 2 fixed). For the case where the user has 128 vCPUs the guest OS can then nicely map each IO virtqueue to a vCPU and not have the odd case where 2 vCPUs share a virtqueue. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20220708030525.5065-2-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
| * vdpa: Use device_iommu_capable()Robin Murphy2022-08-111-1/+1
| | | | | | | | | | | | | | | | | | | | Use the new interface to check the capability for our device specifically. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Message-Id: <548e316fa282ce513fabb991a4c4d92258062eb5.1654688822.git.robin.murphy@arm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
| * vringh: iterate on iotlb_translate to handle large translationsStefano Garzarella2022-08-111-22/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | iotlb_translate() can return -ENOBUFS if the bio_vec is not big enough to contain all the ranges for translation. This can happen for example if the VMM maps a large bounce buffer, without using hugepages, that requires more than 16 ranges to translate the addresses. To handle this case, let's extend iotlb_translate() to also return the number of bytes successfully translated. In copy_from_iotlb()/copy_to_iotlb() loops by calling iotlb_translate() several times until we complete the translation. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20220624075656.13997-1-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* | iov_iter: advancing variants of iov_iter_get_pages{,_alloc}()Al Viro2022-08-081-3/+1
|/ | | | | | | | | | | | | | Most of the users immediately follow successful iov_iter_get_pages() with advancing by the amount it had returned. Provide inline wrappers doing that, convert trivial open-coded uses of those. BTW, iov_iter_get_pages() never returns more than it had been asked to; such checks in cifs ought to be removed someday... Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vhost-vdpa: call vhost_vdpa_cleanup during the releaseStefano Garzarella2022-06-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Before commit 3d5698793897 ("vhost-vdpa: introduce asid based IOTLB") we call vhost_vdpa_iotlb_free() during the release to clean all regions mapped in the iotlb. That commit removed vhost_vdpa_iotlb_free() and added vhost_vdpa_cleanup() to do some cleanup, including deleting all mappings, but we forgot to call it in vhost_vdpa_release(). This causes that if an application does not remove all mappings explicitly (or it crashes), the mappings remain in the iotlb and subsequent applications may fail if they map the same addresses. Calling vhost_vdpa_cleanup() also fixes a memory leak since we are not freeing `v->vdev.vqs` during the release from the same commit. Since vhost_vdpa_cleanup() calls vhost_dev_cleanup() we can remove its call from vhost_vdpa_release(). Fixes: 3d5698793897 ("vhost-vdpa: introduce asid based IOTLB") Cc: gautam.dawar@xilinx.com Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20220622151407.51232-1-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
* vdpa: make get_vq_group and set_group_asid optionalJason Wang2022-06-091-0/+2
| | | | | | | | | | | | This patch makes get_vq_group and set_group_asid optional. This is needed to unbreak the vDPA parent that doesn't support multiple address spaces. Cc: Gautam Dawar <gautam.dawar@xilinx.com> Fixes: aaca8373c4b1 ("vhost-vdpa: support ASID based IOTLB API") Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220609041901.2029-1-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* vringh: Fix loop descriptors check in the indirect casesXie Yongji2022-06-081-2/+8
| | | | | | | | | | | | | | We should use size of descriptor chain to test loop condition in the indirect case. And another statistical count is also introduced for indirect descriptors to avoid conflict with the statistical count of direct descriptors. Fixes: f87d0fbb5798 ("vringh: host-side implementation of virtio rings.") Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Signed-off-by: Fam Zheng <fam.zheng@bytedance.com> Message-Id: <20220505100910.137-1-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
* vhost: rename vhost_work_dev_flushMike Christie2022-05-316-11/+11
| | | | | | | | | | | | | | This patch renames vhost_work_dev_flush to just vhost_dev_flush to relfect that it flushes everything on the device and that drivers don't know/care that polls are based on vhost_works. Drivers just flush the entire device and polls, and works for vhost-scsi management TMFs and IO net virtqueues, etc all are flushed. Signed-off-by: Mike Christie <michael.christie@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20220517180850.198915-9-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* vhost-test: drop flush after vhost_dev_cleanupMike Christie2022-05-311-3/+0
| | | | | | | | | | | | | | | The flush after vhost_dev_cleanup is not needed because: 1. It doesn't do anything. vhost_dev_cleanup will stop the worker thread so the flush call will just return since the worker has not device. 2. It's not needed. The comment about jobs re-queueing themselves does not look correct because handle_vq does not requeue work. Signed-off-by: Mike Christie <michael.christie@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220517180850.198915-8-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* vhost-scsi: drop flush after vhost_dev_cleanupMike Christie2022-05-311-2/+0
| | | | | | | | | | | | | | | | | | | | The flush after vhost_dev_cleanup is not needed because: 1. It doesn't do anything. vhost_dev_cleanup will stop the worker thread so the flush call will just return since the worker has not device. 2. It's not needed for the re-queue case. vhost_scsi_evt_handle_kick grabs the mutex and if the backend is NULL will return without queueing a work. vhost_scsi_clear_endpoint will set the backend to NULL under the vq->mutex then drops the mutex and does a flush. So we know when vhost_scsi_clear_endpoint has dropped the mutex after clearing the backend no evt related work will be able to requeue. The flush would then make sure any queued evts are run and return. Signed-off-by: Mike Christie <michael.christie@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220517180850.198915-7-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* vhost_vsock: simplify vhost_vsock_flush()Andrey Ryabinin2022-05-311-5/+0
| | | | | | | | | | | | | | vhost_vsock_flush() calls vhost_work_dev_flush(vsock->vqs[i].poll.dev) before vhost_work_dev_flush(&vsock->dev). This seems pointless as vsock->vqs[i].poll.dev is the same as &vsock->dev and several flushes in a row doesn't do anything useful, one is just enough. Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220517180850.198915-6-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* vhost_test: remove vhost_test_flush_vq()Andrey Ryabinin2022-05-311-8/+3
| | | | | | | | | | | | | | | | vhost_test_flush_vq() just a simple wrapper around vhost_work_dev_flush() which seems have no value. It's just easier to call vhost_work_dev_flush() directly. Besides there is no point in obtaining vhost_dev pointer via 'n->vqs[index].poll.dev' while we can just use &n->dev. It's the same pointers, see vhost_test_open()/vhost_dev_init(). Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220517180850.198915-5-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
* vhost_net: get rid of vhost_net_flush_vq() and extra flush callsAndrey Ryabinin2022-05-311-9/+2
| | | | | | | | | | | | | | | | | | | vhost_net_flush_vq() calls vhost_work_dev_flush() twice passing vhost_dev pointer obtained via 'n->poll[index].dev' and 'n->vqs[index].vq.poll.dev'. This is actually the same pointer, initialized in vhost_net_open()/vhost_dev_init()/vhost_poll_init() Remove vhost_net_flush_vq() and call vhost_work_dev_flush() directly. Do the flushes only once instead of several flush calls in a row which seems rather useless. Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com> [drop vhost_dev forward declaration in vhost.h] Signed-off-by: Mike Christie <michael.christie@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20220517180850.198915-4-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* vhost: flush dev once during vhost_dev_stopMike Christie2022-05-311-3/+3
| | | | | | | | | | | | | When vhost_work_dev_flush returns all work queued at that time will have completed. There is then no need to flush after every vhost_poll_stop call, and we can move the flush call to after the loop that stops the pollers. Signed-off-by: Mike Christie <michael.christie@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20220517180850.198915-3-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* vhost: get rid of vhost_poll_flush() wrapperAndrey Ryabinin2022-05-315-15/+6
| | | | | | | | | | | | | | | | | | | vhost_poll_flush() is a simple wrapper around vhost_work_dev_flush(). It gives wrong impression that we are doing some work over vhost_poll, while in fact it flushes vhost_poll->dev. It only complicate understanding of the code and leads to mistakes like flushing the same vhost_dev several times in a row. Just remove vhost_poll_flush() and call vhost_work_dev_flush() directly. Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com> [merge vhost_poll_flush removal from Stefano Garzarella] Signed-off-by: Mike Christie <michael.christie@oracle.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20220517180850.198915-2-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* vhost-vdpa: return -EFAULT on copy_to_user() failureDan Carpenter2022-05-311-3/+5
| | | | | | | | | | | | | The copy_to_user() function returns the number of bytes remaining to be copied. However, we need to return a negative error code, -EFAULT, to the user. Fixes: 87f4c217413a ("vhost-vdpa: introduce uAPI to get the number of virtqueue groups") Fixes: e96ef636f154 ("vhost-vdpa: introduce uAPI to get the number of address spaces") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Message-Id: <YotG1vXKXXSayr63@kili> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
* vhost-vdpa: support ASID based IOTLB APIGautam Dawar2022-05-312-20/+93
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch extends the vhost-vdpa to support ASID based IOTLB API. The vhost-vdpa device will allocated multiple IOTLBs for vDPA device that supports multiple address spaces. The IOTLBs and vDPA device memory mappings is determined and maintained through ASID. Note that we still don't support vDPA device with more than one address spaces that depends on platform IOMMU. This work will be done by moving the IOMMU logic from vhost-vDPA to vDPA device driver. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Gautam Dawar <gdawar@xilinx.com> Message-Id: <20220330180436.24644-16-gdawar@xilinx.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Includes fixup: vhost-vdpa: Fix some error handling path in vhost_vdpa_process_iotlb_msg() In the error paths introduced by the original patch, a mutex may be left locked. Add the correct goto instead of a direct return. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Message-Id: <89ef0ae4c26ac3cfa440c71e97e392dcb328ac1b.1653227924.git.christophe.jaillet@wanadoo.fr> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* vhost-vdpa: introduce uAPI to set group ASIDGautam Dawar2022-05-311-0/+8
| | | | | | | | | | Follows the vDPA support for associating ASID to a specific virtqueue group. This patch adds a uAPI to support setting them from userspace. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Gautam Dawar <gdawar@xilinx.com> Message-Id: <20220330180436.24644-15-gdawar@xilinx.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* vhost-vdpa: uAPI to get virtqueue group idGautam Dawar2022-05-311-0/+8
| | | | | | | | | | | Follows the support for virtqueue group in vDPA. This patches introduces uAPI to get the virtqueue group ID for a specific virtqueue in vhost-vdpa. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Gautam Dawar <gdawar@xilinx.com> Message-Id: <20220330180436.24644-14-gdawar@xilinx.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* vhost-vdpa: introduce uAPI to get the number of address spacesGautam Dawar2022-05-311-0/+3
| | | | | | | | | | This patch introduces the uAPI for getting the number of address spaces supported by this vDPA device. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Gautam Dawar <gdawar@xilinx.com> Message-Id: <20220330180436.24644-13-gdawar@xilinx.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* vhost-vdpa: introduce uAPI to get the number of virtqueue groupsGautam Dawar2022-05-311-0/+4
| | | | | | | | | | | Follows the vDPA support for multiple address spaces, this patch introduce uAPI for the userspace to know the number of virtqueue groups supported by the vDPA device. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Gautam Dawar <gdawar@xilinx.com> Message-Id: <20220330180436.24644-12-gdawar@xilinx.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* vhost-vdpa: introduce asid based IOTLBGautam Dawar2022-05-311-25/+72
| | | | | | | | | | | This patch converts the vhost-vDPA device to support multiple IOTLBs tagged via ASID via hlist. This will be used for supporting multiple address spaces in the following patches. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Gautam Dawar <gdawar@xilinx.com> Message-Id: <20220330180436.24644-11-gdawar@xilinx.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* vhost: support ASID in IOTLB APIGautam Dawar2022-05-313-8/+24
| | | | | | | | | | | | This patches allows userspace to send ASID based IOTLB message to vhost. This idea is to use the reserved u32 field in the existing V2 IOTLB message. Vhost device should advertise this capability via VHOST_BACKEND_F_IOTLB_ASID backend feature. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Gautam Dawar <gdawar@xilinx.com> Message-Id: <20220330180436.24644-10-gdawar@xilinx.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* vhost_iotlb: split out IOTLB initializationGautam Dawar2022-05-311-5/+18
| | | | | | | | | | This patch splits out IOTLB initialization to make sure it could be reused by external modules. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Gautam Dawar <gdawar@xilinx.com> Message-Id: <20220330180436.24644-9-gdawar@xilinx.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* vdpa: multiple address spaces supportGautam Dawar2022-05-311-5/+9
| | | | | | | | | | | | | | | | | | | | | | | This patches introduces the multiple address spaces support for vDPA device. This idea is to identify a specific address space via an dedicated identifier - ASID. During vDPA device allocation, vDPA device driver needs to report the number of address spaces supported by the device then the DMA mapping ops of the vDPA device needs to be extended to support ASID. This helps to isolate the environments for the virtqueue that will not be assigned directly. E.g in the case of virtio-net, the control virtqueue will not be assigned directly to guest. As a start, simply claim 1 virtqueue groups and 1 address spaces for all vDPA devices. And vhost-vDPA will simply reject the device with more than 1 virtqueue groups or address spaces. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Gautam Dawar <gdawar@xilinx.com> Message-Id: <20220330180436.24644-7-gdawar@xilinx.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* vhost-vdpa: switch to use vhost-vdpa specific IOTLBGautam Dawar2022-05-311-8/+10
| | | | | | | | | | | To ease the implementation of per group ASID support for vDPA device. This patch switches to use a vhost-vdpa specific IOTLB to avoid the unnecessary refactoring of the vhost core. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Gautam Dawar <gdawar@xilinx.com> Message-Id: <20220330180436.24644-5-gdawar@xilinx.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* vhost-vdpa: passing iotlb to IOMMU mapping helpersGautam Dawar2022-05-311-30/+37
| | | | | | | | | | | To prepare for the ASID support for vhost-vdpa, try to pass IOTLB object to dma helpers. No functional changes, it's just a preparation for support multiple IOTLBs. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Gautam Dawar <gdawar@xilinx.com> Message-Id: <20220330180436.24644-4-gdawar@xilinx.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* Fix double fget() in vhost_net_set_backend()Al Viro2022-05-181-8/+7
| | | | | | | | | | | | | | | | | | Descriptor table is a shared resource; two fget() on the same descriptor may return different struct file references. get_tap_ptr_ring() is called after we'd found (and pinned) the socket we'll be using and it tries to find the private tun/tap data structures associated with it. Redoing the lookup by the same file descriptor we'd used to get the socket is racy - we need to same struct file. Thanks to Jason for spotting a braino in the original variant of patch - I'd missed the use of fd == -1 for disabling backend, and in that case we can end up with sock == NULL and sock != oldsock. Cc: stable@kernel.org Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2022-03-313-7/+47
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull virtio updates from Michael Tsirkin: - vdpa generic device type support - more virtio hardening for broken devices (but on the same theme, revert some virtio hotplug hardening patches - they were misusing some interrupt flags and had to be reverted) - RSS support in virtio-net - max device MTU support in mlx5 vdpa - akcipher support in virtio-crypto - shared IRQ support in ifcvf vdpa - a minor performance improvement in vhost - enable virtio mem for ARM64 - beginnings of advance dma support - cleanups, fixes all over the place * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (33 commits) vdpa/mlx5: Avoid processing works if workqueue was destroyed vhost: handle error while adding split ranges to iotlb vdpa: support exposing the count of vqs to userspace vdpa: change the type of nvqs to u32 vdpa: support exposing the config size to userspace vdpa/mlx5: re-create forwarding rules after mac modified virtio: pci: check bar values read from virtio config space Revert "virtio_pci: harden MSI-X interrupts" Revert "virtio-pci: harden INTX interrupts" drivers/net/virtio_net: Added RSS hash report control. drivers/net/virtio_net: Added RSS hash report. drivers/net/virtio_net: Added basic RSS support. drivers/net/virtio_net: Fixed padded vheader to use v1 with hash. virtio: use virtio_device_ready() in virtio_device_restore() tools/virtio: compile with -pthread tools/virtio: fix after premapped buf support virtio_ring: remove flags check for unmap packed indirect desc virtio_ring: remove flags check for unmap split indirect desc virtio_ring: rename vring_unmap_state_packed() to vring_unmap_extra_packed() net/mlx5: Add support for configuring max device MTU ...
| * vhost: handle error while adding split ranges to iotlbAnirudh Rayabharam2022-03-281-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | vhost_iotlb_add_range_ctx() handles the range [0, ULONG_MAX] by splitting it into two ranges and adding them separately. The return value of adding the first range to the iotlb is currently ignored. Check the return value and bail out in case of an error. Signed-off-by: Anirudh Rayabharam <mail@anirudhrb.com> Link: https://lore.kernel.org/r/20220312141121.4981-1-mail@anirudhrb.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Fixes: e2ae38cf3d91 ("vhost: fix hung thread due to erroneous iotlb entries") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
| * vdpa: support exposing the count of vqs to userspaceLongpeng2022-03-281-0/+13
| | | | | | | | | | | | | | | | | | | | | | - GET_VQS_COUNT: the count of virtqueues that exposed Signed-off-by: Longpeng <longpeng2@huawei.com> Link: https://lore.kernel.org/r/20220315032553.455-4-longpeng2@huawei.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Longpeng &lt;<a href="mailto:longpeng2@huawei.com" target="_blank">longpeng2@huawei.com</a>&gt;<br> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
| * vdpa: change the type of nvqs to u32Longpeng2022-03-281-4/+6
| | | | | | | | | | | | | | | | | | | | | | Change vdpa_device.nvqs and vhost_vdpa.nvqs to use u32 Signed-off-by: Longpeng <longpeng2@huawei.com> Link: https://lore.kernel.org/r/20220315032553.455-3-longpeng2@huawei.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Longpeng &lt;<a href="mailto:longpeng2@huawei.com" target="_blank">longpeng2@huawei.com</a>&gt;<br></blockquote><div><br></div><div>Acked-by: Jason Wang &lt;<a href="mailto:jasowang@redhat.com">jasowang@redhat.com</a>&gt;</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
| * vdpa: support exposing the config size to userspaceLongpeng2022-03-281-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | - GET_CONFIG_SIZE: return the size of the virtio config space. The size contains the fields which are conditional on feature bits. Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Longpeng <longpeng2@huawei.com> Link: https://lore.kernel.org/r/20220315032553.455-2-longpeng2@huawei.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>