summaryrefslogtreecommitdiffstats
path: root/net/vmw_vsock
Commit message (Collapse)AuthorAgeFilesLines
* vhost/vsock: Use kvmalloc/kvfree for larger packets.Junichi Uekawa2022-10-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 0e3f72931fc47bb81686020cc643cde5d9cd0bb8 ] When copying a large file over sftp over vsock, data size is usually 32kB, and kmalloc seems to fail to try to allocate 32 32kB regions. vhost-5837: page allocation failure: order:4, mode:0x24040c0 Call Trace: [<ffffffffb6a0df64>] dump_stack+0x97/0xdb [<ffffffffb68d6aed>] warn_alloc_failed+0x10f/0x138 [<ffffffffb68d868a>] ? __alloc_pages_direct_compact+0x38/0xc8 [<ffffffffb664619f>] __alloc_pages_nodemask+0x84c/0x90d [<ffffffffb6646e56>] alloc_kmem_pages+0x17/0x19 [<ffffffffb6653a26>] kmalloc_order_trace+0x2b/0xdb [<ffffffffb66682f3>] __kmalloc+0x177/0x1f7 [<ffffffffb66e0d94>] ? copy_from_iter+0x8d/0x31d [<ffffffffc0689ab7>] vhost_vsock_handle_tx_kick+0x1fa/0x301 [vhost_vsock] [<ffffffffc06828d9>] vhost_worker+0xf7/0x157 [vhost] [<ffffffffb683ddce>] kthread+0xfd/0x105 [<ffffffffc06827e2>] ? vhost_dev_set_owner+0x22e/0x22e [vhost] [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3 [<ffffffffb6eb332e>] ret_from_fork+0x4e/0x80 [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3 Work around by doing kvmalloc instead. Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko") Signed-off-by: Junichi Uekawa <uekawa@chromium.org> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20220928064538.667678-1-uekawa@chromium.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* vsock: Set socket state back to SS_UNCONNECTED in vsock_connect_timeout()Peilin Ye2022-08-251-0/+1
| | | | | | | | | | | | | | | | | | | | | | | commit a3e7b29e30854ed67be0d17687e744ad0c769c4b upstream. Imagine two non-blocking vsock_connect() requests on the same socket. The first request schedules @connect_work, and after it times out, vsock_connect_timeout() sets *sock* state back to TCP_CLOSE, but keeps *socket* state as SS_CONNECTING. Later, the second request returns -EALREADY, meaning the socket "already has a pending connection in progress", even though the first request has already timed out. As suggested by Stefano, fix it by setting *socket* state back to SS_UNCONNECTED, so that the second request will return -ETIMEDOUT. Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* vsock: Fix memory leak in vsock_connect()Peilin Ye2022-08-251-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 7e97cfed9929eaabc41829c395eb0d1350fccb9d upstream. An O_NONBLOCK vsock_connect() request may try to reschedule @connect_work. Imagine the following sequence of vsock_connect() requests: 1. The 1st, non-blocking request schedules @connect_work, which will expire after 200 jiffies. Socket state is now SS_CONNECTING; 2. Later, the 2nd, blocking request gets interrupted by a signal after a few jiffies while waiting for the connection to be established. Socket state is back to SS_UNCONNECTED, but @connect_work is still pending, and will expire after 100 jiffies. 3. Now, the 3rd, non-blocking request tries to schedule @connect_work again. Since @connect_work is already scheduled, schedule_delayed_work() silently returns. sock_hold() is called twice, but sock_put() will only be called once in vsock_connect_timeout(), causing a memory leak reported by syzbot: BUG: memory leak unreferenced object 0xffff88810ea56a40 (size 1232): comm "syz-executor756", pid 3604, jiffies 4294947681 (age 12.350s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 28 00 07 40 00 00 00 00 00 00 00 00 00 00 00 00 (..@............ backtrace: [<ffffffff837c830e>] sk_prot_alloc+0x3e/0x1b0 net/core/sock.c:1930 [<ffffffff837cbe22>] sk_alloc+0x32/0x2e0 net/core/sock.c:1989 [<ffffffff842ccf68>] __vsock_create.constprop.0+0x38/0x320 net/vmw_vsock/af_vsock.c:734 [<ffffffff842ce8f1>] vsock_create+0xc1/0x2d0 net/vmw_vsock/af_vsock.c:2203 [<ffffffff837c0cbb>] __sock_create+0x1ab/0x2b0 net/socket.c:1468 [<ffffffff837c3acf>] sock_create net/socket.c:1519 [inline] [<ffffffff837c3acf>] __sys_socket+0x6f/0x140 net/socket.c:1561 [<ffffffff837c3bba>] __do_sys_socket net/socket.c:1570 [inline] [<ffffffff837c3bba>] __se_sys_socket net/socket.c:1568 [inline] [<ffffffff837c3bba>] __x64_sys_socket+0x1a/0x20 net/socket.c:1568 [<ffffffff84512815>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<ffffffff84512815>] do_syscall_64+0x35/0x80 arch/x86/entry/common.c:80 [<ffffffff84600068>] entry_SYSCALL_64_after_hwframe+0x44/0xae <...> Use mod_delayed_work() instead: if @connect_work is already scheduled, reschedule it, and undo sock_hold() to keep the reference count balanced. Reported-and-tested-by: syzbot+b03f55bf128f9a38f064@syzkaller.appspotmail.com Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Co-developed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* vsock: remove vsock from connected table when connect is interrupted by a signalSeth Forshee2022-02-231-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | commit b9208492fcaecff8f43915529ae34b3bcb03877c upstream. vsock_connect() expects that the socket could already be in the TCP_ESTABLISHED state when the connecting task wakes up with a signal pending. If this happens the socket will be in the connected table, and it is not removed when the socket state is reset. In this situation it's common for the process to retry connect(), and if the connection is successful the socket will be added to the connected table a second time, corrupting the list. Prevent this by calling vsock_remove_connected() if a signal is received while waiting for a connection. This is harmless if the socket is not in the connected table, and if it is in the table then removing it will prevent list corruption from a double add. Note for backporting: this patch requires d5afa82c977e ("vsock: correct removal of socket from the list"), which is in all current stable trees except 4.9.y. Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Signed-off-by: Seth Forshee <sforshee@digitalocean.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20220217141312.2297547-1-sforshee@digitalocean.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* vsock: prevent unnecessary refcnt inc for nonblocking connectEiichi Tsukata2021-11-261-0/+2
| | | | | | | | | | | | | | | | | | [ Upstream commit c7cd82b90599fa10915f41e3dd9098a77d0aa7b6 ] Currently vosck_connect() increments sock refcount for nonblocking socket each time it's called, which can lead to memory leak if it's called multiple times because connect timeout function decrements sock refcount only once. Fixes it by making vsock_connect() return -EALREADY immediately when sock state is already SS_CONNECTING. Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* vsock/virtio: avoid potential deadlock when vsock device removeLongpeng(Mike)2021-08-261-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 49b0b6ffe20c5344f4173f3436298782a08da4f2 ] There's a potential deadlock case when remove the vsock device or process the RESET event: vsock_for_each_connected_socket: spin_lock_bh(&vsock_table_lock) ----------- (1) ... virtio_vsock_reset_sock: lock_sock(sk) --------------------- (2) ... spin_unlock_bh(&vsock_table_lock) lock_sock() may do initiative schedule when the 'sk' is owned by other thread at the same time, we would receivce a warning message that "scheduling while atomic". Even worse, if the next task (selected by the scheduler) try to release a 'sk', it need to request vsock_table_lock and the deadlock occur, cause the system into softlockup state. Call trace: queued_spin_lock_slowpath vsock_remove_bound vsock_remove_sock virtio_transport_release __vsock_release vsock_release __sock_release sock_close __fput ____fput So we should not require sk_lock in this case, just like the behavior in vhost_vsock or vmci. Fixes: 0ea9e1d3a9e3 ("VSOCK: Introduce virtio_transport.ko") Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20210812053056.1699-1-longpeng2@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* vsock: notify server to shutdown when client has pending signalLongpeng(Mike)2021-07-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit c7ff9cff70601ea19245d997bb977344663434c7 ] The client's sk_state will be set to TCP_ESTABLISHED if the server replay the client's connect request. However, if the client has pending signal, its sk_state will be set to TCP_CLOSE without notify the server, so the server will hold the corrupt connection. client server 1. sk_state=TCP_SYN_SENT | 2. call ->connect() | 3. wait reply | | 4. sk_state=TCP_ESTABLISHED | 5. insert to connected list | 6. reply to the client 7. sk_state=TCP_ESTABLISHED | 8. insert to connected list | 9. *signal pending* <--------------------- the user kill client 10. sk_state=TCP_CLOSE | client is exiting... | 11. call ->release() | virtio_transport_close if (!(sk->sk_state == TCP_ESTABLISHED || sk->sk_state == TCP_CLOSING)) return true; *return at here, the server cannot notice the connection is corrupt* So the client should notify the peer in this case. Cc: David S. Miller <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jorgen Hansen <jhansen@vmware.com> Cc: Norbert Slusarek <nslusarek@gmx.net> Cc: Andra Paraschiv <andraprs@amazon.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: David Brazdil <dbrazdil@google.com> Cc: Alexander Popov <alex.popov@linux.com> Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lkml.org/lkml/2021/5/17/418 Signed-off-by: lixianming <lixianming5@huawei.com> Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* vsock/vmci: log once the failed queue pair allocationStefano Garzarella2021-05-221-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit e16edc99d658cd41c60a44cc14d170697aa3271f ] VMCI feature is not supported in conjunction with the vSphere Fault Tolerance (FT) feature. VMware Tools can repeatedly try to create a vsock connection. If FT is enabled the kernel logs is flooded with the following messages: qp_alloc_hypercall result = -20 Could not attach to queue pair with -20 "qp_alloc_hypercall result = -20" was hidden by commit e8266c4c3307 ("VMCI: Stop log spew when qp allocation isn't possible"), but "Could not attach to queue pair with -20" is still there flooding the log. Since the error message can be useful in some cases, print it only once. Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* selinux: vsock: Set SID for socket returned by accept()David Brazdil2021-04-071-0/+1
| | | | | | | | | | | | | | | | | | [ Upstream commit 1f935e8e72ec28dddb2dc0650b3b6626a293d94b ] For AF_VSOCK, accept() currently returns sockets that are unlabelled. Other socket families derive the child's SID from the SID of the parent and the SID of the incoming packet. This is typically done as the connected socket is placed in the queue that accept() removes from. Reuse the existing 'security_sk_clone' hook to copy the SID from the parent (server) socket to the child. There is no packet SID in this case. Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* vsock: fix locking in vsock_shutdown()Stefano Garzarella2021-02-232-7/+5
| | | | | | | | | | | | | | | | | | | | | | | | commit 1c5fae9c9a092574398a17facc31c533791ef232 upstream. In vsock_shutdown() we touched some socket fields without holding the socket lock, such as 'state' and 'sk_flags'. Also, after the introduction of multi-transport, we are accessing 'vsk->transport' in vsock_send_shutdown() without holding the lock and this call can be made while the connection is in progress, so the transport can change in the meantime. To avoid issues, we hold the socket lock when we enter in vsock_shutdown() and release it when we leave. Among the transports that implement the 'shutdown' callback, only hyperv_transport acquired the lock. Since the caller now holds it, we no longer take it. Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* vsock/virtio: update credit only if socket is not closedStefano Garzarella2021-02-231-2/+2
| | | | | | | | | | | | | | | | | commit ce7536bc7398e2ae552d2fabb7e0e371a9f1fe46 upstream. If the socket is closed or is being released, some resources used by virtio_transport_space_update() such as 'vsk->trans' may be released. To avoid a use after free bug we should only update the available credit when we are sure the socket is still open and we have the lock held. Fixes: 06a8fc78367d ("VSOCK: Introduce virtio_vsock_common.ko") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20210208144454.84438-1-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net/vmw_vsock: improve locking in vsock_connect_timeout()Norbert Slusarek2021-02-231-4/+1
| | | | | | | | | | | | | | | | | | commit 3d0bc44d39bca615b72637e340317b7899b7f911 upstream. A possible locking issue in vsock_connect_timeout() was recognized by Eric Dumazet which might cause a null pointer dereference in vsock_transport_cancel_pkt(). This patch assures that vsock_transport_cancel_pkt() will be called within the lock, so a race condition won't occur which could result in vsk->transport to be set to NULL. Fixes: 380feae0def7 ("vsock: cancel packets when failing to connect") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Norbert Slusarek <nslusarek@gmx.net> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/trinity-f8e0937a-cf0e-4d80-a76e-d9a958ba3ef1-1612535522360@3c-app-gmx-bap12 Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* vsock: use ns_capable_noaudit() on socket createJeff Vander Stoep2020-11-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | [ Upstream commit af545bb5ee53f5261db631db2ac4cde54038bdaf ] During __vsock_create() CAP_NET_ADMIN is used to determine if the vsock_sock->trusted should be set to true. This value is used later for determing if a remote connection should be allowed to connect to a restricted VM. Unfortunately, if the caller doesn't have CAP_NET_ADMIN, an audit message such as an selinux denial is generated even if the caller does not want a trusted socket. Logging errors on success is confusing. To avoid this, switch the capable(CAP_NET_ADMIN) check to the noaudit version. Reported-by: Roman Kiryanov <rkir@google.com> https://android-review.googlesource.com/c/device/generic/goldfish/+/1468545/ Signed-off-by: Jeff Vander Stoep <jeffv@google.com> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Link: https://lore.kernel.org/r/20201023143757.377574-1-jeffv@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: virtio_vsock: Enhance connection semanticsSebastien Boeuf2020-10-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit df12eb6d6cd920ab2f0e0a43cd6e1c23a05cea91 ] Whenever the vsock backend on the host sends a packet through the RX queue, it expects an answer on the TX queue. Unfortunately, there is one case where the host side will hang waiting for the answer and might effectively never recover if no timeout mechanism was implemented. This issue happens when the guest side starts binding to the socket, which insert a new bound socket into the list of already bound sockets. At this time, we expect the guest to also start listening, which will trigger the sk_state to move from TCP_CLOSE to TCP_LISTEN. The problem occurs if the host side queued a RX packet and triggered an interrupt right between the end of the binding process and the beginning of the listening process. In this specific case, the function processing the packet virtio_transport_recv_pkt() will find a bound socket, which means it will hit the switch statement checking for the sk_state, but the state won't be changed into TCP_LISTEN yet, which leads the code to pick the default statement. This default statement will only free the buffer, while it should also respond to the host side, by sending a packet on its TX queue. In order to simply fix this unfortunate chain of events, it is important that in case the default statement is entered, and because at this stage we know the host side is waiting for an answer, we must send back a packet containing the operation VIRTIO_VSOCK_OP_RST. One could say that a proper timeout mechanism on the host side will be enough to avoid the backend to hang. But the point of this patch is to ensure the normal use case will be provided with proper responsiveness when it comes to establishing the connection. Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* vsock/virtio: add transport parameter to the virtio_transport_reset_no_sock()Stefano Garzarella2020-10-072-86/+86
| | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 4c7246dc45e2706770d5233f7ce1597a07e069ba ] We are going to add 'struct vsock_sock *' parameter to virtio_transport_get_ops(). In some cases, like in the virtio_transport_reset_no_sock(), we don't have any socket assigned to the packet received, so we can't use the virtio_transport_get_ops(). In order to allow virtio_transport_reset_no_sock() to use the '.send_pkt' callback from the 'vhost_transport' or 'virtio_transport', we add the 'struct virtio_transport *' to it and to its caller: virtio_transport_recv_pkt(). We moved the 'vhost_transport' and 'virtio_transport' definition, to pass their address to the virtio_transport_recv_pkt(). Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* vsock/virtio: stop workers during the .remove()Stefano Garzarella2020-10-071-1/+50
| | | | | | | | | | | | | | | | | [ Upstream commit 17dd1367389cfe7f150790c83247b68e0c19d106 ] Before to call vdev->config->reset(vdev) we need to be sure that no one is accessing the device, for this reason, we add new variables in the struct virtio_vsock to stop the workers during the .remove(). This patch also add few comments before vdev->config->reset(vdev) and vdev->config->del_vqs(vdev). Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Suggested-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsockStefano Garzarella2020-10-071-24/+46
| | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 9c7a5582f5d720dc35cfcc42ccaded69f0642e4a ] Some callbacks used by the upper layers can run while we are in the .remove(). A potential use-after-free can happen, because we free the_virtio_vsock without knowing if the callbacks are over or not. To solve this issue we move the assignment of the_virtio_vsock at the end of .probe(), when we finished all the initialization, and at the beginning of .remove(), before to release resources. For the same reason, we do the same also for the vdev->priv. We use RCU to be sure that all callbacks that use the_virtio_vsock ended before freeing it. This is not required for callbacks that use vdev->priv, because after the vdev->config->del_vqs() we are sure that they are ended and will no longer be invoked. We also take the mutex during the .remove() to avoid that .probe() can run while we are resetting the device. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* vsock: fix timeout in vsock_accept()Stefano Garzarella2020-06-101-1/+1
| | | | | | | | | | | | | | | | [ Upstream commit 7e0afbdfd13d1e708fe96e31c46c4897101a6a43 ] The accept(2) is an "input" socket interface, so we should use SO_RCVTIMEO instead of SO_SNDTIMEO to set the timeout. So this patch replace sock_sndtimeo() with sock_rcvtimeo() to use the right timeout in the vsock_accept(). Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* hv_sock: Remove the accept port restrictionSunil Muthuswamy2020-02-141-59/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit c742c59e1fbd022b64d91aa9a0092b3a699d653c ] Currently, hv_sock restricts the port the guest socket can accept connections on. hv_sock divides the socket port namespace into two parts for server side (listening socket), 0-0x7FFFFFFF & 0x80000000-0xFFFFFFFF (there are no restrictions on client port namespace). The first part (0-0x7FFFFFFF) is reserved for sockets where connections can be accepted. The second part (0x80000000-0xFFFFFFFF) is reserved for allocating ports for the peer (host) socket, once a connection is accepted. This reservation of the port namespace is specific to hv_sock and not known by the generic vsock library (ex: af_vsock). This is problematic because auto-binds/ephemeral ports are handled by the generic vsock library and it has no knowledge of this port reservation and could allocate a port that is not compatible with hv_sock (and legitimately so). The issue hasn't surfaced so far because the auto-bind code of vsock (__vsock_bind_stream) prior to the change 'VSOCK: bind to random port for VMADDR_PORT_ANY' would start walking up from LAST_RESERVED_PORT (1023) and start assigning ports. That will take a large number of iterations to hit 0x7FFFFFFF. But, after the above change to randomize port selection, the issue has started coming up more frequently. There has really been no good reason to have this port reservation logic in hv_sock from the get go. Reserving a local port for peer ports is not how things are handled generally. Peer ports should reflect the peer port. This fixes the issue by lifting the port reservation, and also returns the right peer port. Since the code converts the GUID to the peer port (by using the first 4 bytes), there is a possibility of conflicts, but that seems like a reasonable risk to take, given this is limited to vsock and that only applies to all local sockets. Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* VSOCK: bind to random port for VMADDR_PORT_ANYLepton Wu2019-12-051-1/+6
| | | | | | | | | | | | | | | | | [ Upstream commit 8236b08cf50f85bbfaf48910a0b3ee68318b7c4b ] The old code always starts from fixed port for VMADDR_PORT_ANY. Sometimes when VMM crashed, there is still orphaned vsock which is waiting for close timer, then it could cause connection time out for new started VM if they are trying to connect to same port with same guest cid since the new packets could hit that orphaned vsock. We could also fix this by doing more in vhost_vsock_reset_orphans, but any way, it should be better to start from a random local port instead of a fixed one. Signed-off-by: Lepton Wu <ytht.net@gmail.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* vhost/vsock: split packets to send using multiple buffersStefano Garzarella2019-12-011-3/+12
| | | | | | | | | | | | | | | | commit 6dbd3e66e7785a2f055bf84d98de9b8fd31ff3f5 upstream. If the packets to sent to the guest are bigger than the buffer available, we can split them, using multiple buffers and fixing the length in the packet header. This is safe since virtio-vsock supports only stream sockets. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* vsock/virtio: fix sock refcnt holding during the shutdownStefano Garzarella2019-11-121-3/+5
| | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit ad8a7220355d39cddce8eac1cea9677333e8b821 ] The "42f5cda5eaf4" commit rightly set SOCK_DONE on peer shutdown, but there is an issue if we receive the SHUTDOWN(RDWR) while the virtio_transport_close_timeout() is scheduled. In this case, when the timeout fires, the SOCK_DONE is already set and the virtio_transport_close_timeout() will not call virtio_transport_reset() and virtio_transport_do_close(). This causes that both sockets remain open and will never be released, preventing the unloading of [virtio|vhost]_transport modules. This patch fixes this issue, calling virtio_transport_reset() and virtio_transport_do_close() when we receive the SHUTDOWN(RDWR) and there is nothing left to read. Fixes: 42f5cda5eaf4 ("vsock/virtio: set SOCK_DONE on peer shutdown") Cc: Stephen Barber <smbarber@chromium.org> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: use skb_queue_empty_lockless() in poll() handlersEric Dumazet2019-11-101-1/+1
| | | | | | | | | | | [ Upstream commit 3ef7cf57c72f32f61e97f8fa401bc39ea1f1a5d4 ] Many poll() handlers are lockless. Using skb_queue_empty_lockless() instead of skb_queue_empty() is more appropriate. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* vsock: Fix a lockdep warning in __vsock_release()Dexuan Cui2019-10-073-6/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 0d9138ffac24cf8b75366ede3a68c951e6dcc575 ] Lockdep is unhappy if two locks from the same class are held. Fix the below warning for hyperv and virtio sockets (vmci socket code doesn't have the issue) by using lock_sock_nested() when __vsock_release() is called recursively: ============================================ WARNING: possible recursive locking detected 5.3.0+ #1 Not tainted -------------------------------------------- server/1795 is trying to acquire lock: ffff8880c5158990 (sk_lock-AF_VSOCK){+.+.}, at: hvs_release+0x10/0x120 [hv_sock] but task is already holding lock: ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock] other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(sk_lock-AF_VSOCK); lock(sk_lock-AF_VSOCK); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by server/1795: #0: ffff8880c5d05ff8 (&sb->s_type->i_mutex_key#10){+.+.}, at: __sock_release+0x2d/0xa0 #1: ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock] stack backtrace: CPU: 5 PID: 1795 Comm: server Not tainted 5.3.0+ #1 Call Trace: dump_stack+0x67/0x90 __lock_acquire.cold.67+0xd2/0x20b lock_acquire+0xb5/0x1c0 lock_sock_nested+0x6d/0x90 hvs_release+0x10/0x120 [hv_sock] __vsock_release+0x24/0xf0 [vsock] __vsock_release+0xa0/0xf0 [vsock] vsock_release+0x12/0x30 [vsock] __sock_release+0x37/0xa0 sock_close+0x14/0x20 __fput+0xc1/0x250 task_work_run+0x98/0xc0 do_exit+0x344/0xc60 do_group_exit+0x47/0xb0 get_signal+0x15c/0xc50 do_signal+0x30/0x720 exit_to_usermode_loop+0x50/0xa0 do_syscall_64+0x24e/0x270 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7f4184e85f31 Tested-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* hv_sock: Fix hang when a connection is closedDexuan Cui2019-09-161-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 685703b497bacea8765bb409d6b73455b73c540e ] There is a race condition for an established connection that is being closed by the guest: the refcnt is 4 at the end of hvs_release() (Note: here the 'remove_sock' is false): 1 for the initial value; 1 for the sk being in the bound list; 1 for the sk being in the connected list; 1 for the delayed close_work. After hvs_release() finishes, __vsock_release() -> sock_put(sk) *may* decrease the refcnt to 3. Concurrently, hvs_close_connection() runs in another thread: calls vsock_remove_sock() to decrease the refcnt by 2; call sock_put() to decrease the refcnt to 0, and free the sk; next, the "release_sock(sk)" may hang due to use-after-free. In the above, after hvs_release() finishes, if hvs_close_connection() runs faster than "__vsock_release() -> sock_put(sk)", then there is not any issue, because at the beginning of hvs_close_connection(), the refcnt is still 4. The issue can be resolved if an extra reference is taken when the connection is established. Fixes: a9eeb998c28d ("hv_sock: Add support for delayed close") Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Sunil Muthuswamy <sunilmut@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* vsock: correct removal of socket from the listSunil Muthuswamy2019-08-041-31/+7
| | | | | | | | | | | | | | | | | commit d5afa82c977ea06f7119058fa0eb8519ea501031 upstream. The current vsock code for removal of socket from the list is both subject to race and inefficient. It takes the lock, checks whether the socket is in the list, drops the lock and if the socket was on the list, deletes it from the list. This is subject to race because as soon as the lock is dropped once it is checked for presence, that condition cannot be relied upon for any decision. It is also inefficient because if the socket is present in the list, it takes the lock twice. Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* hv_sock: Add support for delayed closeSunil Muthuswamy2019-08-041-31/+77
| | | | | | | | | | | | | | | | | | | | | | | | | | commit a9eeb998c28d5506616426bd3a216bd5735a18b8 upstream. Currently, hvsock does not implement any delayed or background close logic. Whenever the hvsock socket is closed, a FIN is sent to the peer, and the last reference to the socket is dropped, which leads to a call to .destruct where the socket can hang indefinitely waiting for the peer to close it's side. The can cause the user application to hang in the close() call. This change implements proper STREAM(TCP) closing handshake mechanism by sending the FIN to the peer and the waiting for the peer's FIN to arrive for a given timeout. On timeout, it will try to terminate the connection (i.e. a RST). This is in-line with other socket providers such as virtio. This change does not address the hang in the vmbus_hvsock_device_unregister where it waits indefinitely for the host to rescind the channel. That should be taken up as a separate fix. Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* hvsock: fix epollout hang from race conditionSunil Muthuswamy2019-07-311-33/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit cb359b60416701c8bed82fec79de25a144beb893 ] Currently, hvsock can enter into a state where epoll_wait on EPOLLOUT will not return even when the hvsock socket is writable, under some race condition. This can happen under the following sequence: - fd = socket(hvsocket) - fd_out = dup(fd) - fd_in = dup(fd) - start a writer thread that writes data to fd_out with a combination of epoll_wait(fd_out, EPOLLOUT) and - start a reader thread that reads data from fd_in with a combination of epoll_wait(fd_in, EPOLLIN) - On the host, there are two threads that are reading/writing data to the hvsocket stack: hvs_stream_has_space hvs_notify_poll_out vsock_poll sock_poll ep_poll Race condition: check for epollout from ep_poll(): assume no writable space in the socket hvs_stream_has_space() returns 0 check for epollin from ep_poll(): assume socket has some free space < HVS_PKT_LEN(HVS_SEND_BUF_SIZE) hvs_stream_has_space() will clear the channel pending send size host will not notify the guest because the pending send size has been cleared and so the hvsocket will never mark the socket writable Now, the EPOLLOUT will never return even if the socket write buffer is empty. The fix is to set the pending size to the default size and never change it. This way the host will always notify the guest whenever the writable space is bigger than the pending size. The host is already optimized to *only* notify the guest when the pending size threshold boundary is crossed and not everytime. This change also reduces the cpu usage somewhat since hv_stream_has_space() is in the hotpath of send: vsock_stream_sendmsg()->hv_stream_has_space() Earlier hv_stream_has_space was setting/clearing the pending size on every call. Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* vsock/virtio: set SOCK_DONE on peer shutdownStephen Barber2019-06-221-1/+3
| | | | | | | | | | | | | | | | [ Upstream commit 42f5cda5eaf4396a939ae9bb43bb8d1d09c1b15c ] Set the SOCK_DONE flag to match the TCP_CLOSING state when a peer has shut down and there is nothing left to read. This fixes the following bug: 1) Peer sends SHUTDOWN(RDWR). 2) Socket enters TCP_CLOSING but SOCK_DONE is not set. 3) read() returns -ENOTCONN until close() is called, then returns 0. Signed-off-by: Stephen Barber <smbarber@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* vsock/virtio: Initialize core virtio vsock before registering the driverJorge E. Moreira2019-05-251-7/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit ba95e5dfd36647622d8897a2a0470dde60e59ffd ] Avoid a race in which static variables in net/vmw_vsock/af_vsock.c are accessed (while handling interrupts) before they are initialized. [ 4.201410] BUG: unable to handle kernel paging request at ffffffffffffffe8 [ 4.207829] IP: vsock_addr_equals_addr+0x3/0x20 [ 4.211379] PGD 28210067 P4D 28210067 PUD 28212067 PMD 0 [ 4.211379] Oops: 0000 [#1] PREEMPT SMP PTI [ 4.211379] Modules linked in: [ 4.211379] CPU: 1 PID: 30 Comm: kworker/1:1 Not tainted 4.14.106-419297-gd7e28cc1f241 #1 [ 4.211379] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 [ 4.211379] Workqueue: virtio_vsock virtio_transport_rx_work [ 4.211379] task: ffffa3273d175280 task.stack: ffffaea1800e8000 [ 4.211379] RIP: 0010:vsock_addr_equals_addr+0x3/0x20 [ 4.211379] RSP: 0000:ffffaea1800ebd28 EFLAGS: 00010286 [ 4.211379] RAX: 0000000000000002 RBX: 0000000000000000 RCX: ffffffffb94e42f0 [ 4.211379] RDX: 0000000000000400 RSI: ffffffffffffffe0 RDI: ffffaea1800ebdd0 [ 4.211379] RBP: ffffaea1800ebd58 R08: 0000000000000001 R09: 0000000000000001 [ 4.211379] R10: 0000000000000000 R11: ffffffffb89d5d60 R12: ffffaea1800ebdd0 [ 4.211379] R13: 00000000828cbfbf R14: 0000000000000000 R15: ffffaea1800ebdc0 [ 4.211379] FS: 0000000000000000(0000) GS:ffffa3273fd00000(0000) knlGS:0000000000000000 [ 4.211379] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4.211379] CR2: ffffffffffffffe8 CR3: 000000002820e001 CR4: 00000000001606e0 [ 4.211379] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 4.211379] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 4.211379] Call Trace: [ 4.211379] ? vsock_find_connected_socket+0x6c/0xe0 [ 4.211379] virtio_transport_recv_pkt+0x15f/0x740 [ 4.211379] ? detach_buf+0x1b5/0x210 [ 4.211379] virtio_transport_rx_work+0xb7/0x140 [ 4.211379] process_one_work+0x1ef/0x480 [ 4.211379] worker_thread+0x312/0x460 [ 4.211379] kthread+0x132/0x140 [ 4.211379] ? process_one_work+0x480/0x480 [ 4.211379] ? kthread_destroy_worker+0xd0/0xd0 [ 4.211379] ret_from_fork+0x35/0x40 [ 4.211379] Code: c7 47 08 00 00 00 00 66 c7 07 28 00 c7 47 08 ff ff ff ff c7 47 04 ff ff ff ff c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 8b 47 08 <3b> 46 08 75 0a 8b 47 04 3b 46 04 0f 94 c0 c3 31 c0 c3 90 66 2e [ 4.211379] RIP: vsock_addr_equals_addr+0x3/0x20 RSP: ffffaea1800ebd28 [ 4.211379] CR2: ffffffffffffffe8 [ 4.211379] ---[ end trace f31cc4a2e6df3689 ]--- [ 4.211379] Kernel panic - not syncing: Fatal exception in interrupt [ 4.211379] Kernel Offset: 0x37000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 4.211379] Rebooting in 5 seconds.. Fixes: 22b5c0b63f32 ("vsock/virtio: fix kernel panic after device hot-unplug") Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Stefano Garzarella <sgarzare@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: kvm@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Cc: netdev@vger.kernel.org Cc: kernel-team@android.com Cc: stable@vger.kernel.org [4.9+] Signed-off-by: Jorge E. Moreira <jemoreira@google.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* vsock/virtio: free packets during the socket releaseStefano Garzarella2019-05-251-0/+7
| | | | | | | | | | | | [ Upstream commit ac03046ece2b158ebd204dfc4896fd9f39f0e6c8 ] When the socket is released, we should free all packets queued in the per-socket list in order to avoid a memory leak. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* vsock/virtio: fix kernel panic from virtio_transport_reset_no_sockAdalbert Lazăr2019-05-021-7/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 4c404ce23358d5d8fbdeb7a6021a9b33d3c3c167 ] Previous to commit 22b5c0b63f32 ("vsock/virtio: fix kernel panic after device hot-unplug"), vsock_core_init() was called from virtio_vsock_probe(). Now, virtio_transport_reset_no_sock() can be called before vsock_core_init() has the chance to run. [Wed Feb 27 14:17:09 2019] BUG: unable to handle kernel NULL pointer dereference at 0000000000000110 [Wed Feb 27 14:17:09 2019] #PF error: [normal kernel read fault] [Wed Feb 27 14:17:09 2019] PGD 0 P4D 0 [Wed Feb 27 14:17:09 2019] Oops: 0000 [#1] SMP PTI [Wed Feb 27 14:17:09 2019] CPU: 3 PID: 59 Comm: kworker/3:1 Not tainted 5.0.0-rc7-390-generic-hvi #390 [Wed Feb 27 14:17:09 2019] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [Wed Feb 27 14:17:09 2019] Workqueue: virtio_vsock virtio_transport_rx_work [vmw_vsock_virtio_transport] [Wed Feb 27 14:17:09 2019] RIP: 0010:virtio_transport_reset_no_sock+0x8c/0xc0 [vmw_vsock_virtio_transport_common] [Wed Feb 27 14:17:09 2019] Code: 35 8b 4f 14 48 8b 57 08 31 f6 44 8b 4f 10 44 8b 07 48 8d 7d c8 e8 84 f8 ff ff 48 85 c0 48 89 c3 74 2a e8 f7 31 03 00 48 89 df <48> 8b 80 10 01 00 00 e8 68 fb 69 ed 48 8b 75 f0 65 48 33 34 25 28 [Wed Feb 27 14:17:09 2019] RSP: 0018:ffffb42701ab7d40 EFLAGS: 00010282 [Wed Feb 27 14:17:09 2019] RAX: 0000000000000000 RBX: ffff9d79637ee080 RCX: 0000000000000003 [Wed Feb 27 14:17:09 2019] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff9d79637ee080 [Wed Feb 27 14:17:09 2019] RBP: ffffb42701ab7d78 R08: ffff9d796fae70e0 R09: ffff9d796f403500 [Wed Feb 27 14:17:09 2019] R10: ffffb42701ab7d90 R11: 0000000000000000 R12: ffff9d7969d09240 [Wed Feb 27 14:17:09 2019] R13: ffff9d79624e6840 R14: ffff9d7969d09318 R15: ffff9d796d48ff80 [Wed Feb 27 14:17:09 2019] FS: 0000000000000000(0000) GS:ffff9d796fac0000(0000) knlGS:0000000000000000 [Wed Feb 27 14:17:09 2019] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [Wed Feb 27 14:17:09 2019] CR2: 0000000000000110 CR3: 0000000427f22000 CR4: 00000000000006e0 [Wed Feb 27 14:17:09 2019] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [Wed Feb 27 14:17:09 2019] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [Wed Feb 27 14:17:09 2019] Call Trace: [Wed Feb 27 14:17:09 2019] virtio_transport_recv_pkt+0x63/0x820 [vmw_vsock_virtio_transport_common] [Wed Feb 27 14:17:09 2019] ? kfree+0x17e/0x190 [Wed Feb 27 14:17:09 2019] ? detach_buf_split+0x145/0x160 [Wed Feb 27 14:17:09 2019] ? __switch_to_asm+0x40/0x70 [Wed Feb 27 14:17:09 2019] virtio_transport_rx_work+0xa0/0x106 [vmw_vsock_virtio_transport] [Wed Feb 27 14:17:09 2019] NET: Registered protocol family 40 [Wed Feb 27 14:17:09 2019] process_one_work+0x167/0x410 [Wed Feb 27 14:17:09 2019] worker_thread+0x4d/0x460 [Wed Feb 27 14:17:09 2019] kthread+0x105/0x140 [Wed Feb 27 14:17:09 2019] ? rescuer_thread+0x360/0x360 [Wed Feb 27 14:17:09 2019] ? kthread_destroy_worker+0x50/0x50 [Wed Feb 27 14:17:09 2019] ret_from_fork+0x35/0x40 [Wed Feb 27 14:17:09 2019] Modules linked in: vmw_vsock_virtio_transport vmw_vsock_virtio_transport_common input_leds vsock serio_raw i2c_piix4 mac_hid qemu_fw_cfg autofs4 cirrus ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops virtio_net psmouse drm net_failover pata_acpi virtio_blk failover floppy Fixes: 22b5c0b63f32 ("vsock/virtio: fix kernel panic after device hot-unplug") Reported-by: Alexandru Herghelegiu <aherghelegiu@bitdefender.com> Signed-off-by: Adalbert Lazăr <alazar@bitdefender.com> Co-developed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* vsock/virtio: reset connected sockets on device removalStefano Garzarella2019-03-131-0/+3
| | | | | | | | | | | | [ Upstream commit 85965487abc540368393a15491e6e7fcd230039d ] When the virtio transport device disappear, we should reset all connected sockets in order to inform the users. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* vsock/virtio: fix kernel panic after device hot-unplugStefano Garzarella2019-03-131-8/+18
| | | | | | | | | | | | | | | | | | | | [ Upstream commit 22b5c0b63f32568e130fa2df4ba23efce3eb495b ] virtio_vsock_remove() invokes the vsock_core_exit() also if there are opened sockets for the AF_VSOCK protocol family. In this way the vsock "transport" pointer is set to NULL, triggering the kernel panic at the first socket activity. This patch move the vsock_core_init()/vsock_core_exit() in the virtio_vsock respectively in module_init and module_exit functions, that cannot be invoked until there are open sockets. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1609699 Reported-by: Yan Fu <yafu@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* vsock: cope with memory allocation failure at socket creation timePaolo Abeni2019-02-231-0/+4
| | | | | | | | | | | | | | | | | | | [ Upstream commit 225d9464268599a5b4d094d02ec17808e44c7553 ] In the unlikely event that the kmalloc call in vmci_transport_socket_init() fails, we end-up calling vmci_transport_destruct() with a NULL vmci_trans() and oopsing. This change addresses the above explicitly checking for zero vmci_trans() at destruction time. Reported-by: Xiumei Mu <xmu@redhat.com> Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* VSOCK: Send reset control packet when socket is partially boundJorgen Hansen2019-01-091-17/+50
| | | | | | | | | | | | | | | | | | | | [ Upstream commit a915b982d8f5e4295f64b8dd37ce753874867e88 ] If a server side socket is bound to an address, but not in the listening state yet, incoming connection requests should receive a reset control packet in response. However, the function used to send the reset silently drops the reset packet if the sending socket isn't bound to a remote address (as is the case for a bound socket not yet in the listening state). This change fixes this by using the src of the incoming packet as destination for the reset packet in this case. Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Reviewed-by: Adit Ranadive <aditr@vmware.com> Reviewed-by: Vishnu Dasa <vdasa@vmware.com> Signed-off-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* vsock: split dwork to avoid reinitializationsCong Wang2018-08-072-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | syzbot reported that we reinitialize an active delayed work in vsock_stream_connect(): ODEBUG: init active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x90 kernel/workqueue.c:1414 WARNING: CPU: 1 PID: 11518 at lib/debugobjects.c:329 debug_print_object+0x16a/0x210 lib/debugobjects.c:326 The pattern is apparently wrong, we should only initialize the dealyed work once and could repeatly schedule it. So we have to move out the initializations to allocation side. And to avoid confusion, we can split the shared dwork into two, instead of re-using the same one. Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Reported-by: <syzbot+8a9b1bd330476a4f3db6@syzkaller.appspotmail.com> Cc: Andy king <acking@vmware.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLLLinus Torvalds2018-06-281-6/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | The poll() changes were not well thought out, and completely unexplained. They also caused a huge performance regression, because "->poll()" was no longer a trivial file operation that just called down to the underlying file operations, but instead did at least two indirect calls. Indirect calls are sadly slow now with the Spectre mitigation, but the performance problem could at least be largely mitigated by changing the "->get_poll_head()" operation to just have a per-file-descriptor pointer to the poll head instead. That gets rid of one of the new indirections. But that doesn't fix the new complexity that is completely unwarranted for the regular case. The (undocumented) reason for the poll() changes was some alleged AIO poll race fixing, but we don't make the common case slower and more complex for some uncommon special case, so this all really needs way more explanations and most likely a fundamental redesign. [ This revert is a revert of about 30 different commits, not reverted individually because that would just be unnecessarily messy - Linus ] Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* VSOCK: fix loopback on big-endian systemsClaudio Imbrenda2018-06-221-1/+1
| | | | | | | | | | | | | | The dst_cid and src_cid are 64 bits, therefore 64 bit accessors should be used, and in fact in virtio_transport_common.c only 64 bit accessors are used. Using 32 bit accessors for 64 bit values breaks big endian systems. This patch fixes a wrong use of le32_to_cpu in virtio_transport_send_pkt. Fixes: b9116823189e85ccf384 ("VSOCK: add loopback to virtio_transport") Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/vmw_vsock: convert to ->poll_maskChristoph Hellwig2018-05-261-13/+6
| | | | Signed-off-by: Christoph Hellwig <hch@lst.de>
* VSOCK: make af_vsock.ko removable againStefan Hajnoczi2018-04-171-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | Commit c1eef220c1760762753b602c382127bfccee226d ("vsock: always call vsock_init_tables()") introduced a module_init() function without a corresponding module_exit() function. Modules with an init function can only be removed if they also have an exit function. Therefore the vsock module was considered "permanent" and could not be removed. This patch adds an empty module_exit() function so that "rmmod vsock" works. No explicit cleanup is required because: 1. Transports call vsock_core_exit() upon exit and cannot be removed while sockets are still alive. 2. vsock_diag.ko does not perform any action that requires cleanup by vsock.ko. Fixes: c1eef220c176 ("vsock: always call vsock_init_tables()") Reported-by: Xiumei Mu <xmu@redhat.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: make getname() functions return length rather than use int* parameterDenys Vlasenko2018-02-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes since v1: Added changes in these files: drivers/infiniband/hw/usnic/usnic_transport.c drivers/staging/lustre/lnet/lnet/lib-socket.c drivers/target/iscsi/iscsi_target_login.c drivers/vhost/net.c fs/dlm/lowcomms.c fs/ocfs2/cluster/tcp.c security/tomoyo/network.c Before: All these functions either return a negative error indicator, or store length of sockaddr into "int *socklen" parameter and return zero on success. "int *socklen" parameter is awkward. For example, if caller does not care, it still needs to provide on-stack storage for the value it does not need. None of the many FOO_getname() functions of various protocols ever used old value of *socklen. They always just overwrite it. This change drops this parameter, and makes all these functions, on success, return length of sockaddr. It's always >= 0 and can be differentiated from an error. Tests in callers are changed from "if (err)" to "if (err < 0)", where needed. rpc_sockname() lost "int buflen" parameter, since its only use was to be passed to kernel_getsockname() as &buflen and subsequently not used in any way. Userspace API is not changed. text data bss dec hex filename 30108430 2633624 873672 33615726 200ef6e vmlinux.before.o 30108109 2633612 873672 33615393 200ee21 vmlinux.o Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> CC: David S. Miller <davem@davemloft.net> CC: linux-kernel@vger.kernel.org CC: netdev@vger.kernel.org CC: linux-bluetooth@vger.kernel.org CC: linux-decnet-user@lists.sourceforge.net CC: linux-wireless@vger.kernel.org CC: linux-rdma@vger.kernel.org CC: linux-sctp@vger.kernel.org CC: linux-nfs@vger.kernel.org CC: linux-x25@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
* vfs: do bulk POLL* -> EPOLL* replacementLinus Torvalds2018-02-111-15/+15
| | | | | | | | | | | | | | | | | | | | | | | This is the mindless scripted replacement of kernel use of POLL* variables as described by Al, done by this script: for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'` for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done done with de-mangling cleanups yet to come. NOTE! On almost all architectures, the EPOLL* constants have the same values as the POLL* constants do. But they keyword here is "almost". For various bad reasons they aren't the same, and epoll() doesn't actually work quite correctly in some cases due to this on Sparc et al. The next patch from Al will sort out the final differences, and we should be all done. Scripted-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'misc.poll' of ↵Linus Torvalds2018-01-301-2/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull poll annotations from Al Viro: "This introduces a __bitwise type for POLL### bitmap, and propagates the annotations through the tree. Most of that stuff is as simple as 'make ->poll() instances return __poll_t and do the same to local variables used to hold the future return value'. Some of the obvious brainos found in process are fixed (e.g. POLLIN misspelled as POLL_IN). At that point the amount of sparse warnings is low and most of them are for genuine bugs - e.g. ->poll() instance deciding to return -EINVAL instead of a bitmap. I hadn't touched those in this series - it's large enough as it is. Another problem it has caught was eventpoll() ABI mess; select.c and eventpoll.c assumed that corresponding POLL### and EPOLL### were equal. That's true for some, but not all of them - EPOLL### are arch-independent, but POLL### are not. The last commit in this series separates userland POLL### values from the (now arch-independent) kernel-side ones, converting between them in the few places where they are copied to/from userland. AFAICS, this is the least disruptive fix preserving poll(2) ABI and making epoll() work on all architectures. As it is, it's simply broken on sparc - try to give it EPOLLWRNORM and it will trigger only on what would've triggered EPOLLWRBAND on other architectures. EPOLLWRBAND and EPOLLRDHUP, OTOH, are never triggered at all on sparc. With this patch they should work consistently on all architectures" * 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits) make kernel-side POLL... arch-independent eventpoll: no need to mask the result of epi_item_poll() again eventpoll: constify struct epoll_event pointers debugging printk in sg_poll() uses %x to print POLL... bitmap annotate poll(2) guts 9p: untangle ->poll() mess ->si_band gets POLL... bitmap stored into a user-visible long field ring_buffer_poll_wait() return value used as return value of ->poll() the rest of drivers/*: annotate ->poll() instances media: annotate ->poll() instances fs: annotate ->poll() instances ipc, kernel, mm: annotate ->poll() instances net: annotate ->poll() instances apparmor: annotate ->poll() instances tomoyo: annotate ->poll() instances sound: annotate ->poll() instances acpi: annotate ->poll() instances crypto: annotate ->poll() instances block: annotate ->poll() instances x86: annotate ->poll() instances ...
| * net: annotate ->poll() instancesAl Viro2017-11-271-2/+2
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | VSOCK: set POLLOUT | POLLWRNORM for TCP_CLOSINGStefan Hajnoczi2018-01-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | select(2) with wfds but no rfds must return when the socket is shut down by the peer. This way userspace notices socket activity and gets -EPIPE from the next write(2). Currently select(2) does not return for virtio-vsock when a SEND+RCV shutdown packet is received. This is because vsock_poll() only sets POLLOUT | POLLWRNORM for TCP_CLOSE, not the TCP_CLOSING state that the socket is in when the shutdown is received. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | VSOCK: fix outdated sk_state value in hvs_release()Stefan Hajnoczi2017-12-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 3b4477d2dcf2709d0be89e2a8dced3d0f4a017f2 ("VSOCK: use TCP state constants for sk_state") VSOCK has used TCP_* constants for sk_state. Commit b4562ca7925a3bedada87a3dd072dd5bad043288 ("hv_sock: add locking in the open/close/release code paths") reintroduced the SS_DISCONNECTING constant. This patch replaces the old SS_DISCONNECTING with the new TCP_CLOSING constant. CC: Dexuan Cui <decui@microsoft.com> CC: Cathy Avery <cavery@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | VSOCK: Don't set sk_state to TCP_CLOSE before testing itJorgen Hansen2017-11-281-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | A recent commit (3b4477d2dcf2) converted the sk_state to use TCP constants. In that change, vmci_transport_handle_detach was changed such that sk->sk_state was set to TCP_CLOSE before we test whether it is TCP_SYN_SENT. This change moves the sk_state change back to the original locations in that function. Signed-off-by: Jorgen Hansen <jhansen@vmware.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | VSOCK: Don't call vsock_stream_has_data in atomic contextJorgen Hansen2017-11-261-3/+7
|/ | | | | | | | | | | | | | | | | | | | | | | | | | When using the host personality, VMCI will grab a mutex for any queue pair access. In the detach callback for the vmci vsock transport, we call vsock_stream_has_data while holding a spinlock, and vsock_stream_has_data will access a queue pair. To avoid this, we can simply omit calling vsock_stream_has_data for host side queue pairs, since the QPs are empty per default when the guest has detached. This bug affects users of VMware Workstation using kernel version 4.4 and later. Testing: Ran vsock tests between guest and host, and verified that with this change, the host isn't calling vsock_stream_has_data during detach. Ran mixedTest between guest and host using both guest and host as server. v2: Rebased on top of recent change to sk_state values Reviewed-by: Adit Ranadive <aditr@vmware.com> Reviewed-by: Aditya Sarwade <asarwade@vmware.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2017-11-041-0/+1
|\ | | | | | | | | | | | | Files removed in 'net-next' had their license header updated in 'net'. We take the remove from 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>