summaryrefslogtreecommitdiffstats
path: root/net/socket.c
Commit message (Collapse)AuthorAgeFilesLines
* net: Save and restore msg_namelen in sock_sendmsgMarc Dionne2024-01-151-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 01b2885d9415152bcb12ff1f7788f500a74ea0ed ] Commit 86a7e0b69bd5 ("net: prevent rewrite of msg_name in sock_sendmsg()") made sock_sendmsg save the incoming msg_name pointer and restore it before returning, to insulate the caller against msg_name being changed by the called code. If the address length was also changed however, we may return with an inconsistent structure where the length doesn't match the address, and attempts to reuse it may lead to lost packets. For example, a kernel that doesn't have commit 1c5950fc6fe9 ("udp6: fix potential access to stale information") will replace a v4 mapped address with its ipv4 equivalent, and shorten namelen accordingly from 28 to 16. If the caller attempts to reuse the resulting msg structure, it will have the original ipv6 (v4 mapped) address but an incorrect v4 length. Fixes: 86a7e0b69bd5 ("net: prevent rewrite of msg_name in sock_sendmsg()") Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: prevent address rewrite in kernel_bind()Jordan Rife2023-10-251-1/+5
| | | | | | | | | | | | | | | | | | | | | commit c889a99a21bf124c3db08d09df919f0eccc5ea4c upstream. Similar to the change in commit 0bdf399342c5("net: Avoid address overwrite in kernel_connect"), BPF hooks run on bind may rewrite the address passed to kernel_bind(). This change 1) Makes a copy of the bind address in kernel_bind() to insulate callers. 2) Replaces direct calls to sock->ops->bind() in net with kernel_bind() Link: https://lore.kernel.org/netdev/20230912013332.2048422-1-jrife@google.com/ Fixes: 4fbac77d2d09 ("bpf: Hooks for sys_bind") Cc: stable@vger.kernel.org Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jordan Rife <jrife@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net: prevent rewrite of msg_name in sock_sendmsg()Jordan Rife2023-10-251-6/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 86a7e0b69bd5b812e48a20c66c2161744f3caa16 ] Callers of sock_sendmsg(), and similarly kernel_sendmsg(), in kernel space may observe their value of msg_name change in cases where BPF sendmsg hooks rewrite the send address. This has been confirmed to break NFS mounts running in UDP mode and has the potential to break other systems. This patch: 1) Creates a new function called __sock_sendmsg() with same logic as the old sock_sendmsg() function. 2) Replaces calls to sock_sendmsg() made by __sys_sendto() and __sys_sendmsg() with __sock_sendmsg() to avoid an unnecessary copy, as these system calls are already protected. 3) Modifies sock_sendmsg() so that it makes a copy of msg_name if present before passing it down the stack to insulate callers from changes to the send address. Link: https://lore.kernel.org/netdev/20230912013332.2048422-1-jrife@google.com/ Fixes: 1cedee13d25a ("bpf: Hooks for sys_sendmsg") Cc: stable@vger.kernel.org Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jordan Rife <jrife@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: fix kernel-doc warnings for socket.cRandy Dunlap2023-10-251-17/+17
| | | | | | | | | | | | | | | | | | | | [ Upstream commit 85806af0c6bac0feb777e255a25fd5d0cf6ad38e ] Fix kernel-doc warnings by moving the kernel-doc notation to be immediately above the functions that it describes. Fixes these warnings for sock_sendmsg() and sock_recvmsg(): ../net/socket.c:658: warning: Excess function parameter 'sock' description in 'INDIRECT_CALLABLE_DECLARE' ../net/socket.c:658: warning: Excess function parameter 'msg' description in 'INDIRECT_CALLABLE_DECLARE' ../net/socket.c:889: warning: Excess function parameter 'sock' description in 'INDIRECT_CALLABLE_DECLARE' ../net/socket.c:889: warning: Excess function parameter 'msg' description in 'INDIRECT_CALLABLE_DECLARE' ../net/socket.c:889: warning: Excess function parameter 'flags' description in 'INDIRECT_CALLABLE_DECLARE' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net> Stable-dep-of: 86a7e0b69bd5 ("net: prevent rewrite of msg_name in sock_sendmsg()") Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: use indirect calls helpers at the socket layerPaolo Abeni2023-10-251-4/+16
| | | | | | | | | | | | [ Upstream commit 8c3c447b3cec27cf6f77080f4d157d53b64e9555 ] This avoids an indirect call per {send,recv}msg syscall in the common (IPv6 or IPv4 socket) case. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Stable-dep-of: 86a7e0b69bd5 ("net: prevent rewrite of msg_name in sock_sendmsg()") Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: Avoid address overwrite in kernel_connectJordan Rife2023-09-231-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 0bdf399342c5acbd817c9098b6c7ed21f1974312 upstream. BPF programs that run on connect can rewrite the connect address. For the connect system call this isn't a problem, because a copy of the address is made when it is moved into kernel space. However, kernel_connect simply passes through the address it is given, so the caller may observe its address value unexpectedly change. A practical example where this is problematic is where NFS is combined with a system such as Cilium which implements BPF-based load balancing. A common pattern in software-defined storage systems is to have an NFS mount that connects to a persistent virtual IP which in turn maps to an ephemeral server IP. This is usually done to achieve high availability: if your server goes down you can quickly spin up a replacement and remap the virtual IP to that endpoint. With BPF-based load balancing, mounts will forget the virtual IP address when the address rewrite occurs because a pointer to the only copy of that address is passed down the stack. Server failover then breaks, because clients have forgotten the virtual IP address. Reconnects fail and mounts remain broken. This patch was tested by setting up a scenario like this and ensuring that NFS reconnects worked after applying the patch. Signed-off-by: Jordan Rife <jrife@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net: annotate sk->sk_err write from do_recvmmsg()Eric Dumazet2023-05-301-1/+1
| | | | | | | | | | | | | | | | [ Upstream commit e05a5f510f26607616fecdd4ac136310c8bea56b ] do_recvmmsg() can write to sk->sk_err from multiple threads. As said before, many other points reading or writing sk_err need annotations. Fixes: 34b88a68f26a ("net: Fix use after free in the recvmmsg exit path") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: Fix a data-race around sysctl_somaxconn.Kuniyuki Iwashima2022-09-051-1/+1
| | | | | | | | | | | | [ Upstream commit 3c9ba81d72047f2e81bb535d42856517b613aba7 ] While reading sysctl_somaxconn, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: don't unconditionally copy_from_user a struct ifreq for socket ioctlsPeter Collingbourne2021-09-031-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit d0efb16294d145d157432feda83877ae9d7cdf37 upstream. A common implementation of isatty(3) involves calling a ioctl passing a dummy struct argument and checking whether the syscall failed -- bionic and glibc use TCGETS (passing a struct termios), and musl uses TIOCGWINSZ (passing a struct winsize). If the FD is a socket, we will copy sizeof(struct ifreq) bytes of data from the argument and return -EFAULT if that fails. The result is that the isatty implementations may return a non-POSIX-compliant value in errno in the case where part of the dummy struct argument is inaccessible, as both struct termios and struct winsize are smaller than struct ifreq (at least on arm64). Although there is usually enough stack space following the argument on the stack that this did not present a practical problem up to now, with MTE stack instrumentation it's more likely for the copy to fail, as the memory following the struct may have a different tag. Fix the problem by adding an early check for whether the ioctl is a valid socket ioctl, and return -ENOTTY if it isn't. Fixes: 44c02a2c3dc5 ("dev_ioctl(): move copyin/copyout to callers") Link: https://linux-review.googlesource.com/id/I869da6cf6daabc3e4b7b82ac979683ba05e27d4d Signed-off-by: Peter Collingbourne <pcc@google.com> Cc: <stable@vger.kernel.org> # 4.19 Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net: make get_net_ns return error if NET_NS is disabledChangbin Du2021-06-301-13/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit ea6932d70e223e02fea3ae20a4feff05d7c1ea9a ] There is a panic in socket ioctl cmd SIOCGSKNS when NET_NS is not enabled. The reason is that nsfs tries to access ns->ops but the proc_ns_operations is not implemented in this case. [7.670023] Unable to handle kernel NULL pointer dereference at virtual address 00000010 [7.670268] pgd = 32b54000 [7.670544] [00000010] *pgd=00000000 [7.671861] Internal error: Oops: 5 [#1] SMP ARM [7.672315] Modules linked in: [7.672918] CPU: 0 PID: 1 Comm: systemd Not tainted 5.13.0-rc3-00375-g6799d4f2da49 #16 [7.673309] Hardware name: Generic DT based system [7.673642] PC is at nsfs_evict+0x24/0x30 [7.674486] LR is at clear_inode+0x20/0x9c The same to tun SIOCGSKNS command. To fix this problem, we make get_net_ns() return -EINVAL when NET_NS is disabled. Meanwhile move it to right place net/core/net_namespace.c. Signed-off-by: Changbin Du <changbin.du@gmail.com> Fixes: c62cce2caee5 ("net: add an ioctl to get a socket network namespace") Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Christian Brauner <christian.brauner@ubuntu.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: add documentation to socket.cPedro Tammela2021-06-301-18/+259
| | | | | | | | | | | | | | [ Upstream commit 8a3c245c031944f2176118270e7bc5d4fd4a1075 ] Adds missing sphinx documentation to the socket.c's functions. Also fixes some whitespaces. I also changed the style of older documentation as an effort to have an uniform documentation style. Signed-off-by: Pedro Tammela <pctammela@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: Set fput_needed iff FDPUT_FPUT is setMiaohe Lin2020-08-191-1/+1
| | | | | | | | | | | | [ Upstream commit ce787a5a074a86f76f5d3fd804fa78e01bfb9e89 ] We should fput() file iff FDPUT_FPUT is set. So we should set fput_needed accordingly. Fixes: 00e188ef6a7e ("sockfd_lookup_light(): switch to fdget^W^Waway from fget_light") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* compat_ioctl: handle SIOCOUTQNSDArnd Bergmann2020-01-171-0/+1
| | | | | | | | | | | | | | | | commit 9d7bf41fafa5b5ddd4c13eb39446b0045f0a8167 upstream. Unlike the normal SIOCOUTQ, SIOCOUTQNSD was never handled in compat mode. Add it to the common socket compat handler along with similar ones. Fixes: 2f4e1b397097 ("tcp: ioctl type SIOCOUTQNSD returns amount of data not sent") Cc: Eric Dumazet <edumazet@google.com> Cc: netdev@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net: make socket read/write_iter() honor IOCB_NOWAITJens Axboe2020-01-091-2/+2
| | | | | | | | | | | | | | [ Upstream commit ebfcd8955c0b52eb793bcbc9e71140e3d0cdb228 ] The socket read/write helpers only look at the file O_NONBLOCK. not the iocb IOCB_NOWAIT flag. This breaks users like preadv2/pwritev2 and io_uring that rely on not having the file itself marked nonblocking, but rather the iocb itself. Cc: netdev@vger.kernel.org Acked-by: David Miller <davem@davemloft.net> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: socket: set sock->sk to NULL after calling proto_ops::release()Eric Biggers2019-03-101-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit ff7b11aa481f682e0e9711abfeb7d03f5cd612bf ] Commit 9060cb719e61 ("net: crypto set sk to NULL when af_alg_release.") fixed a use-after-free in sockfs_setattr() when an AF_ALG socket is closed concurrently with fchownat(). However, it ignored that many other proto_ops::release() methods don't set sock->sk to NULL and therefore allow the same use-after-free: - base_sock_release - bnep_sock_release - cmtp_sock_release - data_sock_release - dn_release - hci_sock_release - hidp_sock_release - iucv_sock_release - l2cap_sock_release - llcp_sock_release - llc_ui_release - rawsock_release - rfcomm_sock_release - sco_sock_release - svc_release - vcc_release - x25_release Rather than fixing all these and relying on every socket type to get this right forever, just make __sock_release() set sock->sk to NULL itself after calling proto_ops::release(). Reproducer that produces the KASAN splat when any of these socket types are configured into the kernel: #include <pthread.h> #include <stdlib.h> #include <sys/socket.h> #include <unistd.h> pthread_t t; volatile int fd; void *close_thread(void *arg) { for (;;) { usleep(rand() % 100); close(fd); } } int main() { pthread_create(&t, NULL, close_thread, NULL); for (;;) { fd = socket(rand() % 50, rand() % 11, 0); fchownat(fd, "", 1000, 1000, 0x1000); close(fd); } } Fixes: 86741ec25462 ("net: core: Add a UID field to struct sock.") Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net: socket: make bond ioctls go through compat_ifreq_ioctl()Johannes Berg2019-02-271-4/+4
| | | | | | | | | | | | | [ Upstream commit 98406133dd9cb9f195676eab540c270dceca879a ] Same story as before, these use struct ifreq and thus need to be read with the shorter version to not cause faults. Cc: stable@vger.kernel.org Fixes: f92d4fc95341 ("kill bond_ioctl()") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net: socket: fix SIOCGIFNAME in compatJohannes Berg2019-02-271-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit c6c9fee35dc27362b7bac34b2fc9f5b8ace2e22c ] As reported by Robert O'Callahan in https://bugzilla.kernel.org/show_bug.cgi?id=202273 reverting the previous changes in this area broke the SIOCGIFNAME ioctl in compat again (I'd previously fixed it after his previous report of breakage in https://bugzilla.kernel.org/show_bug.cgi?id=199469). This is obviously because I fixed SIOCGIFNAME more or less by accident. Fix it explicitly now by making it pass through the restored compat translation code. Cc: stable@vger.kernel.org Fixes: 4cf808e7ac32 ("kill dev_ifname32()") Reported-by: Robert O'Callahan <robert@ocallahan.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Revert "kill dev_ifsioc()"Johannes Berg2019-02-271-0/+49
| | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 37ac39bdddc528c998a9f36db36937de923fdf2a ] This reverts commit bf4405737f9f ("kill dev_ifsioc()"). This wasn't really unused as implied by the original commit, it still handles the copy to/from user differently, and the commit thus caused issues such as https://bugzilla.kernel.org/show_bug.cgi?id=199469 and https://bugzilla.kernel.org/show_bug.cgi?id=202273 However, deviating from a strict revert, rename dev_ifsioc() to compat_ifreq_ioctl() to be clearer as to its purpose and add a comment. Cc: stable@vger.kernel.org Fixes: bf4405737f9f ("kill dev_ifsioc()") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Revert "socket: fix struct ifreq size in compat ioctl"Johannes Berg2019-02-271-14/+8
| | | | | | | | | | | | | | | | | [ Upstream commit 63ff03ab786ab1bc6cca01d48eacd22c95b9b3eb ] This reverts commit 1cebf8f143c2 ("socket: fix struct ifreq size in compat ioctl"), it's a bugfix for another commit that I'll revert next. This is not a 'perfect' revert, I'm keeping some coding style intact rather than revert to the state with indentation errors. Cc: stable@vger.kernel.org Fixes: 1cebf8f143c2 ("socket: fix struct ifreq size in compat ioctl") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net: socket: fix a missing-check bugWenwen Wang2018-10-181-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | In ethtool_ioctl(), the ioctl command 'ethcmd' is checked through a switch statement to see whether it is necessary to pre-process the ethtool structure, because, as mentioned in the comment, the structure ethtool_rxnfc is defined with padding. If yes, a user-space buffer 'rxnfc' is allocated through compat_alloc_user_space(). One thing to note here is that, if 'ethcmd' is ETHTOOL_GRXCLSRLALL, the size of the buffer 'rxnfc' is partially determined by 'rule_cnt', which is actually acquired from the user-space buffer 'compat_rxnfc', i.e., 'compat_rxnfc->rule_cnt', through get_user(). After 'rxnfc' is allocated, the data in the original user-space buffer 'compat_rxnfc' is then copied to 'rxnfc' through copy_in_user(), including the 'rule_cnt' field. However, after this copy, no check is re-enforced on 'rxnfc->rule_cnt'. So it is possible that a malicious user race to change the value in the 'compat_rxnfc->rule_cnt' between these two copies. Through this way, the attacker can bypass the previous check on 'rule_cnt' and inject malicious data. This can cause undefined behavior of the kernel and introduce potential security risk. This patch avoids the above issue via copying the value acquired by get_user() to 'rxnfc->rule_cn', if 'ethcmd' is ETHTOOL_GRXCLSRLALL. Signed-off-by: Wenwen Wang <wang6495@umn.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
* socket: fix struct ifreq size in compat ioctlJohannes Berg2018-09-131-8/+14
| | | | | | | | | | | | | | | | | | | | | As reported by Reobert O'Callahan, since Viro's commit to kill dev_ifsioc() we attempt to copy too much data in compat mode, which may lead to EFAULT when the 32-bit version of struct ifreq sits at/near the end of a page boundary, and the next page isn't mapped. Fix this by passing the approprate compat/non-compat size to copy and using that, as before the dev_ifsioc() removal. This works because only the embedded "struct ifmap" has different size, and this is only used in SIOCGIFMAP/SIOCSIFMAP which has a different handler. All other parts of the union are naturally compatible. This fixes https://bugzilla.kernel.org/show_bug.cgi?id=199469. Fixes: bf4405737f9f ("kill dev_ifsioc()") Reported-by: Robert O'Callahan <robert@ocallahan.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds2018-08-151-11/+17
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking updates from David Miller: "Highlights: - Gustavo A. R. Silva keeps working on the implicit switch fallthru changes. - Support 802.11ax High-Efficiency wireless in cfg80211 et al, From Luca Coelho. - Re-enable ASPM in r8169, from Kai-Heng Feng. - Add virtual XFRM interfaces, which avoids all of the limitations of existing IPSEC tunnels. From Steffen Klassert. - Convert GRO over to use a hash table, so that when we have many flows active we don't traverse a long list during accumluation. - Many new self tests for routing, TC, tunnels, etc. Too many contributors to mention them all, but I'm really happy to keep seeing this stuff. - Hardware timestamping support for dpaa_eth/fsl-fman from Yangbo Lu. - Lots of cleanups and fixes in L2TP code from Guillaume Nault. - Add IPSEC offload support to netdevsim, from Shannon Nelson. - Add support for slotting with non-uniform distribution to netem packet scheduler, from Yousuk Seung. - Add UDP GSO support to mlx5e, from Boris Pismenny. - Support offloading of Team LAG in NFP, from John Hurley. - Allow to configure TX queue selection based upon RX queue, from Amritha Nambiar. - Support ethtool ring size configuration in aquantia, from Anton Mikaev. - Support DSCP and flowlabel per-transport in SCTP, from Xin Long. - Support list based batching and stack traversal of SKBs, this is very exciting work. From Edward Cree. - Busyloop optimizations in vhost_net, from Toshiaki Makita. - Introduce the ETF qdisc, which allows time based transmissions. IGB can offload this in hardware. From Vinicius Costa Gomes. - Add parameter support to devlink, from Moshe Shemesh. - Several multiplication and division optimizations for BPF JIT in nfp driver, from Jiong Wang. - Lots of prepatory work to make more of the packet scheduler layer lockless, when possible, from Vlad Buslov. - Add ACK filter and NAT awareness to sch_cake packet scheduler, from Toke Høiland-Jørgensen. - Support regions and region snapshots in devlink, from Alex Vesker. - Allow to attach XDP programs to both HW and SW at the same time on a given device, with initial support in nfp. From Jakub Kicinski. - Add TLS RX offload and support in mlx5, from Ilya Lesokhin. - Use PHYLIB in r8169 driver, from Heiner Kallweit. - All sorts of changes to support Spectrum 2 in mlxsw driver, from Ido Schimmel. - PTP support in mv88e6xxx DSA driver, from Andrew Lunn. - Make TCP_USER_TIMEOUT socket option more accurate, from Jon Maxwell. - Support for templates in packet scheduler classifier, from Jiri Pirko. - IPV6 support in RDS, from Ka-Cheong Poon. - Native tproxy support in nf_tables, from Máté Eckl. - Maintain IP fragment queue in an rbtree, but optimize properly for in-order frags. From Peter Oskolkov. - Improvde handling of ACKs on hole repairs, from Yuchung Cheng" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1996 commits) bpf: test: fix spelling mistake "REUSEEPORT" -> "REUSEPORT" hv/netvsc: Fix NULL dereference at single queue mode fallback net: filter: mark expected switch fall-through xen-netfront: fix warn message as irq device name has '/' cxgb4: Add new T5 PCI device ids 0x50af and 0x50b0 net: dsa: mv88e6xxx: missing unlock on error path rds: fix building with IPV6=m inet/connection_sock: prefer _THIS_IP_ to current_text_addr net: dsa: mv88e6xxx: bitwise vs logical bug net: sock_diag: Fix spectre v1 gadget in __sock_diag_cmd() ieee802154: hwsim: using right kind of iteration net: hns3: Add vlan filter setting by ethtool command -K net: hns3: Set tx ring' tc info when netdev is up net: hns3: Remove tx ring BD len register in hns3_enet net: hns3: Fix desc num set to default when setting channel net: hns3: Fix for phy link issue when using marvell phy driver net: hns3: Fix for information of phydev lost problem when down/up net: hns3: Fix for command format parsing error in hclge_is_all_function_id_zero net: hns3: Add support for serdes loopback selftest bnxt_en: take coredump_record structure off stack ...
| * net: sock_diag: Fix spectre v1 gadget in __sock_diag_cmd()Jeremy Cline2018-08-141-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | req->sdiag_family is a user-controlled value that's used as an array index. Sanitize it after the bounds check to avoid speculative out-of-bounds array access. This also protects the sock_is_registered() call, so this removes the sanitize call there. Fixes: e978de7a6d38 ("net: socket: Fix potential spectre v1 gadget in sock_is_registered") Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: konrad.wilk@oracle.com Cc: jamie.iles@oracle.com Cc: liran.alon@oracle.com Cc: stable@vger.kernel.org Signed-off-by: Jeremy Cline <jcline@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller2018-08-021-1/+4
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | The BTF conflicts were simple overlapping changes. The virtio_net conflict was an overlap of a fix of statistics counter, happening alongisde a move over to a bonafide statistics structure rather than counting value on the stack. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: remove bogus RCU annotations on socket.wqChristoph Hellwig2018-07-311-6/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We never use RCU protection for it, just a lot of cargo-cult rcu_deference_protects calls. Note that we do keep the kfree_rcu call for it, as the references through struct sock are RCU protected and thus might require a grace period before freeing. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: remove sock_poll_busy_flagChristoph Hellwig2018-07-301-5/+11
| | | | | | | | | | | | | | | | | | | | | Fold it into the only caller to make the code simpler and easier to read. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: remove sock_poll_busy_loopChristoph Hellwig2018-07-301-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | There is no point in hiding this logic in a helper. Also remove the useless events != 0 check and only busy loop once we know we actually have a poll method. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge branch 'work.open3' of ↵Linus Torvalds2018-08-131-24/+5
|\ \ \ | |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs open-related updates from Al Viro: - "do we need fput() or put_filp()" rules are gone - it's always fput() now. We keep track of that state where it belongs - in ->f_mode. - int *opened mess killed - in finish_open(), in ->atomic_open() instances and in fs/namei.c code around do_last()/lookup_open()/atomic_open(). - alloc_file() wrappers with saner calling conventions are introduced (alloc_file_clone() and alloc_file_pseudo()); callers converted, with much simplification. - while we are at it, saner calling conventions for path_init() and link_path_walk(), simplifying things inside fs/namei.c (both on open-related paths and elsewhere). * 'work.open3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (40 commits) few more cleanups of link_path_walk() callers allow link_path_walk() to take ERR_PTR() make path_init() unconditionally paired with terminate_walk() document alloc_file() changes make alloc_file() static do_shmat(): grab shp->shm_file earlier, switch to alloc_file_clone() new helper: alloc_file_clone() create_pipe_files(): switch the first allocation to alloc_file_pseudo() anon_inode_getfile(): switch to alloc_file_pseudo() hugetlb_file_setup(): switch to alloc_file_pseudo() ocxlflash_getfile(): switch to alloc_file_pseudo() cxl_getfile(): switch to alloc_file_pseudo() ... and switch shmem_file_setup() to alloc_file_pseudo() __shmem_file_setup(): reorder allocations new wrapper: alloc_file_pseudo() kill FILE_{CREATED,OPENED} switch atomic_open() and lookup_open() to returning 0 in all success cases document ->atomic_open() changes ->atomic_open(): return 0 in all success cases get rid of 'opened' in path_openat() and the helpers downstream ...
| * | new wrapper: alloc_file_pseudo()Al Viro2018-07-121-23/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | takes inode, vfsmount, name, O_... flags and file_operations and either returns a new struct file (in which case inode reference we held is consumed) or returns ERR_PTR(), in which case no refcounts are altered. converted aio_private_file() and sock_alloc_file() to it Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | alloc_file(): switch to passing O_... flags instead of FMODE_... modeAl Viro2018-07-121-2/+1
| | | | | | | | | | | | | | | | | | | | | ... so that it could set both ->f_flags and ->f_mode, without callers having to set ->f_flags manually. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | net: socket: Fix potential spectre v1 gadget in sock_is_registeredJeremy Cline2018-07-281-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'family' can be a user-controlled value, so sanitize it after the bounds check to avoid speculative out-of-bounds access. Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Jeremy Cline <jcline@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: socket: fix potential spectre v1 gadget in socketcallJeremy Cline2018-07-281-0/+2
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | 'call' is a user-controlled value, so sanitize the array index after the bounds check to avoid speculating past the bounds of the 'nargs' array. Found with the help of Smatch: net/socket.c:2508 __do_sys_socketcall() warn: potential spectre issue 'nargs' [r] (local cap) Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Jeremy Cline <jcline@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: handle NULL ->poll gracefullyChristoph Hellwig2018-06-291-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The big aio poll revert broke various network protocols that don't implement ->poll as a patch in the aio poll serie removed sock_no_poll and made the common code handle this case. Reported-by: syzbot+57727883dbad76db2ef0@syzkaller.appspotmail.com Reported-by: syzbot+cdb0d3176b53d35ad454@syzkaller.appspotmail.com Reported-by: syzbot+2c7e8f74f8b2571c87e8@syzkaller.appspotmail.com Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Fixes: a11e1d432b51 ("Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL") Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLLLinus Torvalds2018-06-281-43/+5
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | The poll() changes were not well thought out, and completely unexplained. They also caused a huge performance regression, because "->poll()" was no longer a trivial file operation that just called down to the underlying file operations, but instead did at least two indirect calls. Indirect calls are sadly slow now with the Spectre mitigation, but the performance problem could at least be largely mitigated by changing the "->get_poll_head()" operation to just have a per-file-descriptor pointer to the poll head instead. That gets rid of one of the new indirections. But that doesn't fix the new complexity that is completely unwarranted for the regular case. The (undocumented) reason for the poll() changes was some alleged AIO poll race fixing, but we don't make the common case slower and more complex for some uncommon special case, so this all really needs way more explanations and most likely a fundamental redesign. [ This revert is a revert of about 30 different commits, not reverted individually because that would just be unnecessarily messy - Linus ] Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* socket: close race condition between sock_close() and sockfs_setattr()Cong Wang2018-06-101-3/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | fchownat() doesn't even hold refcnt of fd until it figures out fd is really needed (otherwise is ignored) and releases it after it resolves the path. This means sock_close() could race with sockfs_setattr(), which leads to a NULL pointer dereference since typically we set sock->sk to NULL in ->release(). As pointed out by Al, this is unique to sockfs. So we can fix this in socket layer by acquiring inode_lock in sock_close() and checking against NULL in sockfs_setattr(). sock_release() is called in many places, only the sock_close() path matters here. And fortunately, this should not affect normal sock_close() as it is only called when the last fd refcnt is gone. It only affects sock_close() with a parallel sockfs_setattr() in progress, which is not common. Fixes: 86741ec25462 ("net: core: Add a UID field to struct sock.") Reported-by: shankarapailoor <shankarapailoor@gmail.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'next-general' of ↵Linus Torvalds2018-06-061-0/+7
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security system updates from James Morris: - incorporate new socketpair() hook into LSM and wire up the SELinux and Smack modules. From David Herrmann: "The idea is to allow SO_PEERSEC to be called on AF_UNIX sockets created via socketpair(2), and return the same information as if you emulated socketpair(2) via a temporary listener socket. Right now SO_PEERSEC will return the unlabeled credentials for a socketpair, rather than the actual credentials of the creating process." - remove the unused security_settime LSM hook (Sargun Dhillon). - remove some stack allocated arrays from the keys code (Tycho Andersen) * 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: dh key: get rid of stack allocated array for zeroes dh key: get rid of stack allocated array big key: get rid of stack array allocation smack: provide socketpair callback selinux: provide socketpair callback net: hook socketpair() into LSM security: add hook for socketpair() security: remove security_settime
| * net: hook socketpair() into LSMDavid Herrmann2018-05-041-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | Use the newly created LSM-hook for socketpair(). The default hook return-value is 0, so behavior stays the same unless LSMs start using this hook. Acked-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Tom Gundersen <teg@jklm.no> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: James Morris <james.morris@microsoft.com>
* | net: add support for ->poll_mask in proto_opsChristoph Hellwig2018-05-261-5/+43
| | | | | | | | | | | | | | The socket file operations still implement ->poll until all protocols are switched over. Signed-off-by: Christoph Hellwig <hch@lst.de>
* | net: refactor socket_pollChristoph Hellwig2018-05-261-17/+4
|/ | | | | | | | Factor out two busy poll related helpers for late reuse, and remove a command that isn't very helpful, especially with the __poll_t annotations in place. Signed-off-by: Christoph Hellwig <hch@lst.de>
* Merge branch 'for-linus' of ↵Linus Torvalds2018-04-051-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: kfifo: fix inaccurate comment tools/thermal: tmon: fix for segfault net: Spelling s/stucture/structure/ edd: don't spam log if no EDD information is present Documentation: Fix early-microcode.txt references after file rename tracing: Block comments should align the * on each line treewide: Fix typos in printk GenWQE: Fix a typo in two comments treewide: Align function definition open/close braces
| * net: Spelling s/stucture/structure/Geert Uytterhoeven2018-03-271-1/+1
| | | | | | | | | | Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds2018-04-031-25/+26
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking updates from David Miller: 1) Support offloading wireless authentication to userspace via NL80211_CMD_EXTERNAL_AUTH, from Srinivas Dasari. 2) A lot of work on network namespace setup/teardown from Kirill Tkhai. Setup and cleanup of namespaces now all run asynchronously and thus performance is significantly increased. 3) Add rx/tx timestamping support to mv88e6xxx driver, from Brandon Streiff. 4) Support zerocopy on RDS sockets, from Sowmini Varadhan. 5) Use denser instruction encoding in x86 eBPF JIT, from Daniel Borkmann. 6) Support hw offload of vlan filtering in mvpp2 dreiver, from Maxime Chevallier. 7) Support grafting of child qdiscs in mlxsw driver, from Nogah Frankel. 8) Add packet forwarding tests to selftests, from Ido Schimmel. 9) Deal with sub-optimal GSO packets better in BBR congestion control, from Eric Dumazet. 10) Support 5-tuple hashing in ipv6 multipath routing, from David Ahern. 11) Add path MTU tests to selftests, from Stefano Brivio. 12) Various bits of IPSEC offloading support for mlx5, from Aviad Yehezkel, Yossi Kuperman, and Saeed Mahameed. 13) Support RSS spreading on ntuple filters in SFC driver, from Edward Cree. 14) Lots of sockmap work from John Fastabend. Applications can use eBPF to filter sendmsg and sendpage operations. 15) In-kernel receive TLS support, from Dave Watson. 16) Add XDP support to ixgbevf, this is significant because it should allow optimized XDP usage in various cloud environments. From Tony Nguyen. 17) Add new Intel E800 series "ice" ethernet driver, from Anirudh Venkataramanan et al. 18) IP fragmentation match offload support in nfp driver, from Pieter Jansen van Vuuren. 19) Support XDP redirect in i40e driver, from Björn Töpel. 20) Add BPF_RAW_TRACEPOINT program type for accessing the arguments of tracepoints in their raw form, from Alexei Starovoitov. 21) Lots of striding RQ improvements to mlx5 driver with many performance improvements, from Tariq Toukan. 22) Use rhashtable for inet frag reassembly, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1678 commits) net: mvneta: improve suspend/resume net: mvneta: split rxq/txq init and txq deinit into SW and HW parts ipv6: frags: fix /proc/sys/net/ipv6/ip6frag_low_thresh net: bgmac: Fix endian access in bgmac_dma_tx_ring_free() net: bgmac: Correctly annotate register space route: check sysctl_fib_multipath_use_neigh earlier than hash fix typo in command value in drivers/net/phy/mdio-bitbang. sky2: Increase D3 delay to sky2 stops working after suspend net/mlx5e: Set EQE based as default TX interrupt moderation mode ibmvnic: Disable irqs before exiting reset from closed state net: sched: do not emit messages while holding spinlock vlan: also check phy_driver ts_info for vlan's real device Bluetooth: Mark expected switch fall-throughs Bluetooth: Set HCI_QUIRK_SIMULTANEOUS_DISCOVERY for BTUSB_QCA_ROME Bluetooth: btrsi: remove unused including <linux/version.h> Bluetooth: hci_bcm: Remove DMI quirk for the MINIX Z83-4 sh_eth: kill useless check in __sh_eth_get_regs() sh_eth: add sh_eth_cpu_data::no_xdfar flag ipv6: factorize sk_wmem_alloc updates done by __ip6_append_data() ipv4: factorize sk_wmem_alloc updates done by __ip_append_data() ...
| * \ Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2018-03-231-0/+5
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fun set of conflict resolutions here... For the mac80211 stuff, these were fortunately just parallel adds. Trivially resolved. In drivers/net/phy/phy.c we had a bug fix in 'net' that moved the function phy_disable_interrupts() earlier in the file, whilst in 'net-next' the phy_error() call from this function was removed. In net/ipv4/xfrm4_policy.c, David Ahern's changes to remove the 'rt_table_id' member of rtable collided with a bug fix in 'net' that added a new struct member "rt_mtu_locked" which needs to be copied over here. The mlxsw driver conflict consisted of net-next separating the span code and definitions into separate files, whilst a 'net' bug fix made some changes to that moved code. The mlx5 infiniband conflict resolution was quite non-trivial, the RDMA tree's merge commit was used as a guide here, and here are their notes: ==================== Due to bug fixes found by the syzkaller bot and taken into the for-rc branch after development for the 4.17 merge window had already started being taken into the for-next branch, there were fairly non-trivial merge issues that would need to be resolved between the for-rc branch and the for-next branch. This merge resolves those conflicts and provides a unified base upon which ongoing development for 4.17 can be based. Conflicts: drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f9524 (IB/mlx5: Fix cleanup order on unload) added to for-rc and commit b5ca15ad7e61 (IB/mlx5: Add proper representors support) add as part of the devel cycle both needed to modify the init/de-init functions used by mlx5. To support the new representors, the new functions added by the cleanup patch needed to be made non-static, and the init/de-init list added by the representors patch needed to be modified to match the init/de-init list changes made by the cleanup patch. Updates: drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function prototypes added by representors patch to reflect new function names as changed by cleanup patch drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init stage list to match new order from cleanup patch ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | socket: skip checking sk_err for recvmmsg(MSG_ERRQUEUE)Soheil Hassas Yeganeh2018-03-011-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | recvmmsg does not call ___sys_recvmsg when sk_err is set. That is fine for normal reads but, for MSG_ERRQUEUE, recvmmsg should always call ___sys_recvmsg regardless of sk->sk_err to be able to clear error queue. Otherwise, users are not able to drain the error queue using recvmmsg. Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: make kmem caches as __ro_after_initAlexey Dobriyan2018-02-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | All kmem caches aren't reallocated once set up. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: Make extern and export get_net_ns()Kirill Tkhai2018-02-151-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This function will be used to obtain net of tun device. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: Remove atalk header from socket.cDavid Ahern2018-02-141-1/+0
| | | | | | | | | | | | | | | | | | | | Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: make getname() functions return length rather than use int* parameterDenys Vlasenko2018-02-121-18/+17
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes since v1: Added changes in these files: drivers/infiniband/hw/usnic/usnic_transport.c drivers/staging/lustre/lnet/lnet/lib-socket.c drivers/target/iscsi/iscsi_target_login.c drivers/vhost/net.c fs/dlm/lowcomms.c fs/ocfs2/cluster/tcp.c security/tomoyo/network.c Before: All these functions either return a negative error indicator, or store length of sockaddr into "int *socklen" parameter and return zero on success. "int *socklen" parameter is awkward. For example, if caller does not care, it still needs to provide on-stack storage for the value it does not need. None of the many FOO_getname() functions of various protocols ever used old value of *socklen. They always just overwrite it. This change drops this parameter, and makes all these functions, on success, return length of sockaddr. It's always >= 0 and can be differentiated from an error. Tests in callers are changed from "if (err)" to "if (err < 0)", where needed. rpc_sockname() lost "int buflen" parameter, since its only use was to be passed to kernel_getsockname() as &buflen and subsequently not used in any way. Userspace API is not changed. text data bss dec hex filename 30108430 2633624 873672 33615726 200ef6e vmlinux.before.o 30108109 2633612 873672 33615393 200ee21 vmlinux.o Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> CC: David S. Miller <davem@davemloft.net> CC: linux-kernel@vger.kernel.org CC: netdev@vger.kernel.org CC: linux-bluetooth@vger.kernel.org CC: linux-decnet-user@lists.sourceforge.net CC: linux-wireless@vger.kernel.org CC: linux-rdma@vger.kernel.org CC: linux-sctp@vger.kernel.org CC: linux-nfs@vger.kernel.org CC: linux-x25@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge branch 'syscalls-next' of ↵Linus Torvalds2018-04-021-73/+161
|\ \ \ | |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux Pull removal of in-kernel calls to syscalls from Dominik Brodowski: "System calls are interaction points between userspace and the kernel. Therefore, system call functions such as sys_xyzzy() or compat_sys_xyzzy() should only be called from userspace via the syscall table, but not from elsewhere in the kernel. At least on 64-bit x86, it will likely be a hard requirement from v4.17 onwards to not call system call functions in the kernel: It is better to use use a different calling convention for system calls there, where struct pt_regs is decoded on-the-fly in a syscall wrapper which then hands processing over to the actual syscall function. This means that only those parameters which are actually needed for a specific syscall are passed on during syscall entry, instead of filling in six CPU registers with random user space content all the time (which may cause serious trouble down the call chain). Those x86-specific patches will be pushed through the x86 tree in the near future. Moreover, rules on how data may be accessed may differ between kernel data and user data. This is another reason why calling sys_xyzzy() is generally a bad idea, and -- at most -- acceptable in arch-specific code. This patchset removes all in-kernel calls to syscall functions in the kernel with the exception of arch/. On top of this, it cleans up the three places where many syscalls are referenced or prototyped, namely kernel/sys_ni.c, include/linux/syscalls.h and include/linux/compat.h" * 'syscalls-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux: (109 commits) bpf: whitelist all syscalls for error injection kernel/sys_ni: remove {sys_,sys_compat} from cond_syscall definitions kernel/sys_ni: sort cond_syscall() entries syscalls/x86: auto-create compat_sys_*() prototypes syscalls: sort syscall prototypes in include/linux/compat.h net: remove compat_sys_*() prototypes from net/compat.h syscalls: sort syscall prototypes in include/linux/syscalls.h kexec: move sys_kexec_load() prototype to syscalls.h x86/sigreturn: use SYSCALL_DEFINE0 x86: fix sys_sigreturn() return type to be long, not unsigned long x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to sys_ioperm() mm: add ksys_readahead() helper; remove in-kernel calls to sys_readahead() mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to sys_mmap_pgoff() mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64() fs: add ksys_fallocate() wrapper; remove in-kernel calls to sys_fallocate() fs: add ksys_p{read,write}64() helpers; remove in-kernel calls to syscalls fs: add ksys_truncate() wrapper; remove in-kernel calls to sys_truncate() fs: add ksys_sync_file_range helper(); remove in-kernel calls to syscall kernel: add ksys_setsid() helper; remove in-kernel call to sys_setsid() kernel: add ksys_unshare() helper; remove in-kernel calls to sys_unshare() ...
| * | net: socket: replace call to sys_recv() with __sys_recvfrom()Dominik Brodowski2018-04-021-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sys_recv() merely expands the parameters to __sys_recvfrom() by NULL and NULL. Open-code this in the two places which used sys_recv() as a wrapper to __sys_recvfrom(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>