summaryrefslogtreecommitdiffstats
path: root/net
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'master' of ↵David S. Miller2019-07-055-80/+62
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2019-07-05 1) Fix xfrm selector prefix length validation for inter address family tunneling. From Anirudh Gupta. 2) Fix a memleak in pfkey. From Jeremy Sowden. 3) Fix SA selector validation to allow empty selectors again. From Nicolas Dichtel. 4) Select crypto ciphers for xfrm_algo, this fixes some randconfig builds. From Arnd Bergmann. 5) Remove a duplicated assignment in xfrm_bydst_resize. From Cong Wang. 6) Fix a hlist corruption on hash rebuild. From Florian Westphal. 7) Fix a memory leak when creating xfrm interfaces. From Nicolas Dichtel. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * xfrm interface: fix memory leak on creationNicolas Dichtel2019-07-031-70/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following commands produce a backtrace and return an error but the xfrm interface is created (in the wrong netns): $ ip netns add foo $ ip netns add bar $ ip -n foo netns set bar 0 $ ip -n foo link add xfrmi0 link-netnsid 0 type xfrm dev lo if_id 23 RTNETLINK answers: Invalid argument $ ip -n bar link ls xfrmi0 2: xfrmi0@lo: <NOARP,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/none 00:00:00:00:00:00 brd 00:00:00:00:00:00 Here is the backtrace: [ 79.879174] WARNING: CPU: 0 PID: 1178 at net/core/dev.c:8172 rollback_registered_many+0x86/0x3c1 [ 79.880260] Modules linked in: xfrm_interface nfsv3 nfs_acl auth_rpcgss nfsv4 nfs lockd grace sunrpc fscache button parport_pc parport serio_raw evdev pcspkr loop ext4 crc16 mbcache jbd2 crc32c_generic ide_cd_mod ide_gd_mod cdrom ata_$ eneric ata_piix libata scsi_mod 8139too piix psmouse i2c_piix4 ide_core 8139cp mii i2c_core floppy [ 79.883698] CPU: 0 PID: 1178 Comm: ip Not tainted 5.2.0-rc6+ #106 [ 79.884462] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 [ 79.885447] RIP: 0010:rollback_registered_many+0x86/0x3c1 [ 79.886120] Code: 01 e8 d7 7d c6 ff 0f 0b 48 8b 45 00 4c 8b 20 48 8d 58 90 49 83 ec 70 48 8d 7b 70 48 39 ef 74 44 8a 83 d0 04 00 00 84 c0 75 1f <0f> 0b e8 61 cd ff ff 48 b8 00 01 00 00 00 00 ad de 48 89 43 70 66 [ 79.888667] RSP: 0018:ffffc900015ab740 EFLAGS: 00010246 [ 79.889339] RAX: ffff8882353e5700 RBX: ffff8882353e56a0 RCX: ffff8882353e5710 [ 79.890174] RDX: ffffc900015ab7e0 RSI: ffffc900015ab7e0 RDI: ffff8882353e5710 [ 79.891029] RBP: ffffc900015ab7e0 R08: ffffc900015ab7e0 R09: ffffc900015ab7e0 [ 79.891866] R10: ffffc900015ab7a0 R11: ffffffff82233fec R12: ffffc900015ab770 [ 79.892728] R13: ffffffff81eb7ec0 R14: ffff88822ed6cf00 R15: 00000000ffffffea [ 79.893557] FS: 00007ff350f31740(0000) GS:ffff888237a00000(0000) knlGS:0000000000000000 [ 79.894581] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 79.895317] CR2: 00000000006c8580 CR3: 000000022c272000 CR4: 00000000000006f0 [ 79.896137] Call Trace: [ 79.896464] unregister_netdevice_many+0x12/0x6c [ 79.896998] __rtnl_newlink+0x6e2/0x73b [ 79.897446] ? __kmalloc_node_track_caller+0x15e/0x185 [ 79.898039] ? pskb_expand_head+0x5f/0x1fe [ 79.898556] ? stack_access_ok+0xd/0x2c [ 79.899009] ? deref_stack_reg+0x12/0x20 [ 79.899462] ? stack_access_ok+0xd/0x2c [ 79.899927] ? stack_access_ok+0xd/0x2c [ 79.900404] ? __module_text_address+0x9/0x4f [ 79.900910] ? is_bpf_text_address+0x5/0xc [ 79.901390] ? kernel_text_address+0x67/0x7b [ 79.901884] ? __kernel_text_address+0x1a/0x25 [ 79.902397] ? unwind_get_return_address+0x12/0x23 [ 79.903122] ? __cmpxchg_double_slab.isra.37+0x46/0x77 [ 79.903772] rtnl_newlink+0x43/0x56 [ 79.904217] rtnetlink_rcv_msg+0x200/0x24c In fact, each time a xfrm interface was created, a netdev was allocated by __rtnl_newlink()/rtnl_create_link() and then another one by xfrmi_newlink()/xfrmi_create(). Only the second one was registered, it's why the previous commands produce a backtrace: dev_change_net_namespace() was called on a netdev with reg_state set to NETREG_UNINITIALIZED (the first one). CC: Lorenzo Colitti <lorenzo@google.com> CC: Benedict Wong <benedictwong@google.com> CC: Steffen Klassert <steffen.klassert@secunet.com> CC: Shannon Nelson <shannon.nelson@oracle.com> CC: Antony Antony <antony@phenome.org> CC: Eyal Birger <eyal.birger@gmail.com> Fixes: f203b76d7809 ("xfrm: Add virtual xfrm interfaces") Reported-by: Julien Floret <julien.floret@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * xfrm: policy: fix bydst hlist corruption on hash rebuildFlorian Westphal2019-07-031-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzbot reported following spat: BUG: KASAN: use-after-free in __write_once_size include/linux/compiler.h:221 BUG: KASAN: use-after-free in hlist_del_rcu include/linux/rculist.h:455 BUG: KASAN: use-after-free in xfrm_hash_rebuild+0xa0d/0x1000 net/xfrm/xfrm_policy.c:1318 Write of size 8 at addr ffff888095e79c00 by task kworker/1:3/8066 Workqueue: events xfrm_hash_rebuild Call Trace: __write_once_size include/linux/compiler.h:221 [inline] hlist_del_rcu include/linux/rculist.h:455 [inline] xfrm_hash_rebuild+0xa0d/0x1000 net/xfrm/xfrm_policy.c:1318 process_one_work+0x814/0x1130 kernel/workqueue.c:2269 Allocated by task 8064: __kmalloc+0x23c/0x310 mm/slab.c:3669 kzalloc include/linux/slab.h:742 [inline] xfrm_hash_alloc+0x38/0xe0 net/xfrm/xfrm_hash.c:21 xfrm_policy_init net/xfrm/xfrm_policy.c:4036 [inline] xfrm_net_init+0x269/0xd60 net/xfrm/xfrm_policy.c:4120 ops_init+0x336/0x420 net/core/net_namespace.c:130 setup_net+0x212/0x690 net/core/net_namespace.c:316 The faulting address is the address of the old chain head, free'd by xfrm_hash_resize(). In xfrm_hash_rehash(), chain heads get re-initialized without any hlist_del_rcu: for (i = hmask; i >= 0; i--) INIT_HLIST_HEAD(odst + i); Then, hlist_del_rcu() gets called on the about to-be-reinserted policy when iterating the per-net list of policies. hlist_del_rcu() will then make chain->first be nonzero again: static inline void __hlist_del(struct hlist_node *n) { struct hlist_node *next = n->next; // address of next element in list struct hlist_node **pprev = n->pprev;// location of previous elem, this // can point at chain->first WRITE_ONCE(*pprev, next); // chain->first points to next elem if (next) next->pprev = pprev; Then, when we walk chainlist to find insertion point, we may find a non-empty list even though we're supposedly reinserting the first policy to an empty chain. To fix this first unlink all exact and inexact policies instead of zeroing the list heads. Add the commands equivalent to the syzbot reproducer to xfrm_policy.sh, without fix KASAN catches the corruption as it happens, SLUB poisoning detects it a bit later. Reported-by: syzbot+0165480d4ef07360eeda@syzkaller.appspotmail.com Fixes: 1548bc4e0512 ("xfrm: policy: delete inexact policies from inexact list on hash rebuild") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * xfrm: remove a duplicated assignmentCong Wang2019-07-021-3/+0
| | | | | | | | | | | | | | | | Fixes: 30846090a746 ("xfrm: policy: add sequence count to sync with hash resize") Cc: Florian Westphal <fw@strlen.de> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * ipsec: select crypto ciphers for xfrm_algoArnd Bergmann2019-06-211-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kernelci.org reports failed builds on arc because of what looks like an old missed 'select' statement: net/xfrm/xfrm_algo.o: In function `xfrm_probe_algs': xfrm_algo.c:(.text+0x1e8): undefined reference to `crypto_has_ahash' I don't see this in randconfig builds on other architectures, but it's fairly clear we want to select the hash code for it, like we do for all its other users. As Herbert points out, CRYPTO_BLKCIPHER is also required even though it has not popped up in build tests. Fixes: 17bc19702221 ("ipsec: Use skcipher and ahash when probing algorithms") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * xfrm: fix sa selector validationNicolas Dichtel2019-06-171-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After commit b38ff4075a80, the following command does not work anymore: $ ip xfrm state add src 10.125.0.2 dst 10.125.0.1 proto esp spi 34 reqid 1 \ mode tunnel enc 'cbc(aes)' 0xb0abdba8b782ad9d364ec81e3a7d82a1 auth-trunc \ 'hmac(sha1)' 0xe26609ebd00acb6a4d51fca13e49ea78a72c73e6 96 flag align4 In fact, the selector is not mandatory, allow the user to provide an empty selector. Fixes: b38ff4075a80 ("xfrm: Fix xfrm sel prefix length validation") CC: Anirudh Gupta <anirudh.gupta@sophos.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * af_key: fix leaks in key_pol_get_resp and dump_sp.Jeremy Sowden2019-05-281-2/+6
| | | | | | | | | | | | | | | | | | | | In both functions, if pfkey_xfrm_policy2msg failed we leaked the newly allocated sk_buff. Free it on error. Fixes: 55569ce256ce ("Fix conversion between IPSEC_MODE_xxx and XFRM_MODE_xxx.") Reported-by: syzbot+4f0529365f7f2208d9f0@syzkaller.appspotmail.com Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * xfrm: Fix xfrm sel prefix length validationAnirudh Gupta2019-05-281-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Family of src/dst can be different from family of selector src/dst. Use xfrm selector family to validate address prefix length, while verifying new sa from userspace. Validated patch with this command: ip xfrm state add src 1.1.6.1 dst 1.1.6.2 proto esp spi 4260196 \ reqid 20004 mode tunnel aead "rfc4106(gcm(aes))" \ 0x1111016400000000000000000000000044440001 128 \ sel src 1011:1:4::2/128 sel dst 1021:1:4::2/128 dev Port5 Fixes: 07bf7908950a ("xfrm: Validate address prefix lengths in the xfrm selector.") Signed-off-by: Anirudh Gupta <anirudh.gupta@sophos.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller2019-07-035-23/+90
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Daniel Borkmann says: ==================== pull-request: bpf 2019-07-03 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix the interpreter to properly handle BPF_ALU32 | BPF_ARSH on BE architectures, from Jiong. 2) Fix several bugs in the x32 BPF JIT for handling shifts by 0, from Luke and Xi. 3) Fix NULL pointer deref in btf_type_is_resolve_source_only(), from Stanislav. 4) Properly handle the check that forwarding is enabled on the device in bpf_ipv6_fib_lookup() helper code, from Anton. 5) Fix UAPI bpf_prog_info fields alignment for archs that have 16 bit alignment such as m68k, from Baruch. 6) Fix kernel hanging in unregister_netdevice loop while unregistering device bound to XDP socket, from Ilya. 7) Properly terminate tail update in xskq_produce_flush_desc(), from Nathan. 8) Fix broken always_inline handling in test_lwt_seg6local, from Jiri. 9) Fix bpftool to use correct argument in cgroup errors, from Jakub. 10) Fix detaching dummy prog in XDP redirect sample code, from Prashant. 11) Add Jonathan to AF_XDP reviewers, from Björn. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | xdp: fix hang while unregistering device bound to xdp socketIlya Maximets2019-07-033-16/+82
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Device that bound to XDP socket will not have zero refcount until the userspace application will not close it. This leads to hang inside 'netdev_wait_allrefs()' if device unregistering requested: # ip link del p1 < hang on recvmsg on netlink socket > # ps -x | grep ip 5126 pts/0 D+ 0:00 ip link del p1 # journalctl -b Jun 05 07:19:16 kernel: unregister_netdevice: waiting for p1 to become free. Usage count = 1 Jun 05 07:19:27 kernel: unregister_netdevice: waiting for p1 to become free. Usage count = 1 ... Fix that by implementing NETDEV_UNREGISTER event notification handler to properly clean up all the resources and unref device. This should also allow socket killing via ss(8) utility. Fixes: 965a99098443 ("xsk: add support for bind for Rx") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * | xdp: hold device for umem regardless of zero-copy modeIlya Maximets2019-07-031-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Device pointer stored in umem regardless of zero-copy mode, so we heed to hold the device in all cases. Fixes: c9b47cc1fabc ("xsk: fix bug when trying to use both copy and zero-copy on one queue id") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * | xsk: Properly terminate assignment in xskq_produce_flush_descNathan Chancellor2019-06-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clang warns: In file included from net/xdp/xsk_queue.c:10: net/xdp/xsk_queue.h:292:2: warning: expression result unused [-Wunused-value] WRITE_ONCE(q->ring->producer, q->prod_tail); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/compiler.h:284:6: note: expanded from macro 'WRITE_ONCE' __u.__val; \ ~~~ ^~~~~ 1 warning generated. The q->prod_tail assignment has a comma at the end, not a semi-colon. Fix that so clang no longer warns and everything works as expected. Fixes: c497176cb2e4 ("xsk: add Rx receive functions and poll support") Link: https://github.com/ClangBuiltLinux/linux/issues/544 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * | bpf: fix the check that forwarding is enabled in bpf_ipv6_fib_lookupAnton Protopopov2019-06-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The bpf_ipv6_fib_lookup function should return BPF_FIB_LKUP_RET_FWD_DISABLED when forwarding is disabled for the input device. However instead of checking if forwarding is enabled on the input device, it checked the global net->ipv6.devconf_all->forwarding flag. Change it to behave as expected. Fixes: 87f5fc7e48dd ("bpf: Provide helper to do forwarding lookups in kernel FIB table") Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* | | sctp: count data bundling sack chunk for outctrlchunksXin Long2019-07-031-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now all ctrl chunks are counted for asoc stats.octrlchunks and net SCTP_MIB_OUTCTRLCHUNKS either after queuing up or bundling, other than the chunk maked and bundled in sctp_packet_bundle_sack, which caused 'outctrlchunks' not consistent with 'inctrlchunks' in peer. This issue exists since very beginning, here to fix it by increasing both net SCTP_MIB_OUTCTRLCHUNKS and asoc stats.octrlchunks when sack chunk is maked and bundled in sctp_packet_bundle_sack. Reported-by: Ja Ram Jeon <jajeon@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: don't warn in inet diag when IPV6 is disabledStephen Hemminger2019-07-031-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If IPV6 was disabled, then ss command would cause a kernel warning because the command was attempting to dump IPV6 socket information. The fix is to just remove the warning. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202249 Fixes: 432490f9d455 ("net: ip, diag -- Add diag interface for raw sockets") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | rxrpc: Fix send on a connected, but unbound socketDavid Howells2019-07-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If sendmsg() or sendmmsg() is called on a connected socket that hasn't had bind() called on it, then an oops will occur when the kernel tries to connect the call because no local endpoint has been allocated. Fix this by implicitly binding the socket if it is in the RXRPC_CLIENT_UNBOUND state, just like it does for the RXRPC_UNBOUND state. Further, the state should be transitioned to RXRPC_CLIENT_BOUND after this to prevent further attempts to bind it. This can be tested with: #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/socket.h> #include <arpa/inet.h> #include <linux/rxrpc.h> static const unsigned char inet6_addr[16] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, -1, 0xac, 0x14, 0x14, 0xaa }; int main(void) { struct sockaddr_rxrpc srx; struct cmsghdr *cm; struct msghdr msg; unsigned char control[16]; int fd; memset(&srx, 0, sizeof(srx)); srx.srx_family = 0x21; srx.srx_service = 0; srx.transport_type = AF_INET; srx.transport_len = 0x1c; srx.transport.sin6.sin6_family = AF_INET6; srx.transport.sin6.sin6_port = htons(0x4e22); srx.transport.sin6.sin6_flowinfo = htons(0x4e22); srx.transport.sin6.sin6_scope_id = htons(0xaa3b); memcpy(&srx.transport.sin6.sin6_addr, inet6_addr, 16); cm = (struct cmsghdr *)control; cm->cmsg_len = CMSG_LEN(sizeof(unsigned long)); cm->cmsg_level = SOL_RXRPC; cm->cmsg_type = RXRPC_USER_CALL_ID; *(unsigned long *)CMSG_DATA(cm) = 0; msg.msg_name = NULL; msg.msg_namelen = 0; msg.msg_iov = NULL; msg.msg_iovlen = 0; msg.msg_control = control; msg.msg_controllen = cm->cmsg_len; msg.msg_flags = 0; fd = socket(AF_RXRPC, SOCK_DGRAM, AF_INET); connect(fd, (struct sockaddr *)&srx, sizeof(srx)); sendmsg(fd, &msg, 0); return 0; } Leading to the following oops: BUG: kernel NULL pointer dereference, address: 0000000000000018 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page ... RIP: 0010:rxrpc_connect_call+0x42/0xa01 ... Call Trace: ? mark_held_locks+0x47/0x59 ? __local_bh_enable_ip+0xb6/0xba rxrpc_new_client_call+0x3b1/0x762 ? rxrpc_do_sendmsg+0x3c0/0x92e rxrpc_do_sendmsg+0x3c0/0x92e rxrpc_sendmsg+0x16b/0x1b5 sock_sendmsg+0x2d/0x39 ___sys_sendmsg+0x1a4/0x22a ? release_sock+0x19/0x9e ? reacquire_held_locks+0x136/0x160 ? release_sock+0x19/0x9e ? find_held_lock+0x2b/0x6e ? __lock_acquire+0x268/0xf73 ? rxrpc_connect+0xdd/0xe4 ? __local_bh_enable_ip+0xb6/0xba __sys_sendmsg+0x5e/0x94 do_syscall_64+0x7d/0x1bf entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: 2341e0775747 ("rxrpc: Simplify connect() implementation and simplify sendmsg() op") Reported-by: syzbot+7966f2a0b2c7da8939b4@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: bridge: stp: don't cache eth dest pointer before skb pullNikolay Aleksandrov2019-07-021-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | Don't cache eth dest pointer before calling pskb_may_pull. Fixes: cf0f02d04a83 ("[BRIDGE]: use llc for receiving STP packets") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: bridge: don't cache ether dest pointer on inputNikolay Aleksandrov2019-07-021-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We would cache ether dst pointer on input in br_handle_frame_finish but after the neigh suppress code that could lead to a stale pointer since both ipv4 and ipv6 suppress code do pskb_may_pull. This means we have to always reload it after the suppress code so there's no point in having it cached just retrieve it directly. Fixes: 057658cb33fbf ("bridge: suppress arp pkts on BR_NEIGH_SUPPRESS ports") Fixes: ed842faeb2bd ("bridge: suppress nd pkts on BR_NEIGH_SUPPRESS ports") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: bridge: mcast: fix stale ipv6 hdr pointer when handling v6 queryNikolay Aleksandrov2019-07-021-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We get a pointer to the ipv6 hdr in br_ip6_multicast_query but we may call pskb_may_pull afterwards and end up using a stale pointer. So use the header directly, it's just 1 place where it's needed. Fixes: 08b202b67264 ("bridge br_multicast: IPv6 MLD support.") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Tested-by: Martin Weinelt <martin@linuxlounge.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: bridge: mcast: fix stale nsrcs pointer in igmp3/mld2 report handlingNikolay Aleksandrov2019-07-021-8/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We take a pointer to grec prior to calling pskb_may_pull and use it afterwards to get nsrcs so record nsrcs before the pull when handling igmp3 and we get a pointer to nsrcs and call pskb_may_pull when handling mld2 which again could lead to reading 2 bytes out-of-bounds. ================================================================== BUG: KASAN: use-after-free in br_multicast_rcv+0x480c/0x4ad0 [bridge] Read of size 2 at addr ffff8880421302b4 by task ksoftirqd/1/16 CPU: 1 PID: 16 Comm: ksoftirqd/1 Tainted: G OE 5.2.0-rc6+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace: dump_stack+0x71/0xab print_address_description+0x6a/0x280 ? br_multicast_rcv+0x480c/0x4ad0 [bridge] __kasan_report+0x152/0x1aa ? br_multicast_rcv+0x480c/0x4ad0 [bridge] ? br_multicast_rcv+0x480c/0x4ad0 [bridge] kasan_report+0xe/0x20 br_multicast_rcv+0x480c/0x4ad0 [bridge] ? br_multicast_disable_port+0x150/0x150 [bridge] ? ktime_get_with_offset+0xb4/0x150 ? __kasan_kmalloc.constprop.6+0xa6/0xf0 ? __netif_receive_skb+0x1b0/0x1b0 ? br_fdb_update+0x10e/0x6e0 [bridge] ? br_handle_frame_finish+0x3c6/0x11d0 [bridge] br_handle_frame_finish+0x3c6/0x11d0 [bridge] ? br_pass_frame_up+0x3a0/0x3a0 [bridge] ? virtnet_probe+0x1c80/0x1c80 [virtio_net] br_handle_frame+0x731/0xd90 [bridge] ? select_idle_sibling+0x25/0x7d0 ? br_handle_frame_finish+0x11d0/0x11d0 [bridge] __netif_receive_skb_core+0xced/0x2d70 ? virtqueue_get_buf_ctx+0x230/0x1130 [virtio_ring] ? do_xdp_generic+0x20/0x20 ? virtqueue_napi_complete+0x39/0x70 [virtio_net] ? virtnet_poll+0x94d/0xc78 [virtio_net] ? receive_buf+0x5120/0x5120 [virtio_net] ? __netif_receive_skb_one_core+0x97/0x1d0 __netif_receive_skb_one_core+0x97/0x1d0 ? __netif_receive_skb_core+0x2d70/0x2d70 ? _raw_write_trylock+0x100/0x100 ? __queue_work+0x41e/0xbe0 process_backlog+0x19c/0x650 ? _raw_read_lock_irq+0x40/0x40 net_rx_action+0x71e/0xbc0 ? __switch_to_asm+0x40/0x70 ? napi_complete_done+0x360/0x360 ? __switch_to_asm+0x34/0x70 ? __switch_to_asm+0x40/0x70 ? __schedule+0x85e/0x14d0 __do_softirq+0x1db/0x5f9 ? takeover_tasklets+0x5f0/0x5f0 run_ksoftirqd+0x26/0x40 smpboot_thread_fn+0x443/0x680 ? sort_range+0x20/0x20 ? schedule+0x94/0x210 ? __kthread_parkme+0x78/0xf0 ? sort_range+0x20/0x20 kthread+0x2ae/0x3a0 ? kthread_create_worker_on_cpu+0xc0/0xc0 ret_from_fork+0x35/0x40 The buggy address belongs to the page: page:ffffea0001084c00 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0 flags: 0xffffc000000000() raw: 00ffffc000000000 ffffea0000cfca08 ffffea0001098608 0000000000000000 raw: 0000000000000000 0000000000000003 00000000ffffff7f 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888042130180: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff888042130200: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > ffff888042130280: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff888042130300: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff888042130380: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ================================================================== Disabling lock debugging due to kernel taint Fixes: bc8c20acaea1 ("bridge: multicast: treat igmpv3 report with INCLUDE and no sources as a leave") Reported-by: Martin Weinelt <martin@linuxlounge.net> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Tested-by: Martin Weinelt <martin@linuxlounge.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net/tls: make sure offload also gets the keys wipedJakub Kicinski2019-07-012-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 86029d10af18 ("tls: zero the crypto information from tls_context before freeing") added memzero_explicit() calls to clear the key material before freeing struct tls_context, but it missed tls_device.c has its own way of freeing this structure. Replace the missing free. Fixes: 86029d10af18 ("tls: zero the crypto information from tls_context before freeing") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net/tls: reject offload of TLS 1.3Jakub Kicinski2019-07-011-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Neither drivers nor the tls offload code currently supports TLS version 1.3. Check the TLS version when installing connection state. TLS 1.3 will just fallback to the kernel crypto for now. Fixes: 130b392c6cd6 ("net: tls: Add tls 1.3 support") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | idr: introduce idr_for_each_entry_continue_ul()Cong Wang2019-07-011-20/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similarly, other callers of idr_get_next_ul() suffer the same overflow bug as they don't handle it properly either. Introduce idr_for_each_entry_continue_ul() to help these callers iterate from a given ID. cls_flower needs more care here because it still has overflow when does arg->cookie++, we have to fold its nested loops into one and remove the arg->cookie++. Fixes: 01683a146999 ("net: sched: refactor flower walk to iterate over idr") Fixes: 12d6066c3b29 ("net/mlx5: Add flow counters idr") Reported-by: Li Shuang <shuali@redhat.com> Cc: Davide Caratti <dcaratti@redhat.com> Cc: Vlad Buslov <vladbu@mellanox.com> Cc: Chris Mi <chrism@mellanox.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Tested-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | idr: fix overflow case for idr_for_each_entry_ul()Cong Wang2019-07-011-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | idr_for_each_entry_ul() is buggy as it can't handle overflow case correctly. When we have an ID == UINT_MAX, it becomes an infinite loop. This happens when running on 32-bit CPU where unsigned long has the same size with unsigned int. There is no better way to fix this than casting it to a larger integer, but we can't just 64 bit integer on 32 bit CPU. Instead we could just use an additional integer to help us to detect this overflow case, that is, adding a new parameter to this macro. Fortunately tc action is its only user right now. Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR") Reported-by: Li Shuang <shuali@redhat.com> Tested-by: Davide Caratti <dcaratti@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Chris Mi <chrism@mellanox.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | vsock/virtio: fix flush of works during the .remove()Stefano Garzarella2019-07-011-6/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch moves the flush of works after vdev->config->del_vqs(vdev), because we need to be sure that no workers run before to free the 'vsock' object. Since we stopped the workers using the [tx|rx|event]_run flags, we are sure no one is accessing the device while we are calling vdev->config->reset(vdev), so we can safely move the workers' flush. Before the vdev->config->del_vqs(vdev), workers can be scheduled by VQ callbacks, so we must flush them after del_vqs(), to avoid use-after-free of 'vsock' object. Suggested-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | vsock/virtio: stop workers during the .remove()Stefano Garzarella2019-07-011-1/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before to call vdev->config->reset(vdev) we need to be sure that no one is accessing the device, for this reason, we add new variables in the struct virtio_vsock to stop the workers during the .remove(). This patch also add few comments before vdev->config->reset(vdev) and vdev->config->del_vqs(vdev). Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Suggested-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsockStefano Garzarella2019-07-011-23/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some callbacks used by the upper layers can run while we are in the .remove(). A potential use-after-free can happen, because we free the_virtio_vsock without knowing if the callbacks are over or not. To solve this issue we move the assignment of the_virtio_vsock at the end of .probe(), when we finished all the initialization, and at the beginning of .remove(), before to release resources. For the same reason, we do the same also for the vdev->priv. We use RCU to be sure that all callbacks that use the_virtio_vsock ended before freeing it. This is not required for callbacks that use vdev->priv, because after the vdev->config->del_vqs() we are sure that they are ended and will no longer be invoked. We also take the mutex during the .remove() to avoid that .probe() can run while we are resetting the device. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | sctp: fix error handling on stream scheduler initializationMarcelo Ricardo Leitner2019-07-011-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It allocates the extended area for outbound streams only on sendmsg calls, if they are not yet allocated. When using the priority stream scheduler, this initialization may imply into a subsequent allocation, which may fail. In this case, it was aborting the stream scheduler initialization but leaving the ->ext pointer (allocated) in there, thus in a partially initialized state. On a subsequent call to sendmsg, it would notice the ->ext pointer in there, and trip on uninitialized stuff when trying to schedule the data chunk. The fix is undo the ->ext initialization if the stream scheduler initialization fails and avoid the partially initialized state. Although syzkaller bisected this to commit 4ff40b86262b ("sctp: set chunk transport correctly when it's a new asoc"), this bug was actually introduced on the commit I marked below. Reported-by: syzbot+c1a380d42b190ad1e559@syzkaller.appspotmail.com Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations") Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | netrom: fix a memory leak in nr_rx_frame()Cong Wang2019-07-011-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the skb is associated with a new sock, just assigning it to skb->sk is not sufficient, we have to set its destructor to free the sock properly too. Reported-by: syzbot+d6636a36d3c34bd88938@syzkaller.appspotmail.com Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4: don't set IPv6 only flags to IPv4 addressesMatteo Croce2019-07-011-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid the situation where an IPV6 only flag is applied to an IPv4 address: # ip addr add 192.0.2.1/24 dev dummy0 nodad home mngtmpaddr noprefixroute # ip -4 addr show dev dummy0 2: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000 inet 192.0.2.1/24 scope global noprefixroute dummy0 valid_lft forever preferred_lft forever Or worse, by sending a malicious netlink command: # ip -4 addr show dev dummy0 2: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000 inet 192.0.2.1/24 scope global nodad optimistic dadfailed home tentative mngtmpaddr noprefixroute stable-privacy dummy0 valid_lft forever preferred_lft forever Signed-off-by: Matteo Croce <mcroce@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: openvswitch: fix csum updates for MPLS actionsJohn Hurley2019-06-301-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Skbs may have their checksum value populated by HW. If this is a checksum calculated over the entire packet then the CHECKSUM_COMPLETE field is marked. Changes to the data pointer on the skb throughout the network stack still try to maintain this complete csum value if it is required through functions such as skb_postpush_rcsum. The MPLS actions in Open vSwitch modify a CHECKSUM_COMPLETE value when changes are made to packet data without a push or a pull. This occurs when the ethertype of the MAC header is changed or when MPLS lse fields are modified. The modification is carried out using the csum_partial function to get the csum of a buffer and add it into the larger checksum. The buffer is an inversion of the data to be removed followed by the new data. Because the csum is calculated over 16 bits and these values align with 16 bits, the effect is the removal of the old value from the CHECKSUM_COMPLETE and addition of the new value. However, the csum fed into the function and the outcome of the calculation are also inverted. This would only make sense if it was the new value rather than the old that was inverted in the input buffer. Fix the issue by removing the bit inverts in the csum_partial calculation. The bug was verified and the fix tested by comparing the folded value of the updated CHECKSUM_COMPLETE value with the folded value of a full software checksum calculation (reset skb->csum to 0 and run skb_checksum_complete(skb)). Prior to the fix the outcomes differed but after they produce the same result. Fixes: 25cd9ba0abc0 ("openvswitch: Add basic MPLS support to kernel") Fixes: bc7cc5999fd3 ("openvswitch: update checksum in {push,pop}_mpls") Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | igmp: fix memory leak in igmpv3_del_delrec()Eric Dumazet2019-06-291-6/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | im->tomb and/or im->sources might not be NULL, but we currently overwrite their values blindly. Using swap() will make sure the following call to kfree_pmc(pmc) will properly free the psf structures. Tested with the C repro provided by syzbot, which basically does : socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 3 setsockopt(3, SOL_IP, IP_ADD_MEMBERSHIP, "\340\0\0\2\177\0\0\1\0\0\0\0", 12) = 0 ioctl(3, SIOCSIFFLAGS, {ifr_name="lo", ifr_flags=0}) = 0 setsockopt(3, SOL_IP, IP_MSFILTER, "\340\0\0\2\177\0\0\1\1\0\0\0\1\0\0\0\377\377\377\377", 20) = 0 ioctl(3, SIOCSIFFLAGS, {ifr_name="lo", ifr_flags=IFF_UP}) = 0 exit_group(0) = ? BUG: memory leak unreferenced object 0xffff88811450f140 (size 64): comm "softirq", pid 0, jiffies 4294942448 (age 32.070s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 ff ff ff ff 00 00 00 00 ................ 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 ................ backtrace: [<00000000c7bad083>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline] [<00000000c7bad083>] slab_post_alloc_hook mm/slab.h:439 [inline] [<00000000c7bad083>] slab_alloc mm/slab.c:3326 [inline] [<00000000c7bad083>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553 [<000000009acc4151>] kmalloc include/linux/slab.h:547 [inline] [<000000009acc4151>] kzalloc include/linux/slab.h:742 [inline] [<000000009acc4151>] ip_mc_add1_src net/ipv4/igmp.c:1976 [inline] [<000000009acc4151>] ip_mc_add_src+0x36b/0x400 net/ipv4/igmp.c:2100 [<000000004ac14566>] ip_mc_msfilter+0x22d/0x310 net/ipv4/igmp.c:2484 [<0000000052d8f995>] do_ip_setsockopt.isra.0+0x1795/0x1930 net/ipv4/ip_sockglue.c:959 [<000000004ee1e21f>] ip_setsockopt+0x3b/0xb0 net/ipv4/ip_sockglue.c:1248 [<0000000066cdfe74>] udp_setsockopt+0x4e/0x90 net/ipv4/udp.c:2618 [<000000009383a786>] sock_common_setsockopt+0x38/0x50 net/core/sock.c:3126 [<00000000d8ac0c94>] __sys_setsockopt+0x98/0x120 net/socket.c:2072 [<000000001b1e9666>] __do_sys_setsockopt net/socket.c:2083 [inline] [<000000001b1e9666>] __se_sys_setsockopt net/socket.c:2080 [inline] [<000000001b1e9666>] __x64_sys_setsockopt+0x26/0x30 net/socket.c:2080 [<00000000420d395e>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:301 [<000000007fd83a4b>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 24803f38a5c0 ("igmp: do not remove igmp souce list info when set link down") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Hangbin Liu <liuhangbin@gmail.com> Reported-by: syzbot+6ca1abd0db68b5173a4f@syzkaller.appspotmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: make skb_dst_force return true when dst is refcountedFlorian Westphal2019-06-291-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | netfilter did not expect that skb_dst_force() can cause skb to lose its dst entry. I got a bug report with a skb->dst NULL dereference in netfilter output path. The backtrace contains nf_reinject(), so the dst might have been cleared when skb got queued to userspace. Other users were fixed via if (skb_dst(skb)) { skb_dst_force(skb); if (!skb_dst(skb)) goto handle_err; } But I think its preferable to make the 'dst might be cleared' part of the function explicit. In netfilter case, skb with a null dst is expected when queueing in prerouting hook, so drop skb for the other hooks. v2: v1 of this patch returned true in case skb had no dst entry. Eric said: Say if we have two skb_dst_force() calls for some reason on the same skb, only the first one will return false. This now returns false even when skb had no dst, as per Erics suggestion, so callers might need to check skb_dst() first before skb_dst_force(). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | sctp: not bind the socket in sctp_connectXin Long2019-06-291-21/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now when sctp_connect() is called with a wrong sa_family, it binds to a port but doesn't set bp->port, then sctp_get_af_specific will return NULL and sctp_connect() returns -EINVAL. Then if sctp_bind() is called to bind to another port, the last port it has bound will leak due to bp->port is NULL by then. sctp_connect() doesn't need to bind ports, as later __sctp_connect will do it if bp->port is NULL. So remove it from sctp_connect(). While at it, remove the unnecessary sockaddr.sa_family len check as it's already done in sctp_inet_connect. Fixes: 644fbdeacf1d ("sctp: fix the issue that flags are ignored when using kernel_connect") Reported-by: syzbot+079bf326b38072f849d9@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller2019-06-287-80/+95
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter fixes for net: 1) Fix memleak reported by syzkaller when registering IPVS hooks, patch from Julian Anastasov. 2) Fix memory leak in start_sync_thread, also from Julian. 3) Fix conntrack deletion via ctnetlink, from Felix Kaechele. 4) Fix reject for ICMP due to incorrect checksum handling, from He Zhe. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | netfilter: Fix remainder of pseudo-header protocol 0He Zhe2019-06-283-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since v5.1-rc1, some types of packets do not get unreachable reply with the following iptables setting. Fox example, $ iptables -A INPUT -p icmp --icmp-type 8 -j REJECT $ ping 127.0.0.1 -c 1 PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data. — 127.0.0.1 ping statistics — 1 packets transmitted, 0 received, 100% packet loss, time 0ms We should have got the following reply from command line, but we did not. From 127.0.0.1 icmp_seq=1 Destination Port Unreachable Yi Zhao reported it and narrowed it down to: 7fc38225363d ("netfilter: reject: skip csum verification for protocols that don't support it"), This is because nf_ip_checksum still expects pseudo-header protocol type 0 for packets that are of neither TCP or UDP, and thus ICMP packets are mistakenly treated as TCP/UDP. This patch corrects the conditions in nf_ip_checksum and all other places that still call it with protocol 0. Fixes: 7fc38225363d ("netfilter: reject: skip csum verification for protocols that don't support it") Reported-by: Yi Zhao <yi.zhao@windriver.com> Signed-off-by: He Zhe <zhe.he@windriver.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | netfilter: ctnetlink: Fix regression in conntrack entry deletionFelix Kaechele2019-06-261-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit f8e608982022 ("netfilter: ctnetlink: Resolve conntrack L3-protocol flush regression") introduced a regression in which deletion of conntrack entries would fail because the L3 protocol information is replaced by AF_UNSPEC. As a result the search for the entry to be deleted would turn up empty due to the tuple used to perform the search is now different from the tuple used to initially set up the entry. For flushing the conntrack table we do however want to keep the option for nfgenmsg->version to have a non-zero value to allow for newer user-space tools to request treatment under the new behavior. With that it is possible to independently flush tables for a defined L3 protocol. This was introduced with the enhancements in in commit 59c08c69c278 ("netfilter: ctnetlink: Support L3 protocol-filter on flush"). Older user-space tools will retain the behavior of flushing all tables regardless of defined L3 protocol. Fixes: f8e608982022 ("netfilter: ctnetlink: Resolve conntrack L3-protocol flush regression") Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Felix Kaechele <felix@kaechele.ca> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | ipvs: fix tinfo memory leak in start_sync_threadJulian Anastasov2019-06-252-66/+72
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzkaller reports for memory leak in start_sync_thread [1] As Eric points out, kthread may start and stop before the threadfn function is called, so there is no chance the data (tinfo in our case) to be released in thread. Fix this by releasing tinfo in the controlling code instead. [1] BUG: memory leak unreferenced object 0xffff8881206bf700 (size 32): comm "syz-executor761", pid 7268, jiffies 4294943441 (age 20.470s) hex dump (first 32 bytes): 00 40 7c 09 81 88 ff ff 80 45 b8 21 81 88 ff ff .@|......E.!.... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<0000000057619e23>] kmemleak_alloc_recursive include/linux/kmemleak.h:55 [inline] [<0000000057619e23>] slab_post_alloc_hook mm/slab.h:439 [inline] [<0000000057619e23>] slab_alloc mm/slab.c:3326 [inline] [<0000000057619e23>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553 [<0000000086ce5479>] kmalloc include/linux/slab.h:547 [inline] [<0000000086ce5479>] start_sync_thread+0x5d2/0xe10 net/netfilter/ipvs/ip_vs_sync.c:1862 [<000000001a9229cc>] do_ip_vs_set_ctl+0x4c5/0x780 net/netfilter/ipvs/ip_vs_ctl.c:2402 [<00000000ece457c8>] nf_sockopt net/netfilter/nf_sockopt.c:106 [inline] [<00000000ece457c8>] nf_setsockopt+0x4c/0x80 net/netfilter/nf_sockopt.c:115 [<00000000942f62d4>] ip_setsockopt net/ipv4/ip_sockglue.c:1258 [inline] [<00000000942f62d4>] ip_setsockopt+0x9b/0xb0 net/ipv4/ip_sockglue.c:1238 [<00000000a56a8ffd>] udp_setsockopt+0x4e/0x90 net/ipv4/udp.c:2616 [<00000000fa895401>] sock_common_setsockopt+0x38/0x50 net/core/sock.c:3130 [<0000000095eef4cf>] __sys_setsockopt+0x98/0x120 net/socket.c:2078 [<000000009747cf88>] __do_sys_setsockopt net/socket.c:2089 [inline] [<000000009747cf88>] __se_sys_setsockopt net/socket.c:2086 [inline] [<000000009747cf88>] __x64_sys_setsockopt+0x26/0x30 net/socket.c:2086 [<00000000ded8ba80>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:301 [<00000000893b4ac8>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Reported-by: syzbot+7e2e50c8adfccd2e5041@syzkaller.appspotmail.com Suggested-by: Eric Biggers <ebiggers@kernel.org> Fixes: 998e7a76804b ("ipvs: Use kthread_run() instead of doing a double-fork via kernel_thread()") Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | ipvs: defer hook registration to avoid leaksJulian Anastasov2019-06-211-7/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzkaller reports for memory leak when registering hooks [1] As we moved the nf_unregister_net_hooks() call into __ip_vs_dev_cleanup(), defer the nf_register_net_hooks() call, so that hooks are allocated and freed from same pernet_operations (ipvs_core_dev_ops). [1] BUG: memory leak unreferenced object 0xffff88810acd8a80 (size 96): comm "syz-executor073", pid 7254, jiffies 4294950560 (age 22.250s) hex dump (first 32 bytes): 02 00 00 00 00 00 00 00 50 8b bb 82 ff ff ff ff ........P....... 00 00 00 00 00 00 00 00 00 77 bb 82 ff ff ff ff .........w...... backtrace: [<0000000013db61f1>] kmemleak_alloc_recursive include/linux/kmemleak.h:55 [inline] [<0000000013db61f1>] slab_post_alloc_hook mm/slab.h:439 [inline] [<0000000013db61f1>] slab_alloc_node mm/slab.c:3269 [inline] [<0000000013db61f1>] kmem_cache_alloc_node_trace+0x15b/0x2a0 mm/slab.c:3597 [<000000001a27307d>] __do_kmalloc_node mm/slab.c:3619 [inline] [<000000001a27307d>] __kmalloc_node+0x38/0x50 mm/slab.c:3627 [<0000000025054add>] kmalloc_node include/linux/slab.h:590 [inline] [<0000000025054add>] kvmalloc_node+0x4a/0xd0 mm/util.c:431 [<0000000050d1bc00>] kvmalloc include/linux/mm.h:637 [inline] [<0000000050d1bc00>] kvzalloc include/linux/mm.h:645 [inline] [<0000000050d1bc00>] allocate_hook_entries_size+0x3b/0x60 net/netfilter/core.c:61 [<00000000e8abe142>] nf_hook_entries_grow+0xae/0x270 net/netfilter/core.c:128 [<000000004b94797c>] __nf_register_net_hook+0x9a/0x170 net/netfilter/core.c:337 [<00000000d1545cbc>] nf_register_net_hook+0x34/0xc0 net/netfilter/core.c:464 [<00000000876c9b55>] nf_register_net_hooks+0x53/0xc0 net/netfilter/core.c:480 [<000000002ea868e0>] __ip_vs_init+0xe8/0x170 net/netfilter/ipvs/ip_vs_core.c:2280 [<000000002eb2d451>] ops_init+0x4c/0x140 net/core/net_namespace.c:130 [<000000000284ec48>] setup_net+0xde/0x230 net/core/net_namespace.c:316 [<00000000a70600fa>] copy_net_ns+0xf0/0x1e0 net/core/net_namespace.c:439 [<00000000ff26c15e>] create_new_namespaces+0x141/0x2a0 kernel/nsproxy.c:107 [<00000000b103dc79>] copy_namespaces+0xa1/0xe0 kernel/nsproxy.c:165 [<000000007cc008a2>] copy_process.part.0+0x11fd/0x2150 kernel/fork.c:2035 [<00000000c344af7c>] copy_process kernel/fork.c:1800 [inline] [<00000000c344af7c>] _do_fork+0x121/0x4f0 kernel/fork.c:2369 Reported-by: syzbot+722da59ccb264bc19910@syzkaller.appspotmail.com Fixes: 719c7d563c17 ("ipvs: Fix use-after-free in ip_vs_in") Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | | | Merge tag 'batadv-net-for-davem-20190627' of git://git.open-mesh.org/linux-mergeDavid S. Miller2019-06-284-2/+10
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Simon Wunderlich says: ==================== Here are some batman-adv bugfixes: - fix a leaked TVLV handler which wasn't unregistered, by Jeremy Sowden - fix duplicated OGMs when interfaces are set UP, by Sven Eckelmann ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | batman-adv: Fix duplicated OGMs on NETDEV_UPSven Eckelmann2019-06-023-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The state of slave interfaces are handled differently depending on whether the interface is up or not. All active interfaces (IFF_UP) will transmit OGMs. But for B.A.T.M.A.N. IV, also non-active interfaces are scheduling (low TTL) OGMs on active interfaces. The code which setups and schedules the OGMs must therefore already be called when the interfaces gets added as slave interface and the transmit function must then check whether it has to send out the OGM or not on the specific slave interface. But the commit f0d97253fb5f ("batman-adv: remove ogm_emit and ogm_schedule API calls") moved the setup code from the enable function to the activate function. The latter is called either when the added slave was already up when batadv_hardif_enable_interface processed the new interface or when a NETDEV_UP event was received for this slave interfac. As result, each NETDEV_UP would schedule a new OGM worker for the interface and thus OGMs would be send a lot more than expected. Fixes: f0d97253fb5f ("batman-adv: remove ogm_emit and ogm_schedule API calls") Reported-by: Linus Lüssing <linus.luessing@c0d3.blue> Tested-by: Linus Lüssing <linus.luessing@c0d3.blue> Acked-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
| * | | | batman-adv: fix for leaked TVLV handler.Jeremy Sowden2019-05-211-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A handler for BATADV_TVLV_ROAM was being registered when the translation-table was initialized, but not unregistered when the translation-table was freed. Unregister it. Fixes: 122edaa05940 ("batman-adv: tvlv - convert roaming adv packet to use tvlv unicast packets") Reported-by: syzbot+d454a826e670502484b8@syzkaller.appspotmail.com Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Sven Eckelmann <sven@narfation.org Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
* | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2019-06-2816-28/+114
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: 1) Fix ppp_mppe crypto soft dependencies, from Takashi Iawi. 2) Fix TX completion to be finite, from Sergej Benilov. 3) Use register_pernet_device to avoid a dst leak in tipc, from Xin Long. 4) Double free of TX cleanup in Dirk van der Merwe. 5) Memory leak in packet_set_ring(), from Eric Dumazet. 6) Out of bounds read in qmi_wwan, from Bjørn Mork. 7) Fix iif used in mcast/bcast looped back packets, from Stephen Suryaputra. 8) Fix neighbour resolution on raw ipv6 sockets, from Nicolas Dichtel. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (25 commits) af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET sctp: change to hold sk after auth shkey is created successfully ipv6: fix neighbour resolution with raw socket ipv6: constify rt6_nexthop() net: dsa: microchip: Use gpiod_set_value_cansleep() net: aquantia: fix vlans not working over bridged network ipv4: reset rt_iif for recirculated mcast/bcast out pkts team: Always enable vlan tx offload net/smc: Fix error path in smc_init net/smc: hold conns_lock before calling smc_lgr_register_conn() bonding: Always enable vlan tx offload net/ipv6: Fix misuse of proc_dointvec "skip_notify_on_dev_down" ipv4: Use return value of inet_iif() for __raw_v4_lookup in the while loop qmi_wwan: Fix out-of-bounds read tipc: check msg->req data len in tipc_nl_compat_bearer_disable net: macb: do not copy the mac address if NULL net/packet: fix memory leak in packet_set_ring() net/tls: fix page double free on TX cleanup net/sched: cbs: Fix error path of cbs_module_init tipc: change to use register_pernet_device ...
| * | | | | af_packet: Block execution of tasks waiting for transmit to complete in ↵Neil Horman2019-06-262-3/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | AF_PACKET When an application is run that: a) Sets its scheduler to be SCHED_FIFO and b) Opens a memory mapped AF_PACKET socket, and sends frames with the MSG_DONTWAIT flag cleared, its possible for the application to hang forever in the kernel. This occurs because when waiting, the code in tpacket_snd calls schedule, which under normal circumstances allows other tasks to run, including ksoftirqd, which in some cases is responsible for freeing the transmitted skb (which in AF_PACKET calls a destructor that flips the status bit of the transmitted frame back to available, allowing the transmitting task to complete). However, when the calling application is SCHED_FIFO, its priority is such that the schedule call immediately places the task back on the cpu, preventing ksoftirqd from freeing the skb, which in turn prevents the transmitting task from detecting that the transmission is complete. We can fix this by converting the schedule call to a completion mechanism. By using a completion queue, we force the calling task, when it detects there are no more frames to send, to schedule itself off the cpu until such time as the last transmitted skb is freed, allowing forward progress to be made. Tested by myself and the reporter, with good results Change Notes: V1->V2: Enhance the sleep logic to support being interruptible and allowing for honoring to SK_SNDTIMEO (Willem de Bruijn) V2->V3: Rearrage the point at which we wait for the completion queue, to avoid needing to check for ph/skb being null at the end of the loop. Also move the complete call to the skb destructor to avoid needing to modify __packet_set_status. Also gate calling complete on packet_read_pending returning zero to avoid multiple calls to complete. (Willem de Bruijn) Move timeo computation within loop, to re-fetch the socket timeout since we also use the timeo variable to record the return code from the wait_for_complete call (Neil Horman) V3->V4: Willem has requested that the control flow be restored to the previous state. Doing so lets us eliminate the need for the po->wait_on_complete flag variable, and lets us get rid of the packet_next_frame function, but introduces another complexity. Specifically, but using the packet pending count, we can, if an applications calls sendmsg multiple times with MSG_DONTWAIT set, each set of transmitted frames, when complete, will cause tpacket_destruct_skb to issue a complete call, for which there will never be a wait_on_completion call. This imbalance will lead to any future call to wait_for_completion here to return early, when the frames they sent may not have completed. To correct this, we need to re-init the completion queue on every call to tpacket_snd before we enter the loop so as to ensure we wait properly for the frames we send in this iteration. Change the timeout and interrupted gotos to out_put rather than out_status so that we don't try to free a non-existant skb Clean up some extra newlines (Willem de Bruijn) Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | sctp: change to hold sk after auth shkey is created successfullyXin Long2019-06-261-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now in sctp_endpoint_init(), it holds the sk then creates auth shkey. But when the creation fails, it doesn't release the sk, which causes a sk defcnf leak, Here to fix it by only holding the sk when auth shkey is created successfully. Fixes: a29a5bd4f5c3 ("[SCTP]: Implement SCTP-AUTH initializations.") Reported-by: syzbot+afabda3890cc2f765041@syzkaller.appspotmail.com Reported-by: syzbot+276ca1c77a19977c0130@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | ipv6: fix neighbour resolution with raw socketNicolas Dichtel2019-06-261-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The scenario is the following: the user uses a raw socket to send an ipv6 packet, destinated to a not-connected network, and specify a connected nh. Here is the corresponding python script to reproduce this scenario: import socket IPPROTO_RAW = 255 send_s = socket.socket(socket.AF_INET6, socket.SOCK_RAW, IPPROTO_RAW) # scapy # p = IPv6(src='fd00:100::1', dst='fd00:200::fa')/ICMPv6EchoRequest() # str(p) req = b'`\x00\x00\x00\x00\x08:@\xfd\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\xfd\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xfa\x80\x00\x81\xc0\x00\x00\x00\x00' send_s.sendto(req, ('fd00:175::2', 0, 0, 0)) fd00:175::/64 is a connected route and fd00:200::fa is not a connected host. With this scenario, the kernel starts by sending a NS to resolve fd00:175::2. When it receives the NA, it flushes its queue and try to send the initial packet. But instead of sending it, it sends another NS to resolve fd00:200::fa, which obvioulsy fails, thus the packet is dropped. If the user sends again the packet, it now uses the right nh (fd00:175::2). The problem is that ip6_dst_lookup_neigh() uses the rt6i_gateway, which is :: because the associated route is a connected route, thus it uses the dst addr of the packet. Let's use rt6_nexthop() to choose the right nh. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | ipv6: constify rt6_nexthop()Nicolas Dichtel2019-06-263-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no functional change in this patch, it only prepares the next one. rt6_nexthop() will be used by ip6_dst_lookup_neigh(), which uses const variables. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reported-by: kbuild test robot <lkp@intel.com> Acked-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | ipv4: reset rt_iif for recirculated mcast/bcast out pktsStephen Suryaputra2019-06-262-0/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Multicast or broadcast egress packets have rt_iif set to the oif. These packets might be recirculated back as input and lookup to the raw sockets may fail because they are bound to the incoming interface (skb_iif). If rt_iif is not zero, during the lookup, inet_iif() function returns rt_iif instead of skb_iif. Hence, the lookup fails. v2: Make it non vrf specific (David Ahern). Reword the changelog to reflect it. Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | net/smc: Fix error path in smc_initYueHaibing2019-06-261-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If register_pernet_subsys success in smc_init, we should cleanup it in case any other error. Fixes: 64e28b52c7a6 (net/smc: add pnet table namespace support") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | net/smc: hold conns_lock before calling smc_lgr_register_conn()Huaping Zhou2019-06-261-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After smc_lgr_create(), the newly created link group is added to smc_lgr_list, thus is accessible from other context. Although link group creation is serialized by smc_create_lgr_pending, the new link group may still be accessed concurrently. For example, if ib_device is no longer active, smc_ib_port_event_work() will call smc_port_terminate(), which in turn will call __smc_lgr_terminate() on every link group of this device. So conns_lock is required here. Signed-off-by: Huaping Zhou <zhp@smail.nju.edu.cn> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>