summaryrefslogtreecommitdiffstats
path: root/net
Commit message (Collapse)AuthorAgeFilesLines
* net: rds: add service level support in rds-infoZhu Yanjun2019-08-244-8/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | >From IB specific 7.6.5 SERVICE LEVEL, Service Level (SL) is used to identify different flows within an IBA subnet. It is carried in the local route header of the packet. Before this commit, run "rds-info -I". The outputs are as below: " RDS IB Connections: LocalAddr RemoteAddr Tos SL LocalDev RemoteDev 192.2.95.3 192.2.95.1 2 0 fe80::21:28:1a:39 fe80::21:28:10:b9 192.2.95.3 192.2.95.1 1 0 fe80::21:28:1a:39 fe80::21:28:10:b9 192.2.95.3 192.2.95.1 0 0 fe80::21:28:1a:39 fe80::21:28:10:b9 " After this commit, the output is as below: " RDS IB Connections: LocalAddr RemoteAddr Tos SL LocalDev RemoteDev 192.2.95.3 192.2.95.1 2 2 fe80::21:28:1a:39 fe80::21:28:10:b9 192.2.95.3 192.2.95.1 1 1 fe80::21:28:1a:39 fe80::21:28:10:b9 192.2.95.3 192.2.95.1 0 0 fe80::21:28:1a:39 fe80::21:28:10:b9 " The commit fe3475af3bdf ("net: rds: add per rds connection cache statistics") adds cache_allocs in struct rds_info_rdma_connection as below: struct rds_info_rdma_connection { ... __u32 rdma_mr_max; __u32 rdma_mr_size; __u8 tos; __u32 cache_allocs; }; The peer struct in rds-tools of struct rds_info_rdma_connection is as below: struct rds_info_rdma_connection { ... uint32_t rdma_mr_max; uint32_t rdma_mr_size; uint8_t tos; uint8_t sl; uint32_t cache_allocs; }; The difference between userspace and kernel is the member variable sl. In the kernel struct, the member variable sl is missing. This will introduce risks. So it is necessary to use this commit to avoid this risk. Fixes: fe3475af3bdf ("net: rds: add per rds connection cache statistics") CC: Joe Jin <joe.jin@oracle.com> CC: JUNXIAO_BI <junxiao.bi@oracle.com> Suggested-by: Gerd Rausch <gerd.rausch@oracle.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: route dump netlink NLM_F_MULTI flag missingJohn Fastabend2019-08-242-8/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An excerpt from netlink(7) man page, In multipart messages (multiple nlmsghdr headers with associated payload in one byte stream) the first and all following headers have the NLM_F_MULTI flag set, except for the last header which has the type NLMSG_DONE. but, after (ee28906) there is a missing NLM_F_MULTI flag in the middle of a FIB dump. The result is user space applications following above man page excerpt may get confused and may stop parsing msg believing something went wrong. In the golang netlink lib [0] the library logic stops parsing believing the message is not a multipart message. Found this running Cilium[1] against net-next while adding a feature to auto-detect routes. I noticed with multiple route tables we no longer could detect the default routes on net tree kernels because the library logic was not returning them. Fix this by handling the fib_dump_info_fnhe() case the same way the fib_dump_info() handles it by passing the flags argument through the call chain and adding a flags argument to rt_fill_info(). Tested with Cilium stack and auto-detection of routes works again. Also annotated libs to dump netlink msgs and inspected NLM_F_MULTI and NLMSG_DONE flags look correct after this. Note: In inet_rtm_getroute() pass rt_fill_info() '0' for flags the same as is done for fib_dump_info() so this looks correct to me. [0] https://github.com/vishvananda/netlink/ [1] https://github.com/cilium/ Fixes: ee28906fd7a14 ("ipv4: Dump route exceptions if requested") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sock: fix potential memory leak in proto_register()zhanglin2019-08-241-10/+21
| | | | | | | | | | | | If protocols registered exceeded PROTO_INUSE_NR, prot will be added to proto_list, but no available bit left for prot in proto_inuse_idx. Changes since v2: * Propagate the error code properly Signed-off-by: zhanglin <zhang.lin16@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
* xfrm/xfrm_policy: fix dst dev null pointer dereference in collect_md modeHangbin Liu2019-08-241-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In decode_session{4,6} there is a possibility that the skb dst dev is NULL, e,g, with tunnel collect_md mode, which will cause kernel crash. Here is what the code path looks like, for GRE: - ip6gre_tunnel_xmit - ip6gre_xmit_ipv6 - __gre6_xmit - ip6_tnl_xmit - if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE - icmpv6_send - icmpv6_route_lookup - xfrm_decode_session_reverse - decode_session4 - oif = skb_dst(skb)->dev->ifindex; <-- here - decode_session6 - oif = skb_dst(skb)->dev->ifindex; <-- here The reason is __metadata_dst_init() init dst->dev to NULL by default. We could not fix it in __metadata_dst_init() as there is no dev supplied. On the other hand, the skb_dst(skb)->dev is actually not needed as we called decode_session{4,6} via xfrm_decode_session_reverse(), so oif is not used by: fl4->flowi4_oif = reverse ? skb->skb_iif : oif; So make a dst dev check here should be clean and safe. v4: No changes. v3: No changes. v2: fix the issue in decode_session{4,6} instead of updating shared dst dev in {ip_md, ip6}_tunnel_xmit. Fixes: 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnels") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Tested-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4/icmp: fix rt dst dev null pointer dereferenceHangbin Liu2019-08-241-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In __icmp_send() there is a possibility that the rt->dst.dev is NULL, e,g, with tunnel collect_md mode, which will cause kernel crash. Here is what the code path looks like, for GRE: - ip6gre_tunnel_xmit - ip6gre_xmit_ipv4 - __gre6_xmit - ip6_tnl_xmit - if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE - icmp_send - net = dev_net(rt->dst.dev); <-- here The reason is __metadata_dst_init() init dst->dev to NULL by default. We could not fix it in __metadata_dst_init() as there is no dev supplied. On the other hand, the reason we need rt->dst.dev is to get the net. So we can just try get it from skb->dev when rt->dst.dev is NULL. v4: Julian Anastasov remind skb->dev also could be NULL. We'd better still use dst.dev and do a check to avoid crash. v3: No changes. v2: fix the issue in __icmp_send() instead of updating shared dst dev in {ip_md, ip6}_tunnel_xmit. Fixes: c8b34e680a09 ("ip_tunnel: Add tnl_update_pmtu in ip_md_tunnel_xmit") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Julian Anastasov <ja@ssi.bg> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* openvswitch: Fix log message in ovs conntrackYi-Hung Wei2019-08-241-1/+1
| | | | | | Fixes: 06bd2bdf19d2 ("openvswitch: Add timeout support to ct action") Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'ieee802154-for-davem-2019-08-24' of ↵David S. Miller2019-08-241-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan Stefan Schmidt says: ==================== pull-request: ieee802154 for net 2019-08-24 An update from ieee802154 for your *net* tree. Yue Haibing fixed two bugs discovered by KASAN in the hwsim driver for ieee802154 and Colin Ian King cleaned up a redundant variable assignment. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: ieee802154: remove redundant assignment to rcColin Ian King2019-08-141-1/+1
| | | | | | | | | | | | | | | | | | Variable rc is initialized to a value that is never read and it is re-assigned later. The initialization is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller2019-08-233-6/+8
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Daniel Borkmann says: ==================== pull-request: bpf 2019-08-24 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix verifier precision tracking with BPF-to-BPF calls, from Alexei. 2) Fix a use-after-free in prog symbol exposure, from Daniel. 3) Several s390x JIT fixes plus BE related fixes in BPF kselftests, from Ilya. 4) Fix memory leak by unpinning XDP umem pages in error path, from Ivan. 5) Fix a potential use-after-free on flow dissector detach, from Jakub. 6) Fix bpftool to close prog fd after showing metadata, from Quentin. 7) BPF kselftest config and TEST_PROGS_EXTENDED fixes, from Anders. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | bpf: allow narrow loads of some sk_reuseport_md fields with offset > 0Ilya Leoshkevich2019-08-241-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | test_select_reuseport fails on s390 due to verifier rejecting test_select_reuseport_kern.o with the following message: ; data_check.eth_protocol = reuse_md->eth_protocol; 18: (69) r1 = *(u16 *)(r6 +22) invalid bpf_context access off=22 size=2 This is because on big-endian machines casts from __u32 to __u16 are generated by referencing the respective variable as __u16 with an offset of 2 (as opposed to 0 on little-endian machines). The verifier already has all the infrastructure in place to allow such accesses, it's just that they are not explicitly enabled for eth_protocol field. Enable them for eth_protocol field by using bpf_ctx_range instead of offsetof. Ditto for ip_protocol, bind_inany and len, since they already allow narrowing, and the same problem can arise when working with them. Fixes: 2dbb9b9e6df6 ("bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * | flow_dissector: Fix potential use-after-free on BPF_PROG_DETACHJakub Sitnicki2019-08-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Call to bpf_prog_put(), with help of call_rcu(), queues an RCU-callback to free the program once a grace period has elapsed. The callback can run together with new RCU readers that started after the last grace period. New RCU readers can potentially see the "old" to-be-freed or already-freed pointer to the program object before the RCU update-side NULLs it. Reorder the operations so that the RCU update-side resets the protected pointer before the end of the grace period after which the program will be freed. Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook") Reported-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: Petar Penkov <ppenkov@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * | xdp: unpin xdp umem pages in error pathIvan Khoronzhuk2019-08-201-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | Fix mem leak caused by missed unpin routine for umem pages. Fixes: 8aef7340ae9695 ("xsk: introduce xdp_umem_page") Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* | | ipv6: propagate ipv6_add_dev's error returns out of ipv6_find_idevSabrina Dubroca2019-08-231-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, ipv6_find_idev returns NULL when ipv6_add_dev fails, ignoring the specific error value. This results in addrconf_add_dev returning ENOBUFS in all cases, which is unfortunate in cases such as: # ip link add dummyX type dummy # ip link set dummyX mtu 1200 up # ip addr add 2000::/64 dev dummyX RTNETLINK answers: No buffer space available Commit a317a2f19da7 ("ipv6: fail early when creating netdev named all or default") introduced error returns in ipv6_add_dev. Before that, that function would simply return NULL for all failures. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net/ncsi: Fix the payload copying for the request coming from NetlinkJustin.Lee1@Dell.com2019-08-221-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The request coming from Netlink should use the OEM generic handler. The standard command handler expects payload in bytes/words/dwords but the actual payload is stored in data if the request is coming from Netlink. Signed-off-by: Justin Lee <justin.lee1@dell.com> Reviewed-by: Vijay Khemka <vijaykhemka@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: fix icmp_socket_deliver argument 2 inputLi RongQing2019-08-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | it expects a unsigned int, but got a __be32 Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv6/addrconf: allow adding multicast addr if IFA_F_MCAUTOJOIN is setHangbin Liu2019-08-211-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 93a714d6b53d ("multicast: Extend ip address command to enable multicast group join/leave on") we added a new flag IFA_F_MCAUTOJOIN to make user able to add multicast address on ethernet interface. This works for IPv4, but not for IPv6. See the inet6_addr_add code. static int inet6_addr_add() { ... if (cfg->ifa_flags & IFA_F_MCAUTOJOIN) { ipv6_mc_config(net->ipv6.mc_autojoin_sk, true...) } ifp = ipv6_add_addr(idev, cfg, true, extack); <- always fail with maddr if (!IS_ERR(ifp)) { ... } else if (cfg->ifa_flags & IFA_F_MCAUTOJOIN) { ipv6_mc_config(net->ipv6.mc_autojoin_sk, false...) } } But in ipv6_add_addr() it will check the address type and reject multicast address directly. So this feature is never worked for IPv6. We should not remove the multicast address check totally in ipv6_add_addr(), but could accept multicast address only when IFA_F_MCAUTOJOIN flag supplied. v2: update commit description Fixes: 93a714d6b53d ("multicast: Extend ip address command to enable multicast group join/leave on") Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge tag 'batadv-net-for-davem-20190821' of git://git.open-mesh.org/linux-mergeDavid S. Miller2019-08-211-1/+1
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Simon Wunderlich says: ==================== Here is a batman-adv bugfix: - fix uninit-value in batadv_netlink_get_ifindex(), by Eric Dumazet ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | batman-adv: fix uninit-value in batadv_netlink_get_ifindex()Eric Dumazet2019-08-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | batadv_netlink_get_ifindex() needs to make sure user passed a correct u32 attribute. syzbot reported : BUG: KMSAN: uninit-value in batadv_netlink_dump_hardif+0x70d/0x880 net/batman-adv/netlink.c:968 CPU: 1 PID: 11705 Comm: syz-executor888 Not tainted 5.1.0+ #1 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x191/0x1f0 lib/dump_stack.c:113 kmsan_report+0x130/0x2a0 mm/kmsan/kmsan.c:622 __msan_warning+0x75/0xe0 mm/kmsan/kmsan_instr.c:310 batadv_netlink_dump_hardif+0x70d/0x880 net/batman-adv/netlink.c:968 genl_lock_dumpit+0xc6/0x130 net/netlink/genetlink.c:482 netlink_dump+0xa84/0x1ab0 net/netlink/af_netlink.c:2253 __netlink_dump_start+0xa3a/0xb30 net/netlink/af_netlink.c:2361 genl_family_rcv_msg net/netlink/genetlink.c:550 [inline] genl_rcv_msg+0xfc1/0x1a40 net/netlink/genetlink.c:627 netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2486 genl_rcv+0x63/0x80 net/netlink/genetlink.c:638 netlink_unicast_kernel net/netlink/af_netlink.c:1311 [inline] netlink_unicast+0xf3e/0x1020 net/netlink/af_netlink.c:1337 netlink_sendmsg+0x127e/0x12f0 net/netlink/af_netlink.c:1926 sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg net/socket.c:661 [inline] ___sys_sendmsg+0xcc6/0x1200 net/socket.c:2260 __sys_sendmsg net/socket.c:2298 [inline] __do_sys_sendmsg net/socket.c:2307 [inline] __se_sys_sendmsg+0x305/0x460 net/socket.c:2305 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2305 do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291 entry_SYSCALL_64_after_hwframe+0x63/0xe7 RIP: 0033:0x440209 Fixes: b60620cf567b ("batman-adv: netlink: hardif query") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
* | | | Revert "cfg80211: fix processing world regdomain when non modular"Hodaszi, Robert2019-08-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 96cce12ff6e0 ("cfg80211: fix processing world regdomain when non modular"). Re-triggering a reg_process_hint with the last request on all events, can make the regulatory domain fail in case of multiple WiFi modules. On slower boards (espacially with mdev), enumeration of the WiFi modules can end up in an intersected regulatory domain, and user cannot set it with 'iw reg set' anymore. This is happening, because: - 1st module enumerates, queues up a regulatory request - request gets processed by __reg_process_hint_driver(): - checks if previous was set by CORE -> yes - checks if regulator domain changed -> yes, from '00' to e.g. 'US' -> sends request to the 'crda' - 2nd module enumerates, queues up a regulator request (which triggers the reg_todo() work) - reg_todo() -> reg_process_pending_hints() sees, that the last request is not processed yet, so it tries to process it again. __reg_process_hint driver() will run again, and: - checks if the last request's initiator was the core -> no, it was the driver (1st WiFi module) - checks, if the previous initiator was the driver -> yes - checks if the regulator domain changed -> yes, it was '00' (set by core, and crda call did not return yet), and should be changed to 'US' ------> __reg_process_hint_driver calls an intersect Besides, the reg_process_hint call with the last request is meaningless since the crda call has a timeout work. If that timeout expires, the first module's request will lost. Cc: stable@vger.kernel.org Fixes: 96cce12ff6e0 ("cfg80211: fix processing world regdomain when non modular") Signed-off-by: Robert Hodaszi <robert.hodaszi@digi.com> Link: https://lore.kernel.org/r/20190614131600.GA13897@a1-hr Signed-off-by: Johannes Berg <johannes.berg@intel.com>
* | | | cfg80211: Fix Extended Key ID key install checksAlexander Wetzel2019-08-211-9/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix two shortcomings in the Extended Key ID API: 1) Allow the userspace to install pairwise keys using keyid 1 without NL80211_KEY_NO_TX set. This allows the userspace to install and activate pairwise keys with keyid 1 in the same way as for keyid 0, simplifying the API usage for e.g. FILS and FT key installs. 2) IEEE 802.11 - 2016 restricts Extended Key ID usage to CCMP/GCMP ciphers in IEEE 802.11 - 2016 "9.4.2.25.4 RSN capabilities". Enforce that when installing a key. Cc: stable@vger.kernel.org # 5.2 Fixes: 6cdd3979a2bd ("nl80211/cfg80211: Extended Key ID support") Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de> Link: https://lore.kernel.org/r/20190805123400.51567-1-alexander@wetzel-home.de Signed-off-by: Johannes Berg <johannes.berg@intel.com>
* | | | mac80211: fix possible sta leakJohannes Berg2019-08-211-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If TDLS station addition is rejected, the sta memory is leaked. Avoid this by moving the check before the allocation. Cc: stable@vger.kernel.org Fixes: 7ed5285396c2 ("mac80211: don't initiate TDLS connection if station is not associated to AP") Link: https://lore.kernel.org/r/20190801073033.7892-1-johannes@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
* | | | net: fix __ip_mc_inc_group usageLi RongQing2019-08-201-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | in ip_mc_inc_group, memory allocation flag, not mcast mode, is expected by __ip_mc_inc_group similar issue in __ip_mc_join_group, both mcase mode and gfp_t are needed here, so use ____ip_mc_inc_group(...) Fixes: 9fb20801dab4 ("net: Fix ip_mc_{dec,inc}_group allocation context") Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | net/ncsi: Ensure 32-bit boundary for data cksumTerry S. Duncan2019-08-202-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The NCSI spec indicates that if the data does not end on a 32 bit boundary, one to three padding bytes equal to 0x00 shall be present to align the checksum field to a 32-bit boundary. Signed-off-by: Terry S. Duncan <terry.s.duncan@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | net/smc: make sure EPOLLOUT is raisedJason Baron2019-08-201-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, we are only explicitly setting SOCK_NOSPACE on a write timeout for non-blocking sockets. Epoll() edge-trigger mode relies on SOCK_NOSPACE being set when -EAGAIN is returned to ensure that EPOLLOUT is raised. Expand the setting of SOCK_NOSPACE to non-blocking sockets as well that can use SO_SNDTIMEO to adjust their write timeout. This mirrors the behavior that Eric Dumazet introduced for tcp sockets. Signed-off-by: Jason Baron <jbaron@akamai.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Ursula Braun <ubraun@linux.ibm.com> Cc: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller2019-08-193-15/+35
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Remove IP MASQUERADING record in MAINTAINERS file, from Denis Efremov. 2) Counter arguments are swapped in ebtables, from Todd Seidelmann. 3) Missing netlink attribute validation in flow_offload extension. 4) Incorrect alignment in xt_nfacct that breaks 32-bits userspace / 64-bits kernels, from Juliana Rodrigueiro. 5) Missing include guard in nf_conntrack_h323_types.h, from Masahiro Yamada. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | netfilter: xt_nfacct: Fix alignment mismatch in xt_nfacct_match_infoJuliana Rodrigueiro2019-08-191-11/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When running a 64-bit kernel with a 32-bit iptables binary, the size of the xt_nfacct_match_info struct diverges. kernel: sizeof(struct xt_nfacct_match_info) : 40 iptables: sizeof(struct xt_nfacct_match_info)) : 36 Trying to append nfacct related rules results in an unhelpful message. Although it is suggested to look for more information in dmesg, nothing can be found there. # iptables -A <chain> -m nfacct --nfacct-name <acct-object> iptables: Invalid argument. Run `dmesg' for more information. This patch fixes the memory misalignment by enforcing 8-byte alignment within the struct's first revision. This solution is often used in many other uapi netfilter headers. Signed-off-by: Juliana Rodrigueiro <juliana.rodrigueiro@intra2net.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | | netfilter: nft_flow_offload: missing netlink attribute policyPablo Neira Ayuso2019-08-191-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The netlink attribute policy for NFTA_FLOW_TABLE_NAME is missing. Fixes: a3c90f7a2323 ("netfilter: nf_tables: flow offload expression") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | | netfilter: ebtables: Fix argument order to ADD_COUNTERTodd Seidelmann2019-08-191-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ordering of arguments to the x_tables ADD_COUNTER macro appears to be wrong in ebtables (cf. ip_tables.c, ip6_tables.c, and arp_tables.c). This causes data corruption in the ebtables userspace tools because they get incorrect packet & byte counts from the kernel. Fixes: d72133e628803 ("netfilter: ebtables: use ADD_COUNTER macro") Signed-off-by: Todd Seidelmann <tseidelmann@linode.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | | | | tcp: make sure EPOLLOUT wont be missedEric Dumazet2019-08-191-7/+9
|/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As Jason Baron explained in commit 790ba4566c1a ("tcp: set SOCK_NOSPACE under memory pressure"), it is crucial we properly set SOCK_NOSPACE when needed. However, Jason patch had a bug, because the 'nonblocking' status as far as sk_stream_wait_memory() is concerned is governed by MSG_DONTWAIT flag passed at sendmsg() time : long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT); So it is very possible that tcp sendmsg() calls sk_stream_wait_memory(), and that sk_stream_wait_memory() returns -EAGAIN with SOCK_NOSPACE cleared, if sk->sk_sndtimeo has been set to a small (but not zero) value. This patch removes the 'noblock' variable since we must always set SOCK_NOSPACE if -EAGAIN is returned. It also renames the do_nonblock label since we might reach this code path even if we were in blocking mode. Fixes: 790ba4566c1a ("tcp: set SOCK_NOSPACE under memory pressure") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jason Baron <jbaron@akamai.com> Reported-by: Vladimir Rutsky <rutsky@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Jason Baron <jbaron@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | netfilter: nf_tables: map basechain priority to hardware priorityPablo Neira Ayuso2019-08-182-3/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds initial support for offloading basechains using the priority range from 1 to 65535. This is restricting the netfilter priority range to 16-bit integer since this is what most drivers assume so far from tc. It should be possible to extend this range of supported priorities later on once drivers are updated to support for 32-bit integer priorities. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | Merge branch 'for-upstream' of ↵David S. Miller2019-08-174-3/+40
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Johan Hedberg says: ==================== pull request: bluetooth 2019-08-17 Here's a set of Bluetooth fixes for the 5.3-rc series: - Multiple fixes for Qualcomm (btqca & hci_qca) drivers - Minimum encryption key size debugfs setting (this is required for Bluetooth Qualification) - Fix hidp_send_message() to have a meaningful return value ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | Bluetooth: Add debug setting for changing minimum encryption key sizeMarcel Holtmann2019-08-173-1/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For testing and qualification purposes it is useful to allow changing the minimum encryption key size value that the host stack is going to enforce. This adds a new debugfs setting min_encrypt_key_size to achieve this functionality. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
| * | | | Bluetooth: hidp: Let hidp_send_message return number of queued bytesFabian Henneke2019-08-121-2/+7
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Let hidp_send_message return the number of successfully queued bytes instead of an unconditional 0. With the return value fixed to 0, other drivers relying on hidp, such as hidraw, can not return meaningful values from their respective implementations of write(). In particular, with the current behavior, a hidraw device's write() will have different return values depending on whether the device is connected via USB or Bluetooth, which makes it harder to abstract away the transport layer. Signed-off-by: Fabian Henneke <fabian.henneke@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
* | | | tipc: fix false detection of retransmit failuresTuong Lien2019-08-162-43/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit eliminates the use of the link 'stale_limit' & 'prev_from' (besides the already removed - 'stale_cnt') variables in the detection of repeated retransmit failures as there is no proper way to initialize them to avoid a false detection, i.e. it is not really a retransmission failure but due to a garbage values in the variables. Instead, a jiffies variable will be added to individual skbs (like the way we restrict the skb retransmissions) in order to mark the first skb retransmit time. Later on, at the next retransmissions, the timestamp will be checked to see if the skb in the link transmq is "too stale", that is, the link tolerance time has passed, so that a link reset will be ordered. Note, just checking on the first skb in the queue is fine enough since it must be the oldest one. A counter is also added to keep track the actual skb retransmissions' number for later checking when the failure happens. The downside of this approach is that the skb->cb[] buffer is about to be exhausted, however it is always able to allocate another memory area and keep a reference to it when needed. Fixes: 77cf8edbc0e7 ("tipc: simplify stale link failure criteria") Reported-by: Hoang Le <hoang.h.le@dektech.com.au> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | Merge tag 'rxrpc-fixes-20190814' of ↵David S. Miller2019-08-151-10/+11
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Fix local endpoint handling Here's a pair of patches that fix two issues in the handling of local endpoints (rxrpc_local structs): (1) Use list_replace_init() rather than list_replace() if we're going to unconditionally delete the replaced item later, lest the list get corrupted. (2) Don't access the rxrpc_local object after passing our ref to the workqueue, not even to illuminate tracepoints, as the work function may cause the object to be freed. We have to cache the information beforehand. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | rxrpc: Fix read-after-free in rxrpc_queue_local()David Howells2019-08-141-9/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rxrpc_queue_local() attempts to queue the local endpoint it is given and then, if successful, prints a trace line. The trace line includes the current usage count - but we're not allowed to look at the local endpoint at this point as we passed our ref on it to the workqueue. Fix this by reading the usage count before queuing the work item. Also fix the reading of local->debug_id for trace lines, which must be done with the same consideration as reading the usage count. Fixes: 09d2bf595db4 ("rxrpc: Add a tracepoint to track rxrpc_local refcounting") Reported-by: syzbot+78e71c5bab4f76a6a719@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | rxrpc: Fix local endpoint replacementDavid Howells2019-08-141-1/+1
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a local endpoint (struct rxrpc_local) ceases to be in use by any AF_RXRPC sockets, it starts the process of being destroyed, but this doesn't cause it to be removed from the namespace endpoint list immediately as tearing it down isn't trivial and can't be done in softirq context, so it gets deferred. If a new socket comes along that wants to bind to the same endpoint, a new rxrpc_local object will be allocated and rxrpc_lookup_local() will use list_replace() to substitute the new one for the old. Then, when the dying object gets to rxrpc_local_destroyer(), it is removed unconditionally from whatever list it is on by calling list_del_init(). However, list_replace() doesn't reset the pointers in the replaced list_head and so the list_del_init() will likely corrupt the local endpoints list. Fix this by using list_replace_init() instead. Fixes: 730c5fd42c1e ("rxrpc: Fix local endpoint refcounting") Reported-by: syzbot+193e29e9387ea5837f1d@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com>
* | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller2019-08-155-28/+98
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== Netfilter fixes for net This patchset contains Netfilter fixes for net: 1) Extend selftest to cover flowtable with ipsec, from Florian Westphal. 2) Fix interaction of ipsec with flowtable, also from Florian. 3) User-after-free with bound set to rule that fails to load. 4) Adjust state and timeout for flows that expire. 5) Timeout update race with flows in teardown state. 6) Ensure conntrack id hash calculation use invariants as input, from Dirk Morris. 7) Do not push flows into flowtable for TCP fin/rst packets. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | netfilter: nft_flow_offload: skip tcp rst and fin packetsPablo Neira Ayuso2019-08-141-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TCP rst and fin packets do not qualify to place a flow into the flowtable. Most likely there will be no more packets after connection closure. Without this patch, this flow entry expires and connection tracking picks up the entry in ESTABLISHED state using the fixup timeout, which makes this look inconsistent to the user for a connection that is actually already closed. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | | netfilter: conntrack: Use consistent ct id hash calculationDirk Morris2019-08-131-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change ct id hash calculation to only use invariants. Currently the ct id hash calculation is based on some fields that can change in the lifetime on a conntrack entry in some corner cases. The current hash uses the whole tuple which contains an hlist pointer which will change when the conntrack is placed on the dying list resulting in a ct id change. This patch also removes the reply-side tuple and extension pointer from the hash calculation so that the ct id will will not change from initialization until confirmation. Fixes: 3c79107631db1f7 ("netfilter: ctnetlink: don't use conntrack/expect object addresses as id") Signed-off-by: Dirk Morris <dmorris@metaloft.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | | netfilter: nf_flow_table: teardown flow timeout racePablo Neira Ayuso2019-08-091-9/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Flows that are in teardown state (due to RST / FIN TCP packet) still have their offload flag set on. Hence, the conntrack garbage collector may race to undo the timeout adjustment that the fixup routine performs, leaving the conntrack entry in place with the internal offload timeout (one day). Update teardown flow state to ESTABLISHED and set tracking to liberal, then once the offload bit is cleared, adjust timeout if it is more than the default fixup timeout (conntrack might already have set a lower timeout from the packet path). Fixes: da5984e51063 ("netfilter: nf_flow_table: add support for sending flows back to the slow path") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | | netfilter: nf_flow_table: conntrack picks up expired flowsPablo Neira Ayuso2019-08-091-7/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Update conntrack entry to pick up expired flows, otherwise the conntrack entry gets stuck with the internal offload timeout (one day). The TCP state also needs to be adjusted to ESTABLISHED state and tracking is set to liberal mode in order to give conntrack a chance to pick up the expired flow. Fixes: ac2a66665e23 ("netfilter: add generic flow table infrastructure") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | | netfilter: nf_tables: use-after-free in failing rule with bound setPablo Neira Ayuso2019-08-091-5/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a rule that has already a bound anonymous set fails to be added, the preparation phase releases the rule and the bound set. However, the transaction object from the abort path still has a reference to the set object that is stale, leading to a use-after-free when checking for the set->bound field. Add a new field to the transaction that specifies if the set is bound, so the abort path can skip releasing it since the rule command owns it and it takes care of releasing it. After this update, the set->bound field is removed. [ 24.649883] Unable to handle kernel paging request at virtual address 0000000000040434 [ 24.657858] Mem abort info: [ 24.660686] ESR = 0x96000004 [ 24.663769] Exception class = DABT (current EL), IL = 32 bits [ 24.669725] SET = 0, FnV = 0 [ 24.672804] EA = 0, S1PTW = 0 [ 24.675975] Data abort info: [ 24.678880] ISV = 0, ISS = 0x00000004 [ 24.682743] CM = 0, WnR = 0 [ 24.685723] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000428952000 [ 24.692207] [0000000000040434] pgd=0000000000000000 [ 24.697119] Internal error: Oops: 96000004 [#1] SMP [...] [ 24.889414] Call trace: [ 24.891870] __nf_tables_abort+0x3f0/0x7a0 [ 24.895984] nf_tables_abort+0x20/0x40 [ 24.899750] nfnetlink_rcv_batch+0x17c/0x588 [ 24.904037] nfnetlink_rcv+0x13c/0x190 [ 24.907803] netlink_unicast+0x18c/0x208 [ 24.911742] netlink_sendmsg+0x1b0/0x350 [ 24.915682] sock_sendmsg+0x4c/0x68 [ 24.919185] ___sys_sendmsg+0x288/0x2c8 [ 24.923037] __sys_sendmsg+0x7c/0xd0 [ 24.926628] __arm64_sys_sendmsg+0x2c/0x38 [ 24.930744] el0_svc_common.constprop.0+0x94/0x158 [ 24.935556] el0_svc_handler+0x34/0x90 [ 24.939322] el0_svc+0x8/0xc [ 24.942216] Code: 37280300 f9404023 91014262 aa1703e0 (f9401863) [ 24.948336] ---[ end trace cebbb9dcbed3b56f ]--- Fixes: f6ac85858976 ("netfilter: nf_tables: unbind set in rule from commit path") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | | netfilter: nf_flow_table: fix offload for flows that are subject to xfrmFlorian Westphal2019-08-051-0/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This makes the previously added 'encap test' pass. Because its possible that the xfrm dst entry becomes stale while such a flow is offloaded, we need to call dst_check() -- the notifier that handles this for non-tunneled traffic isn't sufficient, because SA or or policies might have changed. If dst becomes stale the flow offload entry will be tagged for teardown and packets will be passed to 'classic' forwarding path. Removing the entry right away is problematic, as this would introduce a race condition with the gc worker. In case flow is long-lived, it could eventually be offloaded again once the gc worker removes the entry from the flow table. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | | | | net/packet: fix race in tpacket_snd()Eric Dumazet2019-08-151-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | packet_sendmsg() checks tx_ring.pg_vec to decide if it must call tpacket_snd(). Problem is that the check is lockless, meaning another thread can issue a concurrent setsockopt(PACKET_TX_RING ) to flip tx_ring.pg_vec back to NULL. Given that tpacket_snd() grabs pg_vec_lock mutex, we can perform the check again to solve the race. syzbot reported : kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 11429 Comm: syz-executor394 Not tainted 5.3.0-rc4+ #101 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:packet_lookup_frame+0x8d/0x270 net/packet/af_packet.c:474 Code: c1 ee 03 f7 73 0c 80 3c 0e 00 0f 85 cb 01 00 00 48 8b 0b 89 c0 4c 8d 24 c1 48 b8 00 00 00 00 00 fc ff df 4c 89 e1 48 c1 e9 03 <80> 3c 01 00 0f 85 94 01 00 00 48 8d 7b 10 4d 8b 3c 24 48 b8 00 00 RSP: 0018:ffff88809f82f7b8 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: ffff8880a45c7030 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 1ffff110148b8e06 RDI: ffff8880a45c703c RBP: ffff88809f82f7e8 R08: ffff888087aea200 R09: fffffbfff134ae50 R10: fffffbfff134ae4f R11: ffffffff89a5727f R12: 0000000000000000 R13: 0000000000000001 R14: ffff8880a45c6ac0 R15: 0000000000000000 FS: 00007fa04716f700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fa04716edb8 CR3: 0000000091eb4000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: packet_current_frame net/packet/af_packet.c:487 [inline] tpacket_snd net/packet/af_packet.c:2667 [inline] packet_sendmsg+0x590/0x6250 net/packet/af_packet.c:2975 sock_sendmsg_nosec net/socket.c:637 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:657 ___sys_sendmsg+0x3e2/0x920 net/socket.c:2311 __sys_sendmmsg+0x1bf/0x4d0 net/socket.c:2413 __do_sys_sendmmsg net/socket.c:2442 [inline] __se_sys_sendmmsg net/socket.c:2439 [inline] __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2439 do_syscall_64+0xfd/0x6a0 arch/x86/entry/common.c:296 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: 69e3c75f4d54 ("net: TX_RING and packet mmap") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | net: tls, fix sk_write_space NULL write when tx disabledJohn Fastabend2019-08-151-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ctx->sk_write_space pointer is only set when TLS tx mode is enabled. When running without TX mode its a null pointer but we still set the sk sk_write_space pointer on close(). Fix the close path to only overwrite sk->sk_write_space when the current pointer is to the tls_write_space function indicating the tls module should clean it up properly as well. Reported-by: Hillf Danton <hdanton@sina.com> Cc: Ying Xue <ying.xue@windriver.com> Cc: Andrey Konovalov <andreyknvl@google.com> Fixes: 57c722e932cfb ("net/tls: swap sk_write_space on close") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | sctp: fix memleak in sctp_send_reset_streamszhengbin2019-08-131-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the stream outq is not empty, need to kfree nstr_list. Fixes: d570a59c5b5f ("sctp: only allow the out stream reset when the stream outq is empty") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
* | | | | sctp: fix the transport error_count checkXin Long2019-08-131-1/+1
| |/ / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As the annotation says in sctp_do_8_2_transport_strike(): "If the transport error count is greater than the pf_retrans threshold, and less than pathmaxrtx ..." It should be transport->error_count checked with pathmaxrxt, instead of asoc->pf_retrans. Fixes: 5aa93bcf66f4 ("sctp: Implement quick failover draft from tsvwg") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
* | | | tipc: initialise addr_trail_end when setting node addressesChris Packham2019-08-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We set the field 'addr_trial_end' to 'jiffies', instead of the current value 0, at the moment the node address is initialized. This guarantees we don't inadvertently enter an address trial period when the node address is explicitly set by the user. Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | net: dsa: Check existence of .port_mdb_add callback before calling itChen-Yu Tsai2019-08-111-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The dsa framework has optional .port_mdb_{prepare,add,del} callback fields for drivers to handle multicast database entries. When adding an entry, the framework goes through a prepare phase, then a commit phase. Drivers not providing these callbacks should be detected in the prepare phase. DSA core may still bypass the bridge layer and call the dsa_port_mdb_add function directly with no prepare phase or no switchdev trans object, and the framework ends up calling an undefined .port_mdb_add callback. This results in a NULL pointer dereference, as shown in the log below. The other functions seem to be properly guarded. Do the same for .port_mdb_add in dsa_switch_mdb_add_bitmap() as well. 8<--- cut here --- Unable to handle kernel NULL pointer dereference at virtual address 00000000 pgd = (ptrval) [00000000] *pgd=00000000 Internal error: Oops: 80000005 [#1] SMP ARM Modules linked in: rtl8xxxu rtl8192cu rtl_usb rtl8192c_common rtlwifi mac80211 cfg80211 CPU: 1 PID: 134 Comm: kworker/1:2 Not tainted 5.3.0-rc1-00247-gd3519030752a #1 Hardware name: Allwinner sun7i (A20) Family Workqueue: events switchdev_deferred_process_work PC is at 0x0 LR is at dsa_switch_event+0x570/0x620 pc : [<00000000>] lr : [<c08533ec>] psr: 80070013 sp : ee871db8 ip : 00000000 fp : ee98d0a4 r10: 0000000c r9 : 00000008 r8 : ee89f710 r7 : ee98d040 r6 : ee98d088 r5 : c0f04c48 r4 : ee98d04c r3 : 00000000 r2 : ee89f710 r1 : 00000008 r0 : ee98d040 Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none Control: 10c5387d Table: 6deb406a DAC: 00000051 Process kworker/1:2 (pid: 134, stack limit = 0x(ptrval)) Stack: (0xee871db8 to 0xee872000) 1da0: ee871e14 103ace2d 1dc0: 00000000 ffffffff 00000000 ee871e14 00000005 00000000 c08524a0 00000000 1de0: ffffe000 c014bdfc c0f04c48 ee871e98 c0f04c48 ee9e5000 c0851120 c014bef0 1e00: 00000000 b643aea2 ee9b4068 c08509a8 ee2bf940 ee89f710 ee871ecb 00000000 1e20: 00000008 103ace2d 00000000 c087e248 ee29c868 103ace2d 00000001 ffffffff 1e40: 00000000 ee871e98 00000006 00000000 c0fb2a50 c087e2d0 ffffffff c08523c4 1e60: ffffffff c014bdfc 00000006 c0fad2d0 ee871e98 ee89f710 00000000 c014c500 1e80: 00000000 ee89f3c0 c0f04c48 00000000 ee9e5000 c087dfb4 ee9e5000 00000000 1ea0: ee89f710 ee871ecb 00000001 103ace2d 00000000 c0f04c48 00000000 c087e0a8 1ec0: 00000000 efd9a3e0 0089f3c0 103ace2d ee89f700 ee89f710 ee9e5000 00000122 1ee0: 00000100 c087e130 ee89f700 c0fad2c8 c1003ef0 c087de4c 2e928000 c0fad2ec 1f00: c0fad2ec ee839580 ef7a62c0 ef7a9400 00000000 c087def8 c0fad2ec c01447dc 1f20: ef315640 ef7a62c0 00000008 ee839580 ee839594 ef7a62c0 00000008 c0f03d00 1f40: ef7a62d8 ef7a62c0 ffffe000 c0145b84 ffffe000 c0fb2420 c0bfaa8c 00000000 1f60: ffffe000 ee84b600 ee84b5c0 00000000 ee870000 ee839580 c0145b40 ef0e5ea4 1f80: ee84b61c c014a6f8 00000001 ee84b5c0 c014a5b0 00000000 00000000 00000000 1fa0: 00000000 00000000 00000000 c01010e8 00000000 00000000 00000000 00000000 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000 [<c08533ec>] (dsa_switch_event) from [<c014bdfc>] (notifier_call_chain+0x48/0x84) [<c014bdfc>] (notifier_call_chain) from [<c014bef0>] (raw_notifier_call_chain+0x18/0x20) [<c014bef0>] (raw_notifier_call_chain) from [<c08509a8>] (dsa_port_mdb_add+0x48/0x74) [<c08509a8>] (dsa_port_mdb_add) from [<c087e248>] (__switchdev_handle_port_obj_add+0x54/0xd4) [<c087e248>] (__switchdev_handle_port_obj_add) from [<c087e2d0>] (switchdev_handle_port_obj_add+0x8/0x14) [<c087e2d0>] (switchdev_handle_port_obj_add) from [<c08523c4>] (dsa_slave_switchdev_blocking_event+0x94/0xa4) [<c08523c4>] (dsa_slave_switchdev_blocking_event) from [<c014bdfc>] (notifier_call_chain+0x48/0x84) [<c014bdfc>] (notifier_call_chain) from [<c014c500>] (blocking_notifier_call_chain+0x50/0x68) [<c014c500>] (blocking_notifier_call_chain) from [<c087dfb4>] (switchdev_port_obj_notify+0x44/0xa8) [<c087dfb4>] (switchdev_port_obj_notify) from [<c087e0a8>] (switchdev_port_obj_add_now+0x90/0x104) [<c087e0a8>] (switchdev_port_obj_add_now) from [<c087e130>] (switchdev_port_obj_add_deferred+0x14/0x5c) [<c087e130>] (switchdev_port_obj_add_deferred) from [<c087de4c>] (switchdev_deferred_process+0x64/0x104) [<c087de4c>] (switchdev_deferred_process) from [<c087def8>] (switchdev_deferred_process_work+0xc/0x14) [<c087def8>] (switchdev_deferred_process_work) from [<c01447dc>] (process_one_work+0x218/0x50c) [<c01447dc>] (process_one_work) from [<c0145b84>] (worker_thread+0x44/0x5bc) [<c0145b84>] (worker_thread) from [<c014a6f8>] (kthread+0x148/0x150) [<c014a6f8>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c) Exception stack(0xee871fb0 to 0xee871ff8) 1fa0: 00000000 00000000 00000000 00000000 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 Code: bad PC value ---[ end trace 1292c61abd17b130 ]--- [<c08533ec>] (dsa_switch_event) from [<c014bdfc>] (notifier_call_chain+0x48/0x84) corresponds to $ arm-linux-gnueabihf-addr2line -C -i -e vmlinux c08533ec linux/net/dsa/switch.c:156 linux/net/dsa/switch.c:178 linux/net/dsa/switch.c:328 Fixes: e6db98db8a95 ("net: dsa: add switch mdb bitmap functions") Signed-off-by: Chen-Yu Tsai <wens@csie.org> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>