commit 47d3d7fdb10a21c223036b58bd70ffdc24a472c4 upstream.
Since ip6frag_expire_frag_queue() now pulls the head skb
from the frag queue, we should no longer use skb_get(), as
this leads to an skb leak.
Stefan Bader initially reported a problem in 4.4.stable [1] caused
by the skb_get(), so this patch should also fix that issue.
[1]
[296583.091021] kernel BUG at /build/linux-6VmqmP/linux-4.4.0/net/core/skbuff.c:1207!
[296583.091734] Call Trace:
[296583.091749] [<ffffffff81740e50>] __pskb_pull_tail+0x50/0x350
[296583.091764] [<ffffffff8183939a>] _decode_session6+0x26a/0x400
[296583.091779] [<ffffffff817ec719>] __xfrm_decode_session+0x39/0x50
[296583.091795] [<ffffffff818239d0>] icmpv6_route_lookup+0xf0/0x1c0
[296583.091809] [<ffffffff81824421>] icmp6_send+0x5e1/0x940
[296583.091823] [<ffffffff81753238>] ? __netif_receive_skb+0x18/0x60
[296583.091838] [<ffffffff817532b2>] ? netif_receive_skb_internal+0x32/0xa0
[296583.091858] [<ffffffffc0199f74>] ? ixgbe_clean_rx_irq+0x594/0xac0 [ixgbe]
[296583.091876] [<ffffffffc04eb260>] ? nf_ct_net_exit+0x50/0x50 [nf_defrag_ipv6]
[296583.091893] [<ffffffff8183d431>] icmpv6_send+0x21/0x30
[296583.091906] [<ffffffff8182b500>] ip6_expire_frag_queue+0xe0/0x120
[296583.091921] [<ffffffffc04eb27f>] nf_ct_frag6_expire+0x1f/0x30 [nf_defrag_ipv6]
[296583.091938] [<ffffffff810f3b57>] call_timer_fn+0x37/0x140
[296583.091951] [<ffffffffc04eb260>] ? nf_ct_net_exit+0x50/0x50 [nf_defrag_ipv6]
[296583.091968] [<ffffffff810f5464>] run_timer_softirq+0x234/0x330
[296583.091982] [<ffffffff8108a339>] __do_softirq+0x109/0x2b0
Fixes: d4289fcc9b16 ("net: IP6 defrag: use rbtrees for IPv6 defrag")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Stefan Bader <stefan.bader@canonical.com>
Cc: Peter Oskolkov <posk@google.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
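For context, the leaked reference looks roughly like this in the expiry path
(a sketch, not the exact stable hunk; the helper call is illustrative):
the queue helper already hands the caller its own reference to the head skb,
so an extra skb_get() is never balanced by a free.

	/* sketch only */
	head = inet_frag_pull_head(&fq->q);	/* we now own a reference */
	if (!head)
		goto out;
	/* skb_get(head);   <-- removed: the extra reference would leak */
	icmpv6_send(head, ICMPV6_TIME_EXCEED, ICMPV6_EXC_FRAGTIME, 0);
	kfree_skb(head);	/* drops the only reference we hold */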
commit dbb2483b2a46fbaf833cfb5deb5ed9cace9c7399 upstream.
In commit 6a53b7593233 ("xfrm: check id proto in validate_tmpl()")
I introduced a check for xfrm protocol, but according to Herbert
IPSEC_PROTO_ANY should only be used as a wildcard for lookup, so
it should be removed from validate_tmpl().
Also, IPSEC_PROTO_ANY is expected to match only the 3 IPsec-specific
protocols, which is why xfrm_state_flush() could still miss
IPPROTO_ROUTING, leaving those entries in net->xfrm.state_all
when the netns exits. Fix this by replacing
IPSEC_PROTO_ANY with zero.
This patch also extracts the check from validate_tmpl() to
xfrm_id_proto_valid() and uses it in parse_ipsecrequest().
With this, no other protocols should be added into xfrm.
Fixes: 6a53b7593233 ("xfrm: check id proto in validate_tmpl()")
Reported-by: syzbot+0bf0519d6e0de15914fe@syzkaller.appspotmail.com
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Zubin Mithra <zsm@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
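For reference, the extracted helper will probably look like this (a sketch of
the idea, not necessarily the exact hunk): only the protocols xfrm actually
supports are accepted, and everything else, including IPSEC_PROTO_ANY, is
rejected.

static inline bool xfrm_id_proto_valid(u8 proto)
{
	switch (proto) {
	case IPPROTO_AH:
	case IPPROTO_ESP:
	case IPPROTO_COMP:
#if IS_ENABLED(CONFIG_IPV6)
	case IPPROTO_ROUTING:
	case IPPROTO_DSTOPTS:
#endif
		return true;
	default:
		return false;
	}
}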
[ Upstream commit dbf47a2a094edf58983265e323ca4bdcdb58b5ee ]
Action sample doesn't properly handle the psample_group pointer in the
overwrite case. The following issues need to be fixed:
- In the tcf_sample_init() function, RCU_INIT_POINTER() is used to set
s->psample_group, even though we are neither setting the pointer to NULL
nor preventing concurrent readers from accessing the pointer in some way.
Use rcu_swap_protected() instead to safely reset the pointer.
- The old value of s->psample_group is not released or deallocated in any
way, which results in a resource leak. Use psample_group_put() on the
non-NULL value obtained with rcu_swap_protected().
- The function psample_group_put(), which releases the reference to the
struct psample_group pointed to by the RCU pointer s->psample_group,
doesn't respect the RCU grace period when deallocating it. Extend struct
psample_group with an rcu head and use kfree_rcu when freeing it.
Fixes: 5c5670fae430 ("net/sched: Introduce sample tc action")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
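A minimal sketch of the overwrite path after the fix (assuming the action is
protected by its tcf_lock; local variable names are illustrative):

	struct psample_group *psample_group;

	psample_group = psample_group_get(net, psample_group_num);
	if (!psample_group)
		return -ENOMEM;

	spin_lock_bh(&s->tcf_lock);
	/* publish the new group; after the swap the local variable
	 * holds the previous (possibly NULL) group
	 */
	rcu_swap_protected(s->psample_group, psample_group,
			   lockdep_is_held(&s->tcf_lock));
	spin_unlock_bh(&s->tcf_lock);

	if (psample_group)
		psample_group_put(psample_group);	/* freed via kfree_rcu */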
[ Upstream commit 981471bd3abf4d572097645d765391533aac327d ]
The net pointer in struct xt_tgdtor_param is not explicitly
initialized, therefore it is still NULL when it is dereferenced.
So we have to find a way to pass the correct net pointer to
ipt_destroy_target().
The best way I can find is to just save the net pointer inside the
per-netns struct tcf_idrinfo, which also keeps this patch small.
Fixes: 0c66dc1ea3f0 ("netfilter: conntrack: register hooks in netns when needed by ruleset")
Reported-and-tested-by: itugrok@yahoo.com
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 8c3088f895a0 ("tcp: be more careful in tcp_fragment()")
triggers the following stack trace:
[25244.848046] kernel BUG at ./include/linux/skbuff.h:1406!
[25244.859335] RIP: 0010:skb_queue_prev+0x9/0xc
[25244.888167] Call Trace:
[25244.889182] <IRQ>
[25244.890001] tcp_fragment+0x9c/0x2cf
[25244.891295] tcp_write_xmit+0x68f/0x988
[25244.892732] __tcp_push_pending_frames+0x3b/0xa0
[25244.894347] tcp_data_snd_check+0x2a/0xc8
[25244.895775] tcp_rcv_established+0x2a8/0x30d
[25244.897282] tcp_v4_do_rcv+0xb2/0x158
[25244.898666] tcp_v4_rcv+0x692/0x956
[25244.899959] ip_local_deliver_finish+0xeb/0x169
[25244.901547] __netif_receive_skb_core+0x51c/0x582
[25244.903193] ? inet_gro_receive+0x239/0x247
[25244.904756] netif_receive_skb_internal+0xab/0xc6
[25244.906395] napi_gro_receive+0x8a/0xc0
[25244.907760] receive_buf+0x9a1/0x9cd
[25244.909160] ? load_balance+0x17a/0x7b7
[25244.910536] ? vring_unmap_one+0x18/0x61
[25244.911932] ? detach_buf+0x60/0xfa
[25244.913234] virtnet_poll+0x128/0x1e1
[25244.914607] net_rx_action+0x12a/0x2b1
[25244.915953] __do_softirq+0x11c/0x26b
[25244.917269] ? handle_irq_event+0x44/0x56
[25244.918695] irq_exit+0x61/0xa0
[25244.919947] do_IRQ+0x9d/0xbb
[25244.921065] common_interrupt+0x85/0x85
[25244.922479] </IRQ>
tcp_rtx_queue_tail() (called by tcp_fragment()) can call
tcp_write_queue_prev() on the first packet in the queue, which will trigger
the BUG in tcp_write_queue_prev(), because there is no previous packet.
This happens when the retransmit queue is empty, for example in case of a
zero window.
Commit 8c3088f895a0 ("tcp: be more careful in tcp_fragment()") was not a
simple cherry-pick of the original one from master (b617158dc096)
because there is a specific TCP rtx queue only since v4.15. For more
details, please see the commit message of b617158dc096 ("tcp: be more
careful in tcp_fragment()").
The BUG() is hit due to the specific code added to versions older than
v4.15. The comment in skb_queue_prev() (include/linux/skbuff.h:1406),
just before the BUG_ON(), suggests adding a check before using it, which
is what Tim did.
In master, the code path causing the issue is not taken because the
implementation of tcp_rtx_queue_tail() is different there; in these older
stable trees the failing path is:
tcp_fragment() → tcp_rtx_queue_tail() → tcp_write_queue_prev() →
skb_queue_prev() → BUG_ON()
Fixes: 8c3088f895a0 ("tcp: be more careful in tcp_fragment()")
Signed-off-by: Tim Froidcoeur <tim.froidcoeur@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
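The guarded helper for these older stable trees might then look roughly like
this (a sketch; the exact form of Tim's check may differ):

static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk)
{
	struct sk_buff *skb = tcp_send_head(sk);

	/* Retransmit queue is empty (e.g. zero window): there is no
	 * packet before the send head, so do not walk backwards.
	 */
	if (skb && skb == tcp_write_queue_head(sk))
		return NULL;

	return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk);
}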
commit e4db5b61c572475bbbcf63e3c8a2606bfccf2c9d upstream.
Kristian Evensen says:
In a project I am involved in, we are running ipsec (Strongswan) on
different mt7621-based routers. Each router is configured as an
initiator and has around ~30 tunnels to different responders (running
on misc. devices). Before the flow cache was removed (kernel 4.9), we
got a combined throughput of around 70Mbit/s for all tunnels on one
router. However, we recently switched to kernel 4.14 (4.14.48), and
the total throughput is somewhere around 57Mbit/s (best-case). I.e., a
drop of around 20%. Reverting the flow cache removal restores, as
expected, performance levels to that of kernel 4.9.
When pcpu xdst exists, it has to be validated first before it can be
used.
A negative hit thus increases cost vs. no-cache.
As the number of tunnels increases, the hit rate decreases, so this pcpu
caching isn't a viable strategy.
Furthermore, the xdst cache also needs to run with BH off, so when
removing it the BH disable/enable pairs can be removed too.
Kristian tested a 4.14.y backport of this change and reported
increased performance:
In our tests, the throughput reduction has been reduced from around -20%
to -5%. We also see that the overall throughput is independent of the
number of tunnels, while before the throughput was reduced as the number
of tunnels increased.
Reported-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit e99e7745d03fc50ba7c5b7c91c17294fee2d5991.
Ben Hutchings writes:
>Sorry, this is the same issue that was already fixed by "tcp: reset
>sk_send_head in tcp_write_queue_purge". You can drop my version from
>the queue for 4.4 and 4.9 and revert it for 4.14.
Signed-off-by: Sasha Levin <sashal@kernel.org>
Denis Andzakovic discovered a potential use-after-free in older kernel
versions, using syzkaller. tcp_write_queue_purge() frees all skbs in
the TCP write queue and can leave sk->sk_send_head pointing to freed
memory. tcp_disconnect() clears that pointer after calling
tcp_write_queue_purge(), but tcp_connect() does not. It is
(surprisingly) possible to add to the write queue between
disconnection and reconnection, so this needs to be done in both
places.
This bug was introduced by backports of commit 7f582b248d0a ("tcp:
purge write queue in tcp_connect_init()") and does not exist upstream
because of earlier changes in commit 75c119afe14f ("tcp: implement
rb-tree based retransmit queue"). The latter is a major change that's
not suitable for stable.
Reported-by: Denis Andzakovic <denis.andzakovic@pulsesecurity.co.nz>
Bisected-by: Salvatore Bonaccorso <carnil@debian.org>
Fixes: 7f582b248d0a ("tcp: purge write queue in tcp_connect_init()")
Cc: <stable@vger.kernel.org> # before 4.15
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
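A minimal sketch of the kind of change needed in both places (disconnect and
reconnect), for kernels whose write-queue purge does not already do this:

	/* sketch: after purging the write queue, the cached send head may
	 * point at an skb that was just freed; clear it before the socket
	 * can be reused for a new connection.
	 */
	tcp_write_queue_purge(sk);
	sk->sk_send_head = NULL;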
commit b617158dc096709d8600c53b6052144d12b89fab upstream.
Some applications set tiny SO_SNDBUF values and expect
TCP to just work. Recent patches to address CVE-2019-11478
broke them in case of losses, since retransmits might
be prevented.
We should allow these flows to make progress.
This patch allows the first and last skb in retransmit queue
to be split even if memory limits are hit.
It also adds some room due to the fact that tcp_sendmsg()
and tcp_sendpage() might overshoot sk_wmem_queued by about one full
TSO skb (64KB size). Note this allowance was already present
in stable backports for kernels < 4.15.
Note for < 4.15 backports:
tcp_rtx_queue_tail() will probably look like:

static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk)
{
	struct sk_buff *skb = tcp_send_head(sk);

	return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk);
}
Fixes: f070ef2ac667 ("tcp: tcp_fragment() should apply sane memory limits")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrew Prout <aprout@ll.mit.edu>
Tested-by: Andrew Prout <aprout@ll.mit.edu>
Tested-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Tested-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Christoph Paasch <cpaasch@apple.com>
Cc: Jonathan Looney <jtl@netflix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
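For reference, the relaxed limit test in tcp_fragment() ends up looking
roughly like this in the upstream (>= 4.15, rtx-queue) form; a sketch only,
the pre-4.15 backports differ as noted above:

	/* tcp_sendmsg() can overshoot sk_wmem_queued by one full size skb,
	 * so give some allowance, and always let the first and last skb of
	 * the retransmit queue be split.
	 */
	limit = sk->sk_sndbuf + 2 * SKB_TRUESIZE(GSO_MAX_SIZE);
	if (unlikely((sk->sk_wmem_queued >> 1) > limit &&
		     tcp_queue != TCP_FRAG_IN_WRITE_QUEUE &&
		     skb != tcp_rtx_queue_head(sk) &&
		     skb != tcp_rtx_queue_tail(sk))) {
		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPWQUEUETOOBIG);
		return -ENOMEM;
	}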
commit 3b4477d2dcf2709d0be89e2a8dced3d0f4a017f2 upstream.
There are two state fields: socket->state and sock->sk_state. The
socket->state field uses SS_UNCONNECTED, SS_CONNECTED, etc while the
sock->sk_state typically uses values that match TCP state constants
(TCP_CLOSE, TCP_ESTABLISHED). AF_VSOCK does not follow this convention
and instead uses SS_* constants for both fields.
The sk_state field will be exposed to userspace through the vsock_diag
interface for ss(8), netstat(8), and other programs.
This patch switches sk_state to TCP state constants so that the meaning
of this field is consistent with other address families. Not just
AF_INET and AF_INET6 use the TCP constants, AF_UNIX and others do too.
The following mapping was used to convert the code:
SS_FREE -> TCP_CLOSE
SS_UNCONNECTED -> TCP_CLOSE
SS_CONNECTING -> TCP_SYN_SENT
SS_CONNECTED -> TCP_ESTABLISHED
SS_DISCONNECTING -> TCP_CLOSING
VSOCK_SS_LISTEN -> TCP_LISTEN
In __vsock_create() the sk_state initialization was dropped because
sock_init_data() already initializes sk_state to TCP_CLOSE.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[Adjusted net/vmw_vsock/hyperv_transport.c since the commit
b4562ca7925a ("hv_sock: add locking in the open/close/release code paths")
and the commit
c9d3fe9da094 ("VSOCK: fix outdated sk_state value in hvs_release()")
were backported before 3b4477d2dcf2.]
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8d650cdedaabb33e85e9b7c517c0c71fcecc1de9 ]
Neal reported incorrect use of ns_capable() from bpf hook.
bpf_setsockopt(...TCP_CONGESTION...)
-> tcp_set_congestion_control()
-> ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)
-> ns_capable_common()
-> current_cred()
-> rcu_dereference_protected(current->cred, 1)
Accessing 'current' in bpf context makes no sense, since packets
are processed from softirq context.
As Neal stated: the capability check in tcp_set_congestion_control()
was written assuming a system call context, and was then reused from
a BPF call site.
The fix is to add a new parameter to tcp_set_congestion_control(),
so that the ns_capable() call is only performed under the right
context.
Fixes: 91b5b21c7c16 ("bpf: Add support for changing congestion control")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Lawrence Brakmo <brakmo@fb.com>
Reported-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
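Conceptually, the call sites end up looking something like this (a sketch with
an approximated signature; the real patch may order or name the parameters
differently):

	/* setsockopt() path: we are in process context, so consult the
	 * caller's credentials here and pass the result down.
	 */
	err = tcp_set_congestion_control(sk, name, true, true,
			ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN));

	/* BPF path (softirq context): there is no meaningful 'current',
	 * so ns_capable() must not be evaluated here.
	 */
	err = tcp_set_congestion_control(sk, name, false, true, true);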
[ Upstream commit b60a77386b1d4868f72f6353d35dabe5fbe981f2 ]
netfilter did not expect that skb_dst_force() can cause an skb to lose its
dst entry.
I got a bug report with an skb->dst NULL dereference in the netfilter
output path. The backtrace contains nf_reinject(), so the dst might have
been cleared when the skb got queued to userspace.
Other users were fixed via
if (skb_dst(skb)) {
skb_dst_force(skb);
if (!skb_dst(skb))
goto handle_err;
}
But I think it's preferable to make the 'dst might be cleared' part
of the function explicit.
In the netfilter case, an skb with a null dst is expected when queueing in
the prerouting hook, so drop the skb only for the other hooks.
v2:
v1 of this patch returned true in case the skb had no dst entry.
Eric said:
Say if we have two skb_dst_force() calls for some reason
on the same skb, only the first one will return false.
This now returns false even when the skb had no dst, as per Eric's
suggestion, so callers might need to check skb_dst() before calling
skb_dst_force().
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
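With the changed return value, a netfilter caller can then do something along
these lines (sketch):

	/* skb_dst_force() now reports whether a dst is still attached.
	 * Only the prerouting hook legitimately sees skbs without a dst;
	 * everywhere else a cleared dst means the packet must be dropped.
	 */
	if (!skb_dst_force(skb) && state->hook != NF_INET_PRE_ROUTING) {
		kfree_skb(skb);
		return -ENETDOWN;
	}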
[ Upstream commit 5db7c8b9f9fc2aeec671ae3ca6375752c162e0e7 ]
syzkaller reports for memory leak in start_sync_thread [1]
As Eric points out, a kthread may start and stop before the
threadfn function is called, so there is no guarantee that the
data (tinfo in our case) will be released in the thread.
Fix this by releasing tinfo in the controlling code instead.
[1]
BUG: memory leak
unreferenced object 0xffff8881206bf700 (size 32):
comm "syz-executor761", pid 7268, jiffies 4294943441 (age 20.470s)
hex dump (first 32 bytes):
00 40 7c 09 81 88 ff ff 80 45 b8 21 81 88 ff ff .@|......E.!....
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<0000000057619e23>] kmemleak_alloc_recursive include/linux/kmemleak.h:55 [inline]
[<0000000057619e23>] slab_post_alloc_hook mm/slab.h:439 [inline]
[<0000000057619e23>] slab_alloc mm/slab.c:3326 [inline]
[<0000000057619e23>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
[<0000000086ce5479>] kmalloc include/linux/slab.h:547 [inline]
[<0000000086ce5479>] start_sync_thread+0x5d2/0xe10 net/netfilter/ipvs/ip_vs_sync.c:1862
[<000000001a9229cc>] do_ip_vs_set_ctl+0x4c5/0x780 net/netfilter/ipvs/ip_vs_ctl.c:2402
[<00000000ece457c8>] nf_sockopt net/netfilter/nf_sockopt.c:106 [inline]
[<00000000ece457c8>] nf_setsockopt+0x4c/0x80 net/netfilter/nf_sockopt.c:115
[<00000000942f62d4>] ip_setsockopt net/ipv4/ip_sockglue.c:1258 [inline]
[<00000000942f62d4>] ip_setsockopt+0x9b/0xb0 net/ipv4/ip_sockglue.c:1238
[<00000000a56a8ffd>] udp_setsockopt+0x4e/0x90 net/ipv4/udp.c:2616
[<00000000fa895401>] sock_common_setsockopt+0x38/0x50 net/core/sock.c:3130
[<0000000095eef4cf>] __sys_setsockopt+0x98/0x120 net/socket.c:2078
[<000000009747cf88>] __do_sys_setsockopt net/socket.c:2089 [inline]
[<000000009747cf88>] __se_sys_setsockopt net/socket.c:2086 [inline]
[<000000009747cf88>] __x64_sys_setsockopt+0x26/0x30 net/socket.c:2086
[<00000000ded8ba80>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:301
[<00000000893b4ac8>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Reported-by: syzbot+7e2e50c8adfccd2e5041@syzkaller.appspotmail.com
Suggested-by: Eric Biggers <ebiggers@kernel.org>
Fixes: 998e7a76804b ("ipvs: Use kthread_run() instead of doing a double-fork via kernel_thread()")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6f6a8622057c92408930c31698394fae1557b188 ]
A fix similar to the patch "ip_tunnel: allow not to count pkts on tstats by
setting skb's dev to NULL" is also needed by ip6_tunnel.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit d5bb334a8e171b262e48f378bd2096c0ea458265 upstream.
The minimum encryption key size for LE connections is 56 bits and to
align LE with BR/EDR, enforce 56 bits of minimum encryption key size for
BR/EDR connections as well.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5f3e2bf008c2221478101ee72f5cb4654b9fc363 upstream.
Some TCP peers announce a very small MSS option in their SYN and/or
SYN/ACK messages.
This forces the stack to send packets with a very high network/cpu
overhead.
Linux has enforced a minimal value of 48. Since this value includes
the size of TCP options, and the options can consume up to 40
bytes, each segment can carry only 8 bytes of payload.
In some cases, it can be useful to increase this minimum
to a saner value.
We still leave the default at 48 (TCP_MIN_SND_MSS), for compatibility
reasons.
Note that the TCP_MAXSEG socket option enforces a minimal value
of TCP_MIN_MSS. David Miller increased this minimal value
in commit c39508d6f118 ("tcp: Make TCP_MAXSEG minimum more correct.")
from 64 to 88.
We might in the future merge TCP_MIN_SND_MSS and TCP_MIN_MSS.
CVE-2019-11479 -- tcp mss hardcoded to 48
Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Jonathan Looney <jtl@netflix.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Bruce Curtis <brucec@netflix.com>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
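The knob simply raises the floor applied when deriving the MSS from the path
MTU; roughly (a sketch of the upstream change in __tcp_mtu_to_mss()):

	/* never go below the configured floor, whatever the peer advertised */
	return max(mss_now, sock_net(sk)->ipv4.sysctl_tcp_min_snd_mss);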
commit 3b4929f65b0d8249f19a50245cd88ed1a2f78cff upstream.
Jonathan Looney reported that TCP can trigger the following crash
in tcp_shifted_skb() :
BUG_ON(tcp_skb_pcount(skb) < pcount);
This can happen if the remote peer has advertised the smallest
MSS that Linux TCP accepts: 48.
An skb can hold 17 fragments, and each fragment can hold 32KB
on x86, or 64KB on PowerPC.
This means that the 16-bit width of TCP_SKB_CB(skb)->tcp_gso_segs
can overflow.
Note that tcp_sendmsg() builds skbs with less than 64KB
of payload, so this problem needs SACK to be enabled.
SACK blocks allow TCP to coalesce multiple skbs in the retransmit
queue, thus filling the 17 fragments to maximal capacity.
CVE-2019-11477 -- u16 overflow of TCP_SKB_CB(skb)->tcp_gso_segs
Backport notes, provided by Joao Martins <joao.m.martins@oracle.com>
v4.15, and anything since commit 737ff314563 ("tcp: use sequence distance to
detect reordering"), switched from packet-based FACK tracking to
sequence-based tracking.
v4.14 and older still have the old logic, hence
tcp_skb_shift_data() needs to retain its original logic and keep
@fack_count in sync. In other words, we keep the increment of pcount with
tcp_skb_pcount(skb) and later use that to update fack_count. To make it
more explicit we track the new skb that gets incremented to pcount in
@next_pcount, and we avoid the constant invocation of
tcp_skb_pcount(skb) altogether.
Fixes: 832d11c5cd07 ("tcp: Try to restore large SKBs while SACK processing")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jonathan Looney <jtl@netflix.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Bruce Curtis <brucec@netflix.com>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
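The core of the upstream fix is a small wrapper that refuses to coalesce once
the 16-bit segment count (or the corresponding byte count) would overflow;
roughly (a sketch; the pre-4.15 backport additionally keeps @fack_count in
sync as described above):

static bool tcp_skb_shift(struct sk_buff *to, struct sk_buff *from,
			  int pcount, int shiftlen)
{
	/* TCP_SKB_CB(to)->tcp_gso_segs is a 16-bit field: refuse any
	 * shift that could overflow it or the matching byte count.
	 */
	if (unlikely(to->len + shiftlen >= 65535 * TCP_MIN_GSO_SIZE))
		return false;
	if (unlikely(tcp_skb_pcount(to) + pcount > 65535))
		return false;
	return skb_shift(to, from, shiftlen);
}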
connections"
This reverts commit 2fa7a155b25160696cd77cdd995536cf5e172e20 which is
commit d5bb334a8e171b262e48f378bd2096c0ea458265 upstream.
Lots of people have reported issues with this patch, and as there does
not seem to be a fix going into Linus's kernel tree any time soon,
revert the commit in the stable trees so as to get people's machines
working properly again.
Reported-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reported-by: Hans de Goede <hdegoede@redhat.com>
Cc: Jeremy Cline <jeremy@jcline.org>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9b3040a6aafd7898ece7fc7efcbca71e42aa8069 upstream.
Define __ipv4_neigh_lookup_noref to return NULL when CONFIG_INET is disabled.
Fixes: 4b2a2bfeb3f0 ("neighbor: Call __ipv4_neigh_lookup_noref in neigh_xmit")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
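The fix is essentially a stub so that callers still compile when CONFIG_INET
is disabled; something like this (a sketch, the real lookup body is the
existing one):

#if IS_ENABLED(CONFIG_INET)
static inline struct neighbour *__ipv4_neigh_lookup_noref(struct net_device *dev,
							   u32 key)
{
	return ___neigh_lookup_noref(&arp_tbl, neigh_key_eq32, arp_hashfn,
				     &key, dev);
}
#else
static inline struct neighbour *__ipv4_neigh_lookup_noref(struct net_device *dev,
							   u32 key)
{
	/* no IPv4 neighbour tables without CONFIG_INET */
	return NULL;
}
#endif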
[ Upstream commit b7999b07726c16974ba9ca3bb9fe98ecbec5f81c ]
In Jianlin's testing, netperf was broken with 'Connection reset by peer',
as the cookie check failed in rt6_check() and ip6_dst_check() always
returned NULL.
It's caused by commit 93531c674315 ("net/ipv6: separate handling of FIB
entries from dst based routes"), where the cookie can only be obtained
when 'c1' (see below) holds, for setting dst_cookie, whereas rt6_check()
is called when !'c1' holds, for checking dst_cookie, as we can see in
ip6_dst_check().
Since in ip6_dst_check() both rt6_dst_from_check() (c1) and rt6_check()
(!c1) check the 'from' cookie, this patch removes the c1 check
in rt6_get_cookie(), so that the dst_cookie can always be set properly.
c1:
(rt->rt6i_flags & RTF_PCPU || unlikely(!list_empty(&rt->rt6i_uncached)))
Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit df453700e8d81b1bdafdf684365ee2b9431fb702 ]
According to Amit Klein and Benny Pinkas, IP ID generation is too weak
and might be used by attackers.
Even with the recent net_hash_mix() fix ("netns: provide pure entropy for
net_hash_mix"), having a 64-bit key and a Jenkins hash is risky.
It is time to switch to siphash and its 128-bit keys.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Amit Klein <aksecurity@gmail.com>
Reported-by: Benny Pinkas <benny@pinkas.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6491d698396fd5da4941980a35ca7c162a672016 ]
This is similar to commit e285d5bfb7e9 ("NFC: Fix the number of pipes")
where we changed NFC_HCI_MAX_PIPES from 127 to 128.
As the comment next to the define explains, the pipe identifier is 7
bits long. The highest possible pipe is 127, but the number of possible
pipes is 128. As the code is now, there is potential for an
out-of-bounds array access:
net/nfc/nci/hci.c:297 nci_hci_cmd_received() warn: array off by one?
'ndev->hci_dev->pipes[pipe]' '0-127 == 127'
Fixes: 11f54f228643 ("NFC: nci: Add HCI over NCI protocol support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
[ Upstream commit 3c79107631db1f7fd32cf3f7368e4672004a3010 ]
Otherwise, we leak the conntrack object addresses to userspace via
ctnetlink events and dumps.
Compute an ID on demand based on the immutable parts of nf_conn struct.
Another advantage compared to using an address is that there is no
immediate re-use of the same ID in case the conntrack entry is freed and
reallocated again immediately.
Fixes: 3583240249ef ("[NETFILTER]: nf_conntrack_expect: kill unique ID")
Fixes: 7f85f914721f ("[NETFILTER]: nf_conntrack: kill unique ID")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit d5bb334a8e171b262e48f378bd2096c0ea458265 upstream.
The minimum encryption key size for LE connections is 56 bits and to
align LE with BR/EDR, enforce 56 bits of minimum encryption key size for
BR/EDR connections as well.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ce6289661b14a8b391d90db918c91b6d6da6540a upstream.
When CONFIG_KASAN is set, we can use relatively large amounts of kernel
stack space:
net/caif/cfctrl.c:555:1: warning: the frame size of 1600 bytes is larger than 1280 bytes [-Wframe-larger-than=]
This adds convenience wrappers around cfpkt_extr_head(), which is responsible
for most of the stack growth. With those wrapper functions, gcc apparently
starts reusing the stack slots for each instance, thus avoiding the
problem.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit fbd019737d71e405f86549fd738f81e2ff3dd073 ]
Ying triggered a call trace when doing an asconf testing:
BUG: scheduling while atomic: swapper/12/0/0x10000100
Call Trace:
<IRQ> [<ffffffffa4375904>] dump_stack+0x19/0x1b
[<ffffffffa436fcaf>] __schedule_bug+0x64/0x72
[<ffffffffa437b93a>] __schedule+0x9ba/0xa00
[<ffffffffa3cd5326>] __cond_resched+0x26/0x30
[<ffffffffa437bc4a>] _cond_resched+0x3a/0x50
[<ffffffffa3e22be8>] kmem_cache_alloc_node+0x38/0x200
[<ffffffffa423512d>] __alloc_skb+0x5d/0x2d0
[<ffffffffc0995320>] sctp_packet_transmit+0x610/0xa20 [sctp]
[<ffffffffc098510e>] sctp_outq_flush+0x2ce/0xc00 [sctp]
[<ffffffffc098646c>] sctp_outq_uncork+0x1c/0x20 [sctp]
[<ffffffffc0977338>] sctp_cmd_interpreter.isra.22+0xc8/0x1460 [sctp]
[<ffffffffc0976ad1>] sctp_do_sm+0xe1/0x350 [sctp]
[<ffffffffc099443d>] sctp_primitive_ASCONF+0x3d/0x50 [sctp]
[<ffffffffc0977384>] sctp_cmd_interpreter.isra.22+0x114/0x1460 [sctp]
[<ffffffffc0976ad1>] sctp_do_sm+0xe1/0x350 [sctp]
[<ffffffffc097b3a4>] sctp_assoc_bh_rcv+0xf4/0x1b0 [sctp]
[<ffffffffc09840f1>] sctp_inq_push+0x51/0x70 [sctp]
[<ffffffffc099732b>] sctp_rcv+0xa8b/0xbd0 [sctp]
As it shows, the first sctp_do_sm() running under atomic context (NET_RX
softirq) invoked sctp_primitive_ASCONF() that uses GFP_KERNEL flag later,
and this flag is supposed to be used in non-atomic context only. Besides,
sctp_do_sm() was called recursively, which is not expected.
Vlad tried to fix this recursive call in Commit c0786693404c ("sctp: Fix
oops when sending queued ASCONF chunks") by introducing a new command
SCTP_CMD_SEND_NEXT_ASCONF. But it didn't work as this command is still
used in the first sctp_do_sm() call, and sctp_primitive_ASCONF() will
be called in this command again.
To avoid calling sctp_do_sm() recursively, we send the next queued ASCONF
not by sctp_primitive_ASCONF(), but by sctp_sf_do_prm_asconf() in the 1st
sctp_do_sm() directly.
Reported-by: Ying Xu <yinxu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit fe384e2fa36ca084a456fd30558cccc75b4b3fbd ]
callers of tcf_gact_goto_chain_index() can potentially read an old value
of the chain index, or even dereference a NULL 'goto_chain' pointer,
because 'goto_chain' and 'tcfa_action' are read in the traffic path
without caring about concurrent writes in the control path. The most recent
value of chain index can be read also from a->tcfa_action (it's encoded
there together with TC_ACT_GOTO_CHAIN bits), so we don't really need to
dereference 'goto_chain': just read the chain id from the control action.
Fixes: e457d86ada27 ("net: sched: add couple of goto_chain helpers")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin (Microsoft) <sashal@kernel.org>
[ Upstream commit d4289fcc9b16b89619ee1c54f829e05e56de8b9a ]
Currently, IPv6 defragmentation code drops non-last fragments that
are smaller than 1280 bytes: see
commit 0ed4229b08c1 ("ipv6: defrag: drop non-last frags smaller than min mtu")
This behavior is not specified in IPv6 RFCs and appears to break
compatibility with some IPv6 implementations, as reported here:
https://www.spinics.net/lists/netdev/msg543846.html
This patch re-uses common IP defragmentation queueing and reassembly
code in IPv6, removing the 1280 byte restriction.
v2: change handling of overlaps to match that of upstream.
Signed-off-by: Peter Oskolkov <posk@google.com>
Reported-by: Tom Herbert <tom@herbertland.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 70b095c84326640eeacfd69a411db8fc36e8ab1a ]
IPV6=m
DEFRAG_IPV6=m
CONNTRACK=y yields:
net/netfilter/nf_conntrack_proto.o: In function `nf_ct_netns_do_get':
net/netfilter/nf_conntrack_proto.c:802: undefined reference to `nf_defrag_ipv6_enable'
net/netfilter/nf_conntrack_proto.o:(.rodata+0x640): undefined reference to `nf_conntrack_l4proto_icmpv6'
Setting DEFRAG_IPV6=y causes undefined references to ip6_rhash_params,
ip6_frag_init and ip6_expire_frag_queue, so IPV6=y would have to be
forced too.
This patch gets rid of the 'followup linker error' by removing
the dependency of ipv6.ko symbols from netfilter ipv6 defrag.
Shared code is placed into a header, then used from both.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c23f35d19db3b36ffb9e04b08f1d91565d15f84f ]
This is a refactoring patch: without changing runtime behavior,
it moves rbtree-related code from IPv4-specific files/functions
into .h/.c defrag files shared with IPv6 defragmentation code.
v2: make handling of overlapping packets match upstream.
Signed-off-by: Peter Oskolkov <posk@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 355b98553789b646ed97ad801a619ff898471b92 ]
net_hash_mix() currently uses the kernel address of a struct net,
and is used in many places that could reveal this
address to a patient attacker, thus defeating KASLR in
the typical case (the initial net namespace, since &init_net is
not dynamically allocated).
I believe the original implementation tried to avoid spending
too many cycles in this function, but security comes first.
Also provide entropy regardless of CONFIG_NET_NS.
Fixes: 0b4419162aa6 ("netns: introduce the net_hash_mix "salt" for hashes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Amit Klein <aksecurity@gmail.com>
Reported-by: Benny Pinkas <benny@pinkas.net>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8c83f2df9c6578ea4c5b940d8238ad8a41b87e9e ]
Configuration check to accept source route IP options should be made on
the incoming netdevice when the skb->dev is an l3mdev master. The route
lookup for the source route next hop also needs the incoming netdev.
v2->v3:
- Simplify by passing the original netdevice down the stack (per David
Ahern).
Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8e2f311a68494a6677c1724bdcb10bada21af37c ]
Following command:
iptables -D FORWARD -m physdev ...
causes connectivity loss in some setups.
Reason is that iptables userspace will probe kernel for the module revision
of the physdev patch, and physdev has an artificial dependency on
br_netfilter (xt_physdev use makes no sense unless a br_netfilter module
is loaded).
This causes the "br_netfilter" module to be loaded, which in turn enables the
"call-iptables" infrastructure.
Bridged packets might then get dropped by the iptables ruleset.
The better fix would be to change the "call-iptables" defaults to 0 and
enforce explicit setting to 1, but that breaks backwards compatibility.
This does the next best thing: add a request_module call to checkentry.
This way a stray '-D ... -m physdev' won't activate br_netfilter
anymore.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 273160ffc6b993c7c91627f5a84799c66dfe4dee ]
sctp_hdr(skb) only works when skb->transport_header is set properly.
But in netfilter, skb->transport_header for IPv6 is not guaranteed
to be the right value for the sctphdr, which could cause the checksum
check for SCTP packets to fail.
So fix it by using the offset, which is always right in all places.
v1->v2:
- Fix the changelog.
Fixes: e6d8b64b34aa ("net: sctp: fix and consolidate SCTP checksumming code")
Reported-by: Li Shuang <shuali@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a4dc6a49156b1f8d6e17251ffda17c9e6a5db78a ]
When using fanouts with AF_PACKET, the demux functions such as
fanout_demux_cpu will return an index in the fanout socket array, which
corresponds to the selected socket.
The ordering of this array depends on the order the sockets were added
to a given fanout group, so for FANOUT_CPU this means sockets are bound
to cpus in the order they are configured, which is OK.
However, when stopping then restarting the interface these sockets are
bound to, the sockets are reassigned to the fanout group in the reverse
order, due to the fact that they were inserted at the head of the
interface's AF_PACKET socket list.
This means that traffic that was directed to the first socket in the
fanout group is now directed to the last one after an interface restart.
In the case of FANOUT_CPU, traffic from CPU0 will be directed to the
socket that used to receive traffic from the last CPU after an interface
restart.
This commit introduces a helper to add a socket at the tail of a list,
then uses it to register AF_PACKET sockets.
Note that this changes the order in which sockets are listed in /proc and
with sock_diag.
Fixes: dc99f600698d ("packet: Add fanout support")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
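The helper amounts to a tail-insertion twin of the existing head-insertion
variant; a sketch:

static inline void sk_add_node_tail_rcu(struct sock *sk, struct hlist_head *list)
{
	sock_hold(sk);		/* the list keeps a reference, as usual */
	hlist_add_tail_rcu(&sk->sk_node, list);
}

AF_PACKET socket registration then uses it, so sockets re-added after an
interface restart keep their original ordering within the fanout group.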
[ Upstream commit 6321aa197547da397753757bd84c6ce64b3e3d89 ]
clang warns about overflowing the data[] member in the struct pnpipehdr:
net/phonet/pep.c:295:8: warning: array index 4 is past the end of the array (which contains 1 element) [-Warray-bounds]
if (hdr->data[4] == PEP_IND_READY)
^ ~
include/net/phonet/pep.h:66:3: note: array 'data' declared here
u8 data[1];
Using a flexible array member at the end of the struct avoids the
warning, but since we cannot have a flexible array member inside
of the union, each index now has to be moved back by one, which
makes it a little uglier.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Rémi Denis-Courmont <remi@remlab.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
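The resulting layout is roughly this (a sketch): the first data octet gets a
name inside the union, and the flexible array holding the remaining octets
moves out of it, so hdr->data[n] in the old code becomes hdr->data[n - 1].

struct pnpipehdr {
	u8	utid;
	u8	message_id;
	u8	pipe_handle;
	union {
		u8	state_after_connect;
		u8	state_after_reset;
		u8	error_code;
		u8	pep_type;
		u8	data0;		/* was data[0] */
	};
	u8	data[];			/* was data[1..]; indexes shift by one */
};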
commit c4f5627f7eeecde1bb6b646d8c0907b96dc2b2a6 upstream.
With commit e16337622016 ("Bluetooth: Handle bt_accept_enqueue() socket
atomically") lock_sock[_nested]() is used to acquire the socket lock
before manipulating the socket. lock_sock[_nested]() may block, which
is problematic since bt_accept_enqueue() can be called in bottom half
context (e.g. from rfcomm_connect_ind()):
[<ffffff80080d81ec>] __might_sleep+0x4c/0x80
[<ffffff800876c7b0>] lock_sock_nested+0x24/0x58
[<ffffff8000d7c27c>] bt_accept_enqueue+0x48/0xd4 [bluetooth]
[<ffffff8000e67d8c>] rfcomm_connect_ind+0x190/0x218 [rfcomm]
Add a parameter to bt_accept_enqueue() to indicate whether the
function is called from BH context, and acquire the socket lock
with bh_lock_sock_nested() if that's the case.
Also adapt all callers of bt_accept_enqueue() to pass the new
parameter:
- l2cap_sock_new_connection_cb()
- uses lock_sock() to lock the parent socket => process context
- rfcomm_connect_ind()
- acquires the parent socket lock with bh_lock_sock() => BH
context
- __sco_chan_add()
- called from sco_chan_add(), which is called from sco_connect().
parent is NULL, hence bt_accept_enqueue() isn't called in this
code path and we can ignore it
- also called from sco_conn_ready(). uses bh_lock_sock() to acquire
the parent lock => BH context
Fixes: e16337622016 ("Bluetooth: Handle bt_accept_enqueue() socket atomically")
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
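The locking part of the change boils down to this (a trimmed sketch of the
function, bookkeeping omitted):

void bt_accept_enqueue(struct sock *parent, struct sock *sk, bool bh)
{
	sock_hold(sk);

	if (bh)
		bh_lock_sock_nested(sk);	/* caller runs in BH context */
	else
		lock_sock_nested(sk, SINGLE_DEPTH_NESTING);

	list_add_tail(&bt_sk(sk)->accept_q, &bt_sk(parent)->accept_q);
	bt_sk(sk)->parent = parent;

	if (bh)
		bh_unlock_sock(sk);
	else
		release_sock(sk);
}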
[ Upstream commit 3da1ed7ac398f34fff1694017a07054d69c5f5c5 ]
Extract IP options in cipso_v4_error and use __icmp_send.
Signed-off-by: Sergey Nazarov <s-nazarov@yandex.ru>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9ef6b42ad6fd7929dd1b6092cb02014e382c6a91 ]
Add an __icmp_send() function that takes an ip_options struct parameter.
Signed-off-by: Sergey Nazarov <s-nazarov@yandex.ru>
Reviewed-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
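The interface change is small; a sketch of the new prototype and of how the
existing icmp_send() maps onto it:

void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info,
		 const struct ip_options *opt);

/* existing callers keep using the options already parsed into the skb */
static inline void icmp_send(struct sk_buff *skb_in, int type, int code,
			     __be32 info)
{
	__icmp_send(skb_in, type, code, info, &IPCB(skb_in)->opt);
}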
commit 63530aba7826a0f8e129874df9c4d264f9db3f9e upstream.
syzbot found that ax25 routes were not properly protected
against concurrent use [1].
In this particular report the bug happened while
copying ax25->digipeat.
Fix this problem by making sure we call ax25_get_route()
while ax25_route_lock is held, so that no modification
could happen while using the route.
The current two ax25_get_route() callers do not sleep,
so this change should be fine.
Once we do that, ax25_get_route() no longer needs to
grab a reference on the found route.
[1]
ax25_connect(): syz-executor0 uses autobind, please contact jreuter@yaina.de
BUG: KASAN: use-after-free in memcpy include/linux/string.h:352 [inline]
BUG: KASAN: use-after-free in kmemdup+0x42/0x60 mm/util.c:113
Read of size 66 at addr ffff888066641a80 by task syz-executor2/531
ax25_connect(): syz-executor0 uses autobind, please contact jreuter@yaina.de
CPU: 1 PID: 531 Comm: syz-executor2 Not tainted 5.0.0-rc2+ #10
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1db/0x2d0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
check_memory_region_inline mm/kasan/generic.c:185 [inline]
check_memory_region+0x123/0x190 mm/kasan/generic.c:191
memcpy+0x24/0x50 mm/kasan/common.c:130
memcpy include/linux/string.h:352 [inline]
kmemdup+0x42/0x60 mm/util.c:113
kmemdup include/linux/string.h:425 [inline]
ax25_rt_autobind+0x25d/0x750 net/ax25/ax25_route.c:424
ax25_connect.cold+0x30/0xa4 net/ax25/af_ax25.c:1224
__sys_connect+0x357/0x490 net/socket.c:1664
__do_sys_connect net/socket.c:1675 [inline]
__se_sys_connect net/socket.c:1672 [inline]
__x64_sys_connect+0x73/0xb0 net/socket.c:1672
do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x458099
Code: 6d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f870ee22c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000458099
RDX: 0000000000000048 RSI: 0000000020000080 RDI: 0000000000000005
RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
ax25_connect(): syz-executor4 uses autobind, please contact jreuter@yaina.de
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f870ee236d4
R13: 00000000004be48e R14: 00000000004ce9a8 R15: 00000000ffffffff
Allocated by task 526:
save_stack+0x45/0xd0 mm/kasan/common.c:73
set_track mm/kasan/common.c:85 [inline]
__kasan_kmalloc mm/kasan/common.c:496 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:469
kasan_kmalloc+0x9/0x10 mm/kasan/common.c:504
ax25_connect(): syz-executor5 uses autobind, please contact jreuter@yaina.de
kmem_cache_alloc_trace+0x151/0x760 mm/slab.c:3609
kmalloc include/linux/slab.h:545 [inline]
ax25_rt_add net/ax25/ax25_route.c:95 [inline]
ax25_rt_ioctl+0x3b9/0x1270 net/ax25/ax25_route.c:233
ax25_ioctl+0x322/0x10b0 net/ax25/af_ax25.c:1763
sock_do_ioctl+0xe2/0x400 net/socket.c:950
sock_ioctl+0x32f/0x6c0 net/socket.c:1074
vfs_ioctl fs/ioctl.c:46 [inline]
file_ioctl fs/ioctl.c:509 [inline]
do_vfs_ioctl+0x107b/0x17d0 fs/ioctl.c:696
ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
ax25_connect(): syz-executor5 uses autobind, please contact jreuter@yaina.de
Freed by task 550:
save_stack+0x45/0xd0 mm/kasan/common.c:73
set_track mm/kasan/common.c:85 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:458
kasan_slab_free+0xe/0x10 mm/kasan/common.c:466
__cache_free mm/slab.c:3487 [inline]
kfree+0xcf/0x230 mm/slab.c:3806
ax25_rt_add net/ax25/ax25_route.c:92 [inline]
ax25_rt_ioctl+0x304/0x1270 net/ax25/ax25_route.c:233
ax25_ioctl+0x322/0x10b0 net/ax25/af_ax25.c:1763
sock_do_ioctl+0xe2/0x400 net/socket.c:950
sock_ioctl+0x32f/0x6c0 net/socket.c:1074
vfs_ioctl fs/ioctl.c:46 [inline]
file_ioctl fs/ioctl.c:509 [inline]
do_vfs_ioctl+0x107b/0x17d0 fs/ioctl.c:696
ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
The buggy address belongs to the object at ffff888066641a80
which belongs to the cache kmalloc-96 of size 96
The buggy address is located 0 bytes inside of
96-byte region [ffff888066641a80, ffff888066641ae0)
The buggy address belongs to the page:
page:ffffea0001999040 count:1 mapcount:0 mapping:ffff88812c3f04c0 index:0x0
flags: 0x1fffc0000000200(slab)
ax25_connect(): syz-executor4 uses autobind, please contact jreuter@yaina.de
raw: 01fffc0000000200 ffffea0001817948 ffffea0002341dc8 ffff88812c3f04c0
raw: 0000000000000000 ffff888066641000 0000000100000020 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff888066641980: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
ffff888066641a00: 00 00 00 00 00 00 00 00 02 fc fc fc fc fc fc fc
>ffff888066641a80: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
^
ffff888066641b00: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
ffff888066641b80: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 04c03114be82194d4a4858d41dba8e286ad1787c ]
soukjin bae reported a crash in tcp_v4_err() handling
ICMP_DEST_UNREACH after tcp_write_queue_head(sk)
returned a NULL pointer.
Current logic should have prevented this:
	if (seq != tp->snd_una || !icsk->icsk_retransmits ||
	    !icsk->icsk_backoff || fastopen)
		break;
The problem is that the write queue might have been purged
and icsk_backoff has not been cleared.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: soukjin bae <soukjin.bae@samsung.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
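One way to keep the two in sync is to reset the backoff wherever the write
queue is purged; a sketch:

static void tcp_write_queue_purge(struct sock *sk)
{
	struct sk_buff *skb;

	while ((skb = __skb_dequeue(&sk->sk_write_queue)) != NULL)
		sk_wmem_free_skb(sk, skb);
	/* ... existing purge logic ... */

	/* keep tcp_v4_err()'s assumption valid: no queued data means
	 * no pending retransmit backoff either
	 */
	inet_csk(sk)->icsk_backoff = 0;
}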
[ Upstream commit c09551c6ff7fe16a79a42133bcecba5fc2fc3291 ]
According to the algorithm described in the comment block at the
beginning of ip_rt_send_redirect, the host should try to send
'ip_rt_redirect_number' ICMP redirect packets with an exponential
backoff and then stop sending them at all assuming that the destination
ignores redirects.
If the device has previously sent some ICMP error packets that are
rate-limited (e.g TTL expired) and continues to receive traffic,
the redirect packets will never be transmitted. This happens since
peer->rate_tokens will be typically greater than 'ip_rt_redirect_number'
and so it will never be reset even if the redirect silence timeout
(ip_rt_redirect_silence) has elapsed without receiving any packet
requiring redirects.
Fix it by using a dedicated counter for the number of ICMP redirect
packets that have been sent by the host.
I have not been able to identify a given commit that introduced the
issue, since ip_rt_send_redirect has implemented the same rate-limiting
algorithm since commit 1da177e4c3f4 ("Linux-2.6.12-rc2").
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
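A sketch of the dedicated counter in ip_rt_send_redirect() (abridged;
surrounding logic and exact field handling per the upstream patch):

	/* Too many redirects were already sent to this peer: stop,
	 * independently of how many other rate-limited ICMP errors
	 * (e.g. TTL expired) it has triggered.
	 */
	if (peer->n_redirects >= ip_rt_redirect_number) {
		peer->rate_last = jiffies;
		goto out_put_peer;
	}

	if (time_after(jiffies,
		       peer->rate_last +
		       (ip_rt_redirect_load << peer->n_redirects))) {
		__be32 gw = rt_nexthop(rt, ip_hdr(skb)->daddr);

		icmp_send(skb, ICMP_REDIRECT, ICMP_REDIR_HOST, gw);
		peer->rate_last = jiffies;
		++peer->n_redirects;
	}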
[ Upstream commit d5256083f62e2720f75bb3c5a928a0afe47d6bc3 ]
While implementing ipvlan l3 and l3s mode for kubernetes CNI plugin,
I ran into the issue that while l3 mode is working fine, l3s mode
does not have any connectivity to kube-apiserver and hence all pods
end up in Error state as well. The ipvlan master device sits on
top of a bond device and hostns traffic to kube-apiserver (also running
in hostns) is DNATed from 10.152.183.1:443 to 139.178.29.207:37573
where the latter is the address of the bond0. While in l3 mode, a
curl to https://10.152.183.1:443 or to https://139.178.29.207:37573
works fine from hostns, neither of them do in case of l3s. In the
latter only a curl to https://127.0.0.1:37573 appeared to work where
for local addresses of bond0 I saw kernel suddenly starting to emit
ARP requests to query HW address of bond0 which remained unanswered
and neighbor entries in INCOMPLETE state. These ARP requests only
happen while in l3s.
Debugging this further, I found the issue is that l3s mode is piggy-
backing on l3 master device, and in this case local routes are using
l3mdev_master_dev_rcu(dev) instead of net->loopback_dev as per commit
f5a0aab84b74 ("net: ipv4: dst for local input routes should use l3mdev
if relevant") and 5f02ce24c269 ("net: l3mdev: Allow the l3mdev to be
a loopback"). I found that reverting them back into using the
net->loopback_dev fixed ipvlan l3s connectivity and got everything
working for the CNI.
Now judging from 4fbae7d83c98 ("ipvlan: Introduce l3s mode") and the
l3mdev paper in [0] the only sole reason why ipvlan l3s is relying
on l3 master device is to get the l3mdev_ip_rcv() receive hook for
setting the dst entry of the input route without adding its own
ipvlan specific hacks into the receive path, however, any l3 domain
semantics beyond just that are breaking l3s operation. Note that
ipvlan also has the ability to dynamically switch its internal
operation from l3 to l3s for all ports via ipvlan_set_port_mode()
at runtime. In any case, l3 vs l3s solely distinguishes itself by
'de-confusing' netfilter through switching skb->dev to the ipvlan slave
device late in NF_INET_LOCAL_IN before handing the skb to L4.
Minimal fix taken here is to add a IFF_L3MDEV_RX_HANDLER flag which,
if set from ipvlan setup, gets us only the wanted l3mdev_l3_rcv() hook
without any additional l3mdev semantics on top. This should also have
minimal impact since dev->priv_flags is already hot in cache. With
this set, l3s mode is working fine and I also get things like
masquerading pod traffic on the ipvlan master properly working.
[0] https://netdevconf.org/1.2/papers/ahern-what-is-l3mdev-paper.pdf
Fixes: f5a0aab84b74 ("net: ipv4: dst for local input routes should use l3mdev if relevant")
Fixes: 5f02ce24c269 ("net: l3mdev: Allow the l3mdev to be a loopback")
Fixes: 4fbae7d83c98 ("ipvlan: Introduce l3s mode")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Martynas Pumputis <m@lambda.lt>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f97f4dd8b3bb9d0993d2491e0f22024c68109184 ]
IPv4 routing tables are flushed in two cases:
1. In response to events in the netdev and inetaddr notification chains
2. When a network namespace is being dismantled
In both cases only routes associated with a dead nexthop group are
flushed. However, a nexthop group will only be marked as dead in case it
is populated with actual nexthops using a nexthop device. This is not
the case when the route in question is an error route (e.g.,
'blackhole', 'unreachable').
Therefore, when a network namespace is being dismantled, such routes are
not flushed and are leaked [1].
To reproduce:
# ip netns add blue
# ip -n blue route add unreachable 192.0.2.0/24
# ip netns del blue
Fix this by not skipping error routes that are not marked with
RTNH_F_DEAD when flushing the routing tables.
To prevent the flushing of such routes in case #1, add a parameter to
fib_table_flush() that indicates if the table is flushed as part of
namespace dismantle or not.
Note that this problem does not exist in IPv6 since error routes are
associated with the loopback device.
[1]
unreferenced object 0xffff888066650338 (size 56):
comm "ip", pid 1206, jiffies 4294786063 (age 26.235s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 b0 1c 62 61 80 88 ff ff ..........ba....
e8 8b a1 64 80 88 ff ff 00 07 00 08 fe 00 00 00 ...d............
backtrace:
[<00000000856ed27d>] inet_rtm_newroute+0x129/0x220
[<00000000fcdfc00a>] rtnetlink_rcv_msg+0x397/0xa20
[<00000000cb85801a>] netlink_rcv_skb+0x132/0x380
[<00000000ebc991d2>] netlink_unicast+0x4c0/0x690
[<0000000014f62875>] netlink_sendmsg+0x929/0xe10
[<00000000bac9d967>] sock_sendmsg+0xc8/0x110
[<00000000223e6485>] ___sys_sendmsg+0x77a/0x8f0
[<000000002e94f880>] __sys_sendmsg+0xf7/0x250
[<00000000ccb1fa72>] do_syscall_64+0x14d/0x610
[<00000000ffbe3dae>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<000000003a8b605b>] 0xffffffffffffffff
unreferenced object 0xffff888061621c88 (size 48):
comm "ip", pid 1206, jiffies 4294786063 (age 26.235s)
hex dump (first 32 bytes):
6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
6b 6b 6b 6b 6b 6b 6b 6b d8 8e 26 5f 80 88 ff ff kkkkkkkk..&_....
backtrace:
[<00000000733609e3>] fib_table_insert+0x978/0x1500
[<00000000856ed27d>] inet_rtm_newroute+0x129/0x220
[<00000000fcdfc00a>] rtnetlink_rcv_msg+0x397/0xa20
[<00000000cb85801a>] netlink_rcv_skb+0x132/0x380
[<00000000ebc991d2>] netlink_unicast+0x4c0/0x690
[<0000000014f62875>] netlink_sendmsg+0x929/0xe10
[<00000000bac9d967>] sock_sendmsg+0xc8/0x110
[<00000000223e6485>] ___sys_sendmsg+0x77a/0x8f0
[<000000002e94f880>] __sys_sendmsg+0xf7/0x250
[<00000000ccb1fa72>] do_syscall_64+0x14d/0x610
[<00000000ffbe3dae>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<000000003a8b605b>] 0xffffffffffffffff
Fixes: 8cced9eff1d4 ("[NETNS]: Enable routing configuration in non-initial namespace.")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
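A minimal sketch of the interface change described above (internals of the
actual patch differ from this outline); the extra argument only matters on
the namespace teardown path:
  /* Sketch: callers on the notifier paths keep error routes, the netns
   * dismantle path drops them as well. */
  int fib_table_flush(struct net *net, struct fib_table *tb, bool flush_all);

  fib_table_flush(net, tb, false); /* netdev/inetaddr notifier paths */
  fib_table_flush(net, tb, true);  /* network namespace dismantle */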
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 21ba8847f857028dc83a0f341e16ecc616e34740 upstream.
Currently, we use check_hlist() for garbage collection. However, we use
the 'zone' from the counted entry to check whether matching entries
already exist in the hlist. This can be wrong when the entries are in
different zones, and this patch fixes the issue.
Fixes: e59ea3df3fc2 ("netfilter: xt_connlimit: honor conntrack zone if available")
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
[mfo: backport: refresh context lines and use older symbol/file names, note hunk 5:
- nf_conncount.c -> xt_connlimit.c
- nf_conncount_rb -> xt_connlimit_rb
- nf_conncount_tuple -> xt_connlimit_conn
- hunk 5: remove check for non-NULL 'tuple', that isn't required as it's introduced
by upstream commit 35d8deb80 ("netfilter: conncount: Support count only use case")
which addresses nf_conncount_count() that does not exist yet -- it's introduced by
upstream commit 625c556118f3 ("netfilter: connlimit: split xt_connlimit into front
and backend"), a refactor change.
- nft_connlimit.c -> removed, not used/doesn't exist yet.]
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
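In rough terms, the fix makes the hlist walk compare against the zone of
the connection currently being counted instead of a zone taken from an
already-counted entry. A hedged sketch using the backported symbol names
(the exact signature is an assumption):
  /* Sketch: check_hlist() now receives the zone of the connection being
   * counted and only treats same-zone entries as matches during gc. */
  static unsigned int check_hlist(struct net *net, struct hlist_head *head,
                                  const struct nf_conntrack_tuple *tuple,
                                  const struct nf_conntrack_zone *zone,
                                  bool *addit);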
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 5e5cbc7b23eaf13e18652c03efbad5be6995de6a upstream.
This patch provides an interface to maintain the list of connections and
the lookup function to obtain the number of connections in the list.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
[mfo: backport: refresh context lines and use older symbol/file names:
- nf_conntrack_count.h: new file, add include guards.
- nf_conncount.c -> xt_connlimit.c.
- nf_conncount_rb -> xt_connlimit_rb
- nf_conncount_tuple -> xt_connlimit_conn
- conncount_rb_cachep -> connlimit_rb_cachep
- conncount_conn_cachep -> connlimit_conn_cachep]
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 3a0ed3e9619738067214871e9cb826fa23b2ddb9 ]
Al Viro mentioned (Message-ID
<20170626041334.GZ10672@ZenIV.linux.org.uk>)
that there is probably a race condition
lurking in accesses of sk_stamp on 32-bit machines.
sock->sk_stamp is of type ktime_t which is always an s64.
On a 32-bit architecture, accesses to the field are not atomic, so we
might run into unsafe, torn reads and writes.
Use seqlocks for synchronization.
This allows us to avoid using spinlocks for readers as
readers do not need mutual exclusion.
Another approach to solve this is to require sk_lock for all
modifications of the timestamps. The current approach allows
for timestamps to have their own lock: sk_stamp_lock.
This allows for the patch to not compete with already
existing critical sections, and side effects are limited
to the paths in the patch.
The addition of the new field maintains the data locality
optimizations from
commit 9115e8cd2a0c ("net: reorganize struct sock for better data
locality")
Note that all the instances of the sk_stamp accesses
are either through the ioctl or the syscall recvmsg.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
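The reader side of the pattern looks roughly like the sketch below; the
helper and field names follow the description above, but details of the
actual patch may differ. On 64-bit builds the plain load is already atomic,
so the seqlock is only taken on 32-bit:
  /* Hedged sketch of the lockless reader; sk_stamp_lock is the per-socket
   * seqlock_t mentioned above. */
  static inline ktime_t sock_read_timestamp(struct sock *sk)
  {
  #if BITS_PER_LONG == 32
          unsigned int seq;
          ktime_t kt;

          do {
                  seq = read_seqbegin(&sk->sk_stamp_lock);
                  kt = sk->sk_stamp;
          } while (read_seqretry(&sk->sk_stamp_lock, seq));

          return kt;
  #else
          return sk->sk_stamp;
  #endif
  }
Writers take write_seqlock()/write_sequnlock() on the same lock, so readers
never need a spinlock.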
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit fb6df5a6234c38a9c551559506a49a677ac6f07a ]
In sctp_hash_transport/sctp_epaddr_lookup_transport, a transport's asoc
is dereferenced under rcu_read_lock while the asoc is freed without
waiting for a grace period, which leads to a use-after-free panic.
This patch fixes it by calling kfree_rcu to make asoc be freed after
a grace period.
Note that only the asoc's memory is delayed to free in the patch, it
won't cause sk to linger longer.
Thanks to Neil and Marcelo for making this clear.
Fixes: 7fda702f9315 ("sctp: use new rhlist interface on sctp transport rhashtable")
Fixes: cd2b70875058 ("sctp: check duplicate node before inserting a new transport")
Reported-by: syzbot+0b05d8aa7cb185107483@syzkaller.appspotmail.com
Reported-by: syzbot+aad231d51b1923158444@syzkaller.appspotmail.com
Suggested-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
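The core of the fix is the standard kfree_rcu() pattern, sketched below
(the rcu_head member name is an assumption): the association gains an
rcu_head and its memory is released only after a grace period, so
rcu_read_lock() readers of the transport hashtable never see freed memory.
  struct sctp_association {
          /* existing members unchanged */
          struct rcu_head rcu;
  };

  /* in the destroy path, instead of kfree(asoc): */
  kfree_rcu(asoc, rcu);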
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit e6ac64d4c4d095085d7dd71cbd05704ac99829b2 ]
While skb_push() makes the kernel panic if the skb headroom is less than
the unaligned hardware header size, it proceeds normally when we copy
more than that because of alignment, and we silently corrupt adjacent
slabs.
In the case fixed by the previous patch,
"ipv6: Check available headroom in ip6_xmit() even without options", we
end up in neigh_hh_output() with 14 bytes headroom, 14 bytes hardware
header and write 16 bytes, starting 2 bytes before the allocated buffer.
Always check we're not writing before skb->head and, if the headroom is
not enough, warn and drop the packet.
v2:
- instead of panicking with BUG_ON(), WARN_ON_ONCE() and drop the packet
(Eric Dumazet)
- if we avoid the panic, though, we need to explicitly check the headroom
before the memcpy(), otherwise we'll have corrupted slabs on a running
kernel, after we warn
- use __skb_push() instead of skb_push(), as the headroom check is
already implemented here explicitly (Eric Dumazet)
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
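Conceptually, the added check looks like the sketch below (not the verbatim
patch): compute the aligned header length, warn and drop if it does not fit
in the headroom, and only then copy and __skb_push().
  unsigned int hh_alen = HH_DATA_ALIGN(hh->hh_len);

  if (WARN_ON_ONCE(skb_headroom(skb) < hh_alen)) {
          kfree_skb(skb);
          return NET_XMIT_DROP;
  }
  memcpy(skb->data - hh_alen, hh->hh_data, hh_alen);
  __skb_push(skb, hh->hh_len);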
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit ff45d820a2df163957ad8ab459b6eb6976144c18 upstream.
Previously the TLS ulp context would leak if we attached a TLS ulp
to a socket but did not use the TLS_TX setsockopt,
or did use it but it failed.
This patch solves the issue by overriding prot[TLS_BASE_TX].close
and fixing tls_sk_proto_close to work properly
when it's called with ctx->tx_conf == TLS_BASE_TX.
This patch also removes ctx->free_resources as we can use ctx->tx_conf
to obtain the relevant information.
Fixes: 3c4d7559159b ('tls: kernel TLS support')
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 4.14: Keep using tls_ctx_free() as introduced by
the earlier backport of "tls: zero the crypto information from
tls_context before freeing"]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
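In outline, the fix installs the close callback for the base state as well,
so the ulp context is always released. A hedged sketch (the array and
callback names follow the commit text; the surrounding variable names are
assumptions):
  /* Sketch: sockets that never reached TLS_SW_TX still go through
   * tls_sk_proto_close(), which can free the ulp context (via
   * tls_ctx_free() in this backport). */
  prot[TLS_BASE_TX]            = *base_prot;
  prot[TLS_BASE_TX].setsockopt = tls_setsockopt;
  prot[TLS_BASE_TX].getsockopt = tls_getsockopt;
  prot[TLS_BASE_TX].close      = tls_sk_proto_close;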
|