| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 9ef2e965e55481a52d6d91ce61977a27836268d3 ]
This is a clone of commit 2ab957492d13b ("ip_forward: Drop frames with
attached skb->sk") for ipv6.
This commit has exactly the same reasons as the above mentioned commit,
namely to prevent panics during netfilter reload or a misconfigured stack.
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
|
|
|
|
|
| |
There was an error in the backport, which is now fixed.
Reported-by: Francesco Ruggeri <fruggeri@aristanetworks.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit a8a572a6b5f2a79280d6e302cb3c1cb1fbaeb3e8 ]
Remove the dst_entries_init/destroy calls for xfrm4 and xfrm6 dst_ops
templates; their dst_entries counters will never be used. Move the
xfrm dst_ops initialization from the common xfrm/xfrm_policy.c to
xfrm4/xfrm4_policy.c and xfrm6/xfrm6_policy.c, and call dst_entries_init
and dst_entries_destroy for each net namespace.
The ipv4 and ipv6 xfrms each create dst_ops template, and perform
dst_entries_init on the templates. The template values are copied to each
net namespace's xfrm.xfrm*_dst_ops. The problem there is the dst_ops
pcpuc_entries field is a percpu counter and cannot be used correctly by
simply copying it to another object.
The result of this is a very subtle bug; changes to the dst entries
counter from one net namespace may sometimes get applied to a different
net namespace dst entries counter. This is because of how the percpu
counter works; it has a main count field as well as a pointer to the
percpu variables. Each net namespace maintains its own main count
variable, but all point to one set of percpu variables. When any net
namespace happens to change one of the percpu variables to outside its
small batch range, its count is moved to the net namespace's main count
variable. So with multiple net namespaces operating concurrently, the
dst_ops entries counter can stray from the actual value that it should
be; if counts are consistently moved from one net namespace to another
(which my testing showed is likely), then one net namespace winds up
with a negative dst_ops count while another winds up with a continually
increasing count, eventually reaching its gc_thresh limit, which causes
all new traffic on the net namespace to fail with -ENOBUFS.
Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 34ae6a1aa0540f0f781dd265366036355fdc8930 ]
When a tunnel decapsulates the outer header, it has to comply
with RFC 6080 and eventually propagate CE mark into inner header.
It turns out IP6_ECN_set_ce() does not correctly update skb->csum
for CHECKSUM_COMPLETE packets, triggering infamous "hw csum failure"
messages and stack traces.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 3e4006f0b86a5ae5eb0e8215f9a9e1db24506977 ]
When first SYNACK is sent, we already hold rcu_read_lock(), but this
is not true if a SYNACK is retransmitted, as a timer (soft) interrupt
does not hold rcu_read_lock()
Fixes: 45f6fad84cc30 ("ipv6: add complete rcu protection around np->opt")
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit e459dfeeb64008b2d23bdf600f03b3605dbb8152 ]
ip6addrlbl_get() has never worked. If ip6addrlbl_hold() succeeded,
ip6addrlbl_get() will exit with '-ESRCH'. If ip6addrlbl_hold() failed,
ip6addrlbl_get() will use about to be free ip6addrlbl_entry pointer.
Fix this by inverting ip6addrlbl_hold() check.
Fixes: 2a8cc6c89039 ("[IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table.")
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Cong Wang <cwang@twopensource.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 5037e9ef9454917b047f9f3a19b4dd179fbf7cd4 ]
David Wilder reported crashes caused by dst reuse.
<quote David>
I am seeing a crash on a distro V4.2.3 kernel caused by a double
release of a dst_entry. In ipv4_dst_destroy() the call to
list_empty() finds a poisoned next pointer, indicating the dst_entry
has already been removed from the list and freed. The crash occurs
18 to 24 hours into a run of a network stress exerciser.
</quote>
Thanks to his detailed report and analysis, we were able to understand
the core issue.
IP early demux can associate a dst to skb, after a lookup in TCP/UDP
sockets.
When socket cache is not properly set, we want to store into
sk->sk_dst_cache the dst for future IP early demux lookups,
by acquiring a stable refcount on the dst.
Problem is this acquisition is simply using an atomic_inc(),
which works well, unless the dst was queued for destruction from
dst_release() noticing dst refcount went to zero, if DST_NOCACHE
was set on dst.
We need to make sure current refcount is not zero before incrementing
it, or risk double free as David reported.
This patch, being a stable candidate, adds two new helpers, and use
them only from IP early demux problematic paths.
It might be possible to merge in net-next skb_dst_force() and
skb_dst_force_safe(), but I prefer having the smallest patch for stable
kernels : Maybe some skb_dst_force() callers do not expect skb->dst
can suddenly be cleared.
Can probably be backported back to linux-3.6 kernels
Reported-by: David J. Wilder <dwilder@us.ibm.com>
Tested-by: David J. Wilder <dwilder@us.ibm.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 79462ad02e861803b3840cc782248c7359451cd9 ]
郭永刚 reported that one could simply crash the kernel as root by
using a simple program:
int socket_fd;
struct sockaddr_in addr;
addr.sin_port = 0;
addr.sin_addr.s_addr = INADDR_ANY;
addr.sin_family = 10;
socket_fd = socket(10,3,0x40000000);
connect(socket_fd , &addr,16);
AF_INET, AF_INET6 sockets actually only support 8-bit protocol
identifiers. inet_sock's skc_protocol field thus is sized accordingly,
thus larger protocol identifiers simply cut off the higher bits and
store a zero in the protocol fields.
This could lead to e.g. NULL function pointer because as a result of
the cut off inet_num is zero and we call down to inet_autobind, which
is NULL for raw sockets.
kernel: Call Trace:
kernel: [<ffffffff816db90e>] ? inet_autobind+0x2e/0x70
kernel: [<ffffffff816db9a4>] inet_dgram_connect+0x54/0x80
kernel: [<ffffffff81645069>] SYSC_connect+0xd9/0x110
kernel: [<ffffffff810ac51b>] ? ptrace_notify+0x5b/0x80
kernel: [<ffffffff810236d8>] ? syscall_trace_enter_phase2+0x108/0x200
kernel: [<ffffffff81645e0e>] SyS_connect+0xe/0x10
kernel: [<ffffffff81779515>] tracesys_phase2+0x84/0x89
I found no particular commit which introduced this problem.
CVE: CVE-2015-8543
Cc: Cong Wang <cwang@twopensource.com>
Reported-by: 郭永刚 <guoyonggang@360.cn>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 6a61d4dbf4f54b5683e0f1e58d873cecca7cb977 ]
Parameters were updated only if the kernel was unable to find the tunnel
with the new parameters, ie only if core pamareters were updated (keys,
addr, link, type).
Now it's possible to update ttl, hoplimit, flowinfo and flags.
Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 45f6fad84cc305103b28d73482b344d7f5b76f39 ]
This patch addresses multiple problems :
UDP/RAW sendmsg() need to get a stable struct ipv6_txoptions
while socket is not locked : Other threads can change np->opt
concurrently. Dmitry posted a syzkaller
(http://github.com/google/syzkaller) program desmonstrating
use-after-free.
Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock()
and dccp_v6_request_recv_sock() also need to use RCU protection
to dereference np->opt once (before calling ipv6_dup_options())
This patch adds full RCU protection to np->opt
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 264640fc2c5f4f913db5c73fa3eb1ead2c45e9d7 ]
If a fragmented multicast packet is received on an ethernet device which
has an active macvlan on top of it, each fragment is duplicated and
received both on the underlying device and the macvlan. If some
fragments for macvlan are processed before the whole packet for the
underlying device is reassembled, the "overlapping fragments" test in
ip6_frag_queue() discards the whole fragment queue.
To resolve this, add device ifindex to the search key and require it to
match reassembling multicast packets and packets to link-local
addresses.
Note: similar patch has been already submitted by Yoshifuji Hideaki in
http://patchwork.ozlabs.org/patch/220979/
but got lost and forgotten for some reason.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 4c6980462f32b4f282c5d8e5f7ea8070e2937725 ]
Similar to ipv4, when destroying an mrt table the static mfc entries and
the static devices are kept, which leads to devices that can never be
destroyed (because of refcnt taken) and leaked memory. Make sure that
everything is cleaned up on netns destruction.
Fixes: 8229efdaef1e ("netns: ip6mr: enable namespace support in ipv6 multicast forwarding code")
CC: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 41033f029e393a64e81966cbe34d66c6cf8a2e7e ]
the OUTMCAST stat is double incremented, getting bumped once in the mcast code
itself, and again in the common ip output path. Remove the mcast bump, as its
not needed
Validated by the reporter, with good results
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Claus Jensen <claus.jensen@microsemi.com>
CC: Claus Jensen <claus.jensen@microsemi.com>
CC: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 77751427a1ff25b27d47a4c36b12c3c8667855ac ]
Currently we don't check if the new MTU is valid or not and this allows
one to configure a smaller than minimum allowed by RFCs or even bigger
than interface own MTU, which is a problem as it may lead to packet
drops.
If you have a daemon like NetworkManager running, this may be exploited
by remote attackers by forging RA packets with an invalid MTU, possibly
leading to a DoS. (NetworkManager currently only validates for values
too small, but not for too big ones.)
The fix is just to make sure the new value is valid. That is, between
IPV6_MIN_MTU and interface's MTU.
Note that similar check is already performed at
ndisc_router_discovery(), for when kernel itself parses the RA.
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 2a189f9e57650e9f310ddf4aad75d66c1233a064 ]
In ipv6_add_dev, when addrconf_sysctl_register fails, we do not clean up
the dev_snmp6 entry that we have already registered for this device.
Call snmp6_unregister_dev in this case.
Fixes: a317a2f19da7d ("ipv6: fail early when creating netdev named all or default")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 4ece9009774596ee3df0acba65a324b7ea79387c ]
sit0 device allocates its percpu storage twice :
- One time in ipip6_tunnel_init()
- One time in ipip6_fb_tunnel_init()
Thus we leak 48 bytes per possible cpu per network namespace dismantle.
ipip6_fb_tunnel_init() can be much simpler and does not
return an error, and should be called after register_netdev()
Note that ipip6_tunnel_clone_6rd() also needs to be called
after register_netdev() (calling ipip6_tunnel_init())
Fixes: ebe084aafb7e ("sit: Use ipip6_tunnel_init as the ndo_init function.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 25b4a44c19c83d98e8c0807a7ede07c1f28eab8b ]
In the IPv6 multicast routing code the mrt_lock was not being released
correctly in the MFC iterator, as a result adding or deleting a MIF would
cause a hang because the mrt_lock could not be acquired.
This fix is a copy of the code for the IPv4 case and ensures that the lock
is released correctly.
Signed-off-by: Richard Laing <richard.laing@alliedtelesis.co.nz>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit e41b0bedba0293b9e1e8d1e8ed553104b9693656 ]
We previously register IPPROTO_ROUTING offload under inet6_add_offload(),
but in error path, we try to unregister it with inet_del_offload(). This
doesn't seem correct, it should actually be inet6_del_offload(), also
ipv6_exthdrs_offload_exit() from that commit seems rather incorrect (it
also uses rthdr_offload twice), but it got removed entirely later on.
Fixes: 3336288a9fea ("ipv6: Switch to using new offload infrastructure.")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit d4257295ba1b389c693b79de857a96e4b7cd8ac0 ]
When a tunnel is deleted, the cached dst entry should be released.
This problem may prevent the removal of a netns (seen with a x-netns IPv6
gre tunnel):
unregister_netdevice: waiting for lo to become free. Usage count = 3
CC: Dmitry Kozlov <xeb@mail.ru>
Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
Signed-off-by: huaibin Wang <huaibin.wang@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 03645a11a570d52e70631838cb786eb4253eb463 ]
ip6_datagram_connect() is doing a lot of socket changes without
socket being locked.
This looks wrong, at least for udp_lib_rehash() which could corrupt
lists because of concurrent udp_sk(sk)->udp_portaddr_hash accesses.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 4c938d22c88a9ddccc8c55a85e0430e9c62b1ac5 ]
Before commit daad151263cf ("ipv6: Make ipv6_is_mld() inline and use it
from ip6_mc_input().") MLD packets were only processed locally. After the
change, a copy of MLD packet goes through ip6_mr_input, causing
MRT6MSG_NOCACHE message to be generated to user space.
Make MLD packet only processed locally.
Fixes: daad151263cf ("ipv6: Make ipv6_is_mld() inline and use it from ip6_mc_input().")
Signed-off-by: Hermin Anggawijaya <hermin.anggawijaya@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit d55c670cbc54b2270a465cdc382ce71adae45785 ]
The vti6_rcv_cb and vti_rcv_cb calls were leaving the skb->mark modified
after completing the function. This resulted in the original skb->mark
value being lost. Since we only need skb->mark to be set for
xfrm_policy_check we can pull the assignment into the rcv_cb calls and then
just restore the original mark after xfrm_policy_check has been completed.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit cd5279c194f89c9b97c294af4aaf4ea8c5e3c704 ]
Instead of modifying skb->mark we can simply modify the flowi_mark that is
generated as a result of the xfrm_decode_session. By doing this we don't
need to actually touch the skb->mark and it can be preserved as it passes
out through the tunnel.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 092a29a40bab8bb4530bb3e58a0597001cdecdef ]
When the kernel deleted a vti6 interface, this interface was not removed from
the tunnels list. Thus, when the ip6_vti module was removed, this old interface
was found and the kernel tried to delete it again. This was leading to a kernel
panic.
Fixes: 61220ab34948 ("vti6: Enable namespace changing")
Signed-off-by: Yao Xiwei <xiwei.yao@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit beb39db59d14990e401e235faf66a6b9b31240b0 ]
We have two problems in UDP stack related to bogus checksums :
1) We return -EAGAIN to application even if receive queue is not empty.
This breaks applications using edge trigger epoll()
2) Under UDP flood, we can loop forever without yielding to other
processes, potentially hanging the host, especially on non SMP.
This patch is an attempt to make things better.
We might in the future add extra support for rt applications
wanting to better control time spent doing a recv() in a hostile
environment. For example we could validate checksums before queuing
packets in socket receive queue.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 33b4b015e1a1ca7a8fdce40af5e71642a8ea355c ]
Commit <5cf3d46192fc> ("udp: Simplify__udp*_lib_mcast_deliver")
simplified the filter for incoming IPv6 multicast but removed
the check of the local socket address and the UDP destination
address.
This patch restores the filter to prevent sockets bound to a IPv6
multicast IP to receive other UDP traffic link unicast.
Signed-off-by: Henning Rogge <hrogge@gmail.com>
Fixes: 5cf3d46192fc ("udp: Simplify__udp*_lib_mcast_deliver")
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 21858cd02dabcf290564cbf4769b101eba54d7bb ]
commit 1d13a96c74fc ("ipv6: tcp: fix flowlabel value in ACK messages
send from TIME_WAIT") added the flow label in the last TCP packets.
Unfortunately, it was not casted properly.
This patch replace the buggy shift with be32_to_cpu/cpu_to_be32.
Fixes: 1d13a96c74fc ("ipv6: tcp: fix flowlabel value in ACK messages")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit f60e5990d9c1424af9dbca60a23ba2a1c7c1ce90 ]
We should not consult skb->sk for output decisions in xmit recursion
levels > 0 in the stack. Otherwise local socket settings could influence
the result of e.g. tunnel encapsulation process.
ipv6 does not conform with this in three places:
1) ip6_fragment: we do consult ipv6_npinfo for frag_size
2) sk_mc_loop in ipv6 uses skb->sk and checks if we should
loop the packet back to the local socket
3) ip6_skb_dst_mtu could query the settings from the user socket and
force a wrong MTU
Furthermore:
In sk_mc_loop we could potentially land in WARN_ON(1) if we use a
PF_PACKET socket ontop of an IPv6-backed vxlan device.
Reuse xmit_recursion as we are currently only interested in protecting
tunnel devices.
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 4ad19de8774e2a7b075b3e8ea48db85adcf33fa6 ]
tcp_v6_fill_cb() will be called twice if socket's state changes from
TCP_TIME_WAIT to TCP_LISTEN. That can result in control buffer data
corruption because in the second tcp_v6_fill_cb() call it's not copying
IP6CB(skb) anymore, but 'seq', 'end_seq', etc., so we can get weird and
unpredictable results. Performance loss of up to 1200% has been observed
in LTP/vxlan03 test.
This can be fixed by copying inet6_skb_parm to the beginning of 'cb'
only if xfrm6_policy_check() and tcp_v6_fill_cb() are going to be
called again.
Fixes: 2dc49d1680b53 ("tcp6: don't move IP6CB before xfrm6_policy_check()")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 6fd99094de2b83d1d4c8457f2c83483b2828e75a ]
A local route may have a lower hop_limit set than global routes do.
RFC 3756, Section 4.2.7, "Parameter Spoofing"
> 1. The attacker includes a Current Hop Limit of one or another small
> number which the attacker knows will cause legitimate packets to
> be dropped before they reach their destination.
> As an example, one possible approach to mitigate this threat is to
> ignore very small hop limits. The nodes could implement a
> configurable minimum hop limit, and ignore attempts to set it below
> said limit.
Signed-off-by: D.S. Ljungmark <ljungmark@modio.se>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit d0c294c53a771ae7e84506dfbd8c18c30f078735 ]
On s390x, gcc 4.8 compiles this part of tcp_v6_early_demux()
struct dst_entry *dst = sk->sk_rx_dst;
if (dst)
dst = dst_check(dst, inet6_sk(sk)->rx_dst_cookie);
to code reading sk->sk_rx_dst twice, once for the test and once for
the argument of ip6_dst_check() (dst_check() is inline). This allows
ip6_dst_check() to be called with null first argument, causing a crash.
Protect sk->sk_rx_dst access by READ_ONCE() both in IPv4 and IPv6
TCP early demux code.
Fixes: 41063e9dd119 ("ipv4: Early TCP socket demux.")
Fixes: c7109986db3c ("ipv6: Early TCP socket demux")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 73ba57bfae4a1914f6a6dac71e3168dd900e00af ]
for throw routes to trigger evaluation of other policy rules
EAGAIN needs to be propagated up to fib_rules_lookup
similar to how its done for IPv4
A simple testcase for verification is:
ip -6 rule add lookup 33333 priority 33333
ip -6 route add throw 2001:db8::1
ip -6 route add 2001:db8::1 via fe80::1 dev wlan0 table 33333
ip route get 2001:db8::1
Signed-off-by: Steven Barth <cyrus@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 9145736d4862145684009d6a72a6e61324a9439e ]
1. For an IPv4 ping socket, ping_check_bind_addr does not check
the family of the socket address that's passed in. Instead,
make it behave like inet_bind, which enforces either that the
address family is AF_INET, or that the family is AF_UNSPEC and
the address is 0.0.0.0.
2. For an IPv6 ping socket, ping_check_bind_addr returns EINVAL
if the socket family is not AF_INET6. Return EAFNOSUPPORT
instead, for consistency with inet6_bind.
3. Make ping_v4_sendmsg and ping_v6_sendmsg return EAFNOSUPPORT
instead of EINVAL if an incorrect socket address structure is
passed in.
4. Make IPv6 ping sockets be IPv6-only. The code does not support
IPv4, and it cannot easily be made to support IPv4 because
the protocol numbers for ICMP and ICMPv6 are different. This
makes connect(::ffff:192.0.2.1) fail with EAFNOSUPPORT instead
of making the socket unusable.
Among other things, this fixes an oops that can be triggered by:
int s = socket(AF_INET, SOCK_DGRAM, IPPROTO_ICMP);
struct sockaddr_in6 sin6 = {
.sin6_family = AF_INET6,
.sin6_addr = in6addr_any,
};
bind(s, (struct sockaddr *) &sin6, sizeof(sin6));
Change-Id: If06ca86d9f1e4593c0d6df174caca3487c57a241
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit acf8dd0a9d0b9e4cdb597c2f74802f79c699e802 ]
If an over-MTU UDP datagram is sent through a SOCK_RAW socket to a
UFO-capable device, ip_ufo_append_data() sets skb->ip_summed to
CHECKSUM_PARTIAL unconditionally as all GSO code assumes transport layer
checksum is to be computed on segmentation. However, in this case,
skb->csum_start and skb->csum_offset are never set as raw socket
transmit path bypasses udp_send_skb() where they are usually set. As a
result, driver may access invalid memory when trying to calculate the
checksum and store the result (as observed in virtio_net driver).
Moreover, the very idea of modifying the userspace provided UDP header
is IMHO against raw socket semantics (I wasn't able to find a document
clearly stating this or the opposite, though). And while allowing
CHECKSUM_NONE in the UFO case would be more efficient, it would be a bit
too intrusive change just to handle a corner case like this. Therefore
disallowing UFO for packets from SOCK_DGRAM seems to be the best option.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 3b4711757d7903ab6fa88a9e7ab8901b8227da60 ]
ipv6_cow_metrics() currently assumes only DST_HOST routes require
dynamic metrics allocation from inetpeer. The assumption breaks
when ndisc discovered router with RTAX_MTU and RTAX_HOPLIMIT metric.
Refer to ndisc_router_discovery() in ndisc.c and note that dst_metric_set()
is called after the route is created.
This patch creates the metrics array (by calling dst_cow_metrics_generic) in
ipv6_cow_metrics().
Test:
radvd.conf:
interface qemubr0
{
AdvLinkMTU 1300;
AdvCurHopLimit 30;
prefix fd00:face:face:face::/64
{
AdvOnLink on;
AdvAutonomous on;
AdvRouterAddr off;
};
};
Before:
[root@qemu1 ~]# ip -6 r show | egrep -v unreachable
fd00:face:face:face::/64 dev eth0 proto kernel metric 256 expires 27sec
fe80::/64 dev eth0 proto kernel metric 256
default via fe80::74df:d0ff:fe23:8ef2 dev eth0 proto ra metric 1024 expires 27sec
After:
[root@qemu1 ~]# ip -6 r show | egrep -v unreachable
fd00:face:face:face::/64 dev eth0 proto kernel metric 256 expires 27sec mtu 1300
fe80::/64 dev eth0 proto kernel metric 256 mtu 1300
default via fe80::74df:d0ff:fe23:8ef2 dev eth0 proto ra metric 1024 expires 27sec mtu 1300 hoplimit 30
Fixes: 8e2ec639173f325 (ipv6: don't use inetpeer to store metrics for routes.)
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 11b1f8288d4341af5d755281c871bff6c3e270dd ]
We still need a validate_link_af() handler with an appropriate nla policy,
similarly as we have in IPv4 case, otherwise size validations are not being
done properly in that case.
Fixes: f53adae4eae5 ("net: ipv6: add tokenized interface identifier support")
Fixes: bc91b0f07ada ("ipv6: addrconf: implement address generation modes")
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 6e9e16e6143b725662e47026a1d0f270721cdd24 ]
Lubomir Rintel reported that during replacing a route the interface
reference counter isn't correctly decremented.
To quote bug <https://bugzilla.kernel.org/show_bug.cgi?id=91941>:
| [root@rhel7-5 lkundrak]# sh -x lal
| + ip link add dev0 type dummy
| + ip link set dev0 up
| + ip link add dev1 type dummy
| + ip link set dev1 up
| + ip addr add 2001:db8:8086::2/64 dev dev0
| + ip route add 2001:db8:8086::/48 dev dev0 proto static metric 20
| + ip route add 2001:db8:8088::/48 dev dev1 proto static metric 10
| + ip route replace 2001:db8:8086::/48 dev dev1 proto static metric 20
| + ip link del dev0 type dummy
| Message from syslogd@rhel7-5 at Jan 23 10:54:41 ...
| kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2
|
| Message from syslogd@rhel7-5 at Jan 23 10:54:51 ...
| kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2
During replacement of a rt6_info we must walk all parent nodes and check
if the to be replaced rt6_info got propagated. If so, replace it with
an alive one.
Fixes: 4a287eba2de3957 ("IPv6 routing, NLM_F_* flag support: REPLACE and EXCL flags support, warn about missing CREATE flag")
Reported-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Tested-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 9d289715eb5c252ae15bd547cb252ca547a3c4f2 ]
Reduce the attack vector and stop generating IPv6 Fragment Header for
paths with an MTU smaller than the minimum required IPv6 MTU
size (1280 byte) - called atomic fragments.
See IETF I-D "Deprecating the Generation of IPv6 Atomic Fragments" [1]
for more information and how this "feature" can be misused.
[1] https://tools.ietf.org/html/draft-ietf-6man-deprecate-atomfrag-generation-00
Signed-off-by: Fernando Gont <fgont@si6networks.com>
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit f812116b174e59a350acc8e4856213a166a91222 ]
The sockaddr is returned in IP(V6)_RECVERR as part of errhdr. That
structure is defined and allocated on the stack as
struct {
struct sock_extended_err ee;
struct sockaddr_in(6) offender;
} errhdr;
The second part is only initialized for certain SO_EE_ORIGIN values.
Always initialize it completely.
An MTU exceeded error on a SOCK_RAW/IPPROTO_RAW is one example that
would return uninitialized bytes.
Signed-off-by: Willem de Bruijn <willemb@google.com>
----
Also verified that there is no padding between errhdr.ee and
errhdr.offender that could leak additional kernel data.
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 2dc49d1680b534877fd20cce52557ea542bb06b6 ]
When xfrm6_policy_check() is used, _decode_session6() is called after some
intermediate functions. This function uses IP6CB(), thus TCP_SKB_CB() must be
prepared after the call of xfrm6_policy_check().
Before this patch, scenarii with IPv6 + TCP + IPsec Transport are broken.
Fixes: 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
Reported-by: Huaibin Wang <huaibin.wang@6wind.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 0f85feae6b710ced3abad5b2b47d31dfcb956b62 ]
When I cooked commit c3658e8d0f1 ("tcp: fix possible NULL dereference in
tcp_vX_send_reset()") I missed other spots we could deref a NULL
skb_dst(skb)
Again, if a socket is provided, we do not need skb_dst() to get a
pointer to network namespace : sock_net(sk) is good enough.
Reported-by: Dann Frazier <dann.frazier@canonical.com>
Bisected-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: ca777eff51f7 ("tcp: remove dst refcount false sharing for prequeue mode")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After commit ca777eff51f7 ("tcp: remove dst refcount false sharing for
prequeue mode") we have to relax check against skb dst in
tcp_v[46]_send_reset() if prequeue dropped the dst.
If a socket is provided, a full lookup was done to find this socket,
so the dst test can be skipped.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=88191
Reported-by: Jaša Bartelj <jasa.bartelj@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Daniel Borkmann <dborkman@redhat.com>
Fixes: ca777eff51f7 ("tcp: remove dst refcount false sharing for prequeue mode")
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The UDP checksum calculation for VXLAN tunnels is currently using the
socket addresses instead of the actual packet source and destination
addresses. As a result the checksum calculated is incorrect in some
cases.
Also uh->check was being set twice, first it was set to 0, and then it is
set again in udp6_set_csum. This change removes the redundant assignment
to 0.
Fixes: acbf74a7 ("vxlan: Refactor vxlan driver to make use of the common UDP tunnel functions.")
Cc: Andy Zhou <azhou@nicira.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
| |
When using GRE redirection in WCCP, it sets the wrong skb->protocol,
that is, ETH_P_IP instead of ETH_P_IPV6 for the encapuslated traffic.
Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
Cc: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: Yuri Chislov <yuri.chislov@gmail.com>
Tested-by: Yuri Chislov <yuri.chislov@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now the vti_link_ops do not point the .dellink, for fb tunnel device
(ip_vti0), the net_device will be removed as the default .dellink is
unregister_netdevice_queue,but the tunnel still in the tunnel list,
then if we add a new vti tunnel, in ip_tunnel_find():
hlist_for_each_entry_rcu(t, head, hash_node) {
if (local == t->parms.iph.saddr &&
remote == t->parms.iph.daddr &&
link == t->parms.link &&
==> type == t->dev->type &&
ip_tunnel_key_match(&t->parms, flags, key))
break;
}
the panic will happen, cause dev of ip_tunnel *t is null:
[ 3835.072977] IP: [<ffffffffa04103fd>] ip_tunnel_find+0x9d/0xc0 [ip_tunnel]
[ 3835.073008] PGD b2c21067 PUD b7277067 PMD 0
[ 3835.073008] Oops: 0000 [#1] SMP
.....
[ 3835.073008] Stack:
[ 3835.073008] ffff8800b72d77f0 ffffffffa0411924 ffff8800bb956000 ffff8800b72d78e0
[ 3835.073008] ffff8800b72d78a0 0000000000000000 ffffffffa040d100 ffff8800b72d7858
[ 3835.073008] ffffffffa040b2e3 0000000000000000 0000000000000000 0000000000000000
[ 3835.073008] Call Trace:
[ 3835.073008] [<ffffffffa0411924>] ip_tunnel_newlink+0x64/0x160 [ip_tunnel]
[ 3835.073008] [<ffffffffa040b2e3>] vti_newlink+0x43/0x70 [ip_vti]
[ 3835.073008] [<ffffffff8150d4da>] rtnl_newlink+0x4fa/0x5f0
[ 3835.073008] [<ffffffff812f68bb>] ? nla_strlcpy+0x5b/0x70
[ 3835.073008] [<ffffffff81508fb0>] ? rtnl_link_ops_get+0x40/0x60
[ 3835.073008] [<ffffffff8150d11f>] ? rtnl_newlink+0x13f/0x5f0
[ 3835.073008] [<ffffffff81509cf4>] rtnetlink_rcv_msg+0xa4/0x270
[ 3835.073008] [<ffffffff8126adf5>] ? sock_has_perm+0x75/0x90
[ 3835.073008] [<ffffffff81509c50>] ? rtnetlink_rcv+0x30/0x30
[ 3835.073008] [<ffffffff81529e39>] netlink_rcv_skb+0xa9/0xc0
[ 3835.073008] [<ffffffff81509c48>] rtnetlink_rcv+0x28/0x30
....
modprobe ip_vti
ip link del ip_vti0 type vti
ip link add ip_vti0 type vti
rmmod ip_vti
do that one or more times, kernel will panic.
fix it by assigning ip_tunnel_dellink to vti_link_ops' dellink, in
which we skip the unregister of fb tunnel device. do the same on ip6_vti.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds SKB_GSO_TCPV4 to the list of supported GSO types handled by
the IPv6 GSO offloads. Without this change VXLAN tunnels running over IPv6
do not currently handle IPv4 TCP TSO requests correctly and end up handing
the non-segmented frame off to the device.
Below is the before and after for a simple netperf TCP_STREAM test between
two endpoints tunneling IPv4 over a VXLAN tunnel running on IPv6 on top of
a 1Gb/s network adapter.
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 10.29 0.88 Before
87380 16384 16384 10.03 895.69 After
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
| |
pim6_protocol was added when initiation, but it not deleted.
Similarly, unregister RTNL_FAMILY_IP6MR rtnetlink.
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Reviewed-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It has been reported that generating an MLD listener report on
devices with large MTUs (e.g. 9000) and a high number of IPv6
addresses can trigger a skb_over_panic():
skbuff: skb_over_panic: text:ffffffff80612a5d len:3776 put:20
head:ffff88046d751000 data:ffff88046d751010 tail:0xed0 end:0xec0
dev:port1
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:100!
invalid opcode: 0000 [#1] SMP
Modules linked in: ixgbe(O)
CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 3.14.23+ #4
[...]
Call Trace:
<IRQ>
[<ffffffff80578226>] ? skb_put+0x3a/0x3b
[<ffffffff80612a5d>] ? add_grhead+0x45/0x8e
[<ffffffff80612e3a>] ? add_grec+0x394/0x3d4
[<ffffffff80613222>] ? mld_ifc_timer_expire+0x195/0x20d
[<ffffffff8061308d>] ? mld_dad_timer_expire+0x45/0x45
[<ffffffff80255b5d>] ? call_timer_fn.isra.29+0x12/0x68
[<ffffffff80255d16>] ? run_timer_softirq+0x163/0x182
[<ffffffff80250e6f>] ? __do_softirq+0xe0/0x21d
[<ffffffff8025112b>] ? irq_exit+0x4e/0xd3
[<ffffffff802214bb>] ? smp_apic_timer_interrupt+0x3b/0x46
[<ffffffff8063f10a>] ? apic_timer_interrupt+0x6a/0x70
mld_newpack() skb allocations are usually requested with dev->mtu
in size, since commit 72e09ad107e7 ("ipv6: avoid high order allocations")
we have changed the limit in order to be less likely to fail.
However, in MLD/IGMP code, we have some rather ugly AVAILABLE(skb)
macros, which determine if we may end up doing an skb_put() for
adding another record. To avoid possible fragmentation, we check
the skb's tailroom as skb->dev->mtu - skb->len, which is a wrong
assumption as the actual max allocation size can be much smaller.
The IGMP case doesn't have this issue as commit 57e1ab6eaddc
("igmp: refine skb allocations") stores the allocation size in
the cb[].
Set a reserved_tailroom to make it fit into the MTU and use
skb_availroom() helper instead. This also allows to get rid of
igmp_skb_size().
Reported-by: Wei Liu <lw1a2.jing@gmail.com>
Fixes: 72e09ad107e7 ("ipv6: avoid high order allocations")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: David L Stevens <david.stevens@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains Netfilter updates for your net tree,
they are:
1) Fix missing initialization of the range structure (allocated in the
stack) in nft_masq_{ipv4, ipv6}_eval, from Daniel Borkmann.
2) Make sure the data we receive from userspace contains the req_version
structure, otherwise return an error incomplete on truncated input.
From Dan Carpenter.
3) Fix handling og skb->sk which may cause incorrect handling
of connections from a local process. Via Simon Horman, patch from
Calvin Owens.
4) Fix wrong netns in nft_compat when setting target and match params
structure.
5) Relax chain type validation in nft_compat that was recently included,
this broke the matches that need to be run from the route chain type.
Now iptables-test.py automated regression tests report success again
and we avoid the only possible problematic case, which is the use of
nat targets out of nat chain type.
6) Use match->table to validate the tablename, instead of the match->name.
Again patch for nft_compat.
7) Restore the synchronous release of objects from the commit and abort
path in nf_tables. This is causing two major problems: splats when using
nft_compat, given that matches and targets may sleep and call_rcu is
invoked from softirq context. Moreover Patrick reported possible event
notification reordering when rules refer to anonymous sets.
8) Fix race condition in between packets that are being confirmed by
conntrack and the ctnetlink flush operation. This happens since the
removal of the central spinlock. Thanks to Jesper D. Brouer to looking
into this.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When transferring from the original range in nf_nat_masquerade_{ipv4,ipv6}()
we copy over values from stack in from min_proto/max_proto due to uninitialized
range variable in both, nft_masq_{ipv4,ipv6}_eval. As we only initialize
flags at this time from nft_masq struct, just zero out the rest.
Fixes: 9ba1f726bec09 ("netfilter: nf_tables: add new nft_masq expression")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|