| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This is a forward-port of the original patch from Andrzej Hajda,
he said:
"IS_ERR_VALUE should be used only with unsigned long type.
Otherwise it can work incorrectly. To achieve this function
xt_percpu_counter_alloc is modified to return unsigned long,
and its result is assigned to temporary variable to perform
error checking, before assigning to .pcnt field.
The patch follows conclusion from discussion on LKML [1][2].
[1]: http://permalink.gmane.org/gmane.linux.kernel/2120927
[2]: http://permalink.gmane.org/gmane.linux.kernel/2150581"
Original patch from Andrzej is here:
http://patchwork.ozlabs.org/patch/582970/
This patch has clashed with input validation fixes for x_tables.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
returns NULL
ip6_route_output() will never return a NULL pointer, so there's no need
to check it.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Tables have to exist for VRFs to function. Ensure they exist
when VRF device is created.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The ipv6 gre implementation was cleaned up to share more code
with the ipv4 version, but it can be enabled even when NET_IPGRE_DEMUX
is disabled, resulting in a link error:
net/built-in.o: In function `gre_rcv':
:(.text+0x17f5d0): undefined reference to `gre_parse_header'
ERROR: "gre_parse_header" [net/ipv6/ip6_gre.ko] undefined!
This adds a Kconfig dependency to prevent that now invalid
configuration.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 308edfdf1563 ("gre6: Cleanup GREv6 receive path, call common GRE functions")
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
It's easier for gre_parse_header to return the header length instead of
filing it into a parameter. That way, the callers that don't care about the
header length can just check whether the returned value is lower than zero.
In gre_err, the tunnel header must not be pulled. See commit b7f8fe251e46
("gre: do not pull header in ICMP error processing") for details.
This patch reduces the conflict between the mentioned commit and commit
95f5c64c3c13 ("gre: Move utility functions to common headers").
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\ \ \
| | |/
| |/|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Conflicts:
net/ipv4/ip_gre.c
Minor conflicts between tunnel bug fixes in net and
ipv6 tunnel cleanups in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The handler 'ila_fill_encap_info' adds one attribute: ILA_ATTR_LOCATOR.
Fixes: 65d7ab8de582 ("net: Identifier Locator Addressing module")
CC: Tom Herbert <tom@herbertland.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
In the sendmsg function of UDP, raw, ICMP and l2tp sockets, we use local
variables like hlimits, tclass, opt and dontfrag and pass them to corresponding
functions like ip6_make_skb, ip6_append_data and xxx_push_pending_frames.
This is not a good practice and makes it hard to add new parameters.
This fix introduces a new struct ipcm6_cookie similar to ipcm_cookie in
ipv4 and include the above mentioned variables. And we only pass the
pointer to this structure to corresponding functions. This makes it easier
to add new parameters in the future and makes the function cleaner.
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Changes in GREv6 transmit path:
- Call gre_checksum, remove gre6_checksum
- Rename ip6gre_xmit2 to __gre6_xmit
- Call gre_build_header utility function
- Call ip6_tnl_xmit common function
- Call ip6_tnl_change_mtu, eliminate ip6gre_tunnel_change_mtu
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
A few generic changes to generalize tunnels in IPv6:
- Export ip6_tnl_change_mtu so that it can be called by ip6_gre
- Add tun_hlen to ip6_tnl structure.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch renames ip6_tnl_xmit2 to ip6_tnl_xmit and exports it. Other
users like GRE will be able to call this. The original ip6_tnl_xmit
function is renamed to ip6_tnl_start_xmit (this is an ndo_start_xmit
function).
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
- Create gre_rcv function. This calls gre_parse_header and ip6gre_rcv.
- Call ip6_tnl_rcv. Doing this and using gre_parse_header eliminates
most of the code in ip6gre_rcv.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Some basic changes to make IPv6 tunnel receive path look more like
IPv4 path:
- Make ip6_tnl_rcv non-static so that GREv6 and others can call it
- Make ip6_tnl_rcv look like ip_tunnel_rcv
- Switch to gro_cells_receive
- Make ip6_tnl_rcv non-static and export it
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
UDP uses the generic socket backlog code, and this will soon
be changed to not disable BH when protocol is called back.
We need to use appropriate SNMP accessors.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
We want to to make TCP stack preemptible, as draining prequeue
and backlog queues can take lot of time.
Many SNMP updates were assuming that BH (and preemption) was disabled.
Need to convert some __NET_INC_STATS() calls to NET_INC_STATS()
and some __TCP_INC_STATS() to TCP_INC_STATS()
Before using this_cpu_ptr(net->ipv4.tcp_sk) in tcp_v4_send_reset()
and tcp_v4_send_ack(), we add an explicit preempt disabled section.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
IPv6 ICMP stats are atomics anyway.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Rename IP6_UPD_PO_STATS_BH() to __IP6_UPD_PO_STATS()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Rename IP6_INC_STATS_BH() to __IP6_INC_STATS()
and IP6_ADD_STATS_BH() to __IP6_ADD_STATS()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Rename NET_INC_STATS_BH() to __NET_INC_STATS()
and NET_ADD_STATS_BH() to __NET_ADD_STATS()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Rename ICMP6_INC_STATS_BH() to __ICMP6_INC_STATS()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Rename TCP_INC_STATS_BH() to __TCP_INC_STATS()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Rename UDP_INC_STATS_BH() to __UDP_INC_STATS(),
and UDP6_INC_STATS_BH() to __UDP6_INC_STATS()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
In the old days (before linux-3.0), SNMP counters were duplicated,
one for user context, and one for BH context.
After commit 8f0ea0fe3a03 ("snmp: reduce percpu needs by 50%")
we have a single copy, and what really matters is preemption being
enabled or disabled, since we use this_cpu_inc() or __this_cpu_inc()
respectively.
We therefore kill SNMP_INC_STATS_USER(), SNMP_ADD_STATS_USER(),
NET_INC_STATS_USER(), NET_ADD_STATS_USER(), SCTP_INC_STATS_USER(),
SNMP_INC_STATS64_USER(), SNMP_ADD_STATS64_USER(), TCP_ADD_STATS_USER(),
UDP_INC_STATS_USER(), UDP6_INC_STATS_USER(), and XFRM_INC_STATS_USER()
Following patches will rename __BH helpers to make clear their
usage is not tied to BH being disabled.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\| |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Minor overlapping changes in the conflicts.
In the macsec case, the change of the default ID macro
name overlapped with the 64-bit netlink attribute alignment
fixes in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
It was a simple idea -- save IPv6 configured addresses on a link down
so that IPv6 behaves similar to IPv4. As always the devil is in the
details and the IPv6 stack as too many behavioral differences from IPv4
making the simple idea more complicated than it needs to be.
The current implementation for keeping IPv6 addresses can panic or spit
out a warning in one of many paths:
1. IPv6 route gets an IPv4 route as its 'next' which causes a panic in
rt6_fill_node while handling a route dump request.
2. rt->dst.obsolete is set to DST_OBSOLETE_DEAD hitting the WARN_ON in
fib6_del
3. Panic in fib6_purge_rt because rt6i_ref count is not 1.
The root cause of all these is references related to the host route for
an address that is retained.
So, this patch deletes the host route every time the ifdown loop runs.
Since the host route is deleted and will be re-generated an up there is
no longer a need for the l3mdev fix up. On the 'admin up' side move
addrconf_permanent_addr into the NETDEV_UP event handling so that it
runs only once versus on UP and CHANGE events.
All of the current panics and warnings appear to be related to
addresses on the loopback device, but given the catastrophic nature when
a bug is triggered this patch takes the conservative approach and evicts
all host routes rather than trying to determine when it can be re-used
and when it can not. That can be a later optimizaton if desired.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This reverts commit 841645b5f2dfceac69b78fcd0c9050868d41ea61.
Ok, this puts the feature back. I've decided to apply David A.'s
bug fix and run with that rather than make everyone wait another
whole release for this feature.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This reverts the following three commits:
70af921db6f8835f4b11c65731116560adb00c14
799977d9aafbf0ca0b9c39b04cbfb16db71302c9
f1705ec197e705b79ea40fe7a2cc5acfa1d3bfac
The feature was ill conceived, has terrible semantics, and has added
nothing but regressions to the already fragile ipv6 stack.
Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Similar to 3bfd847203c6 ("net: Use passed in table for nexthop lookups")
for IPv4, if the route spec contains a table id use that to lookup the
next hop first and fall back to a full lookup if it fails (per the fix
4c9bcd117918b ("net: Fix nexthop lookups")).
Example:
root@kenny:~# ip -6 ro ls table red
local 2100:1::1 dev lo proto none metric 0 pref medium
2100:1::/120 dev eth1 proto kernel metric 256 pref medium
local 2100:2::1 dev lo proto none metric 0 pref medium
2100:2::/120 dev eth2 proto kernel metric 256 pref medium
local fe80::e0:f9ff:fe09:3cac dev lo proto none metric 0 pref medium
local fe80::e0:f9ff:fe1c:b974 dev lo proto none metric 0 pref medium
fe80::/64 dev eth1 proto kernel metric 256 pref medium
fe80::/64 dev eth2 proto kernel metric 256 pref medium
ff00::/8 dev red metric 256 pref medium
ff00::/8 dev eth1 metric 256 pref medium
ff00::/8 dev eth2 metric 256 pref medium
unreachable default dev lo metric 240 error -113 pref medium
root@kenny:~# ip -6 ro add table red 2100:3::/64 via 2100:1::64
RTNETLINK answers: No route to host
Route add fails even though 2100:1::64 is a reachable next hop:
root@kenny:~# ping6 -I red 2100:1::64
ping6: Warning: source address might be selected on device other than red.
PING 2100:1::64(2100:1::64) from 2100:1::1 red: 56 data bytes
64 bytes from 2100:1::64: icmp_seq=1 ttl=64 time=1.33 ms
With this patch:
root@kenny:~# ip -6 ro add table red 2100:3::/64 via 2100:1::64
root@kenny:~# ip -6 ro ls table red
local 2100:1::1 dev lo proto none metric 0 pref medium
2100:1::/120 dev eth1 proto kernel metric 256 pref medium
local 2100:2::1 dev lo proto none metric 0 pref medium
2100:2::/120 dev eth2 proto kernel metric 256 pref medium
2100:3::/64 via 2100:1::64 dev eth1 metric 1024 pref medium
local fe80::e0:f9ff:fe09:3cac dev lo proto none metric 0 pref medium
local fe80::e0:f9ff:fe1c:b974 dev lo proto none metric 0 pref medium
fe80::/64 dev eth1 proto kernel metric 256 pref medium
fe80::/64 dev eth2 proto kernel metric 256 pref medium
ff00::/8 dev red metric 256 pref medium
ff00::/8 dev eth1 metric 256 pref medium
ff00::/8 dev eth2 metric 256 pref medium
unreachable default dev lo metric 240 error -113 pref medium
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Support checksum neutral ILA as described in the ILA draft. The low
order 16 bits of the identifier are used to contain the checksum
adjustment value.
The csum-mode parameter is added to described checksum processing. There
are three values:
- adjust transport checksum (previous behavior)
- do checksum neutral mapping
- do nothing
On output the csum-mode in the ila_params is checked and acted on. If
mode is checksum neutral mapping then to mapping and set C-bit.
On input, C-bit is checked. If it is set checksum-netural mapping is
done (regardless of csum-mode in ila params) and C-bit will be cleared.
If it is not set then action in csum-mode is taken.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Change model of xlat to be used only for input where lookup is done on
the locator part of an address (comparing to locator_match as key
in rhashtable). This is needed for checksum neutral translation
which obfuscates the low order 16 bits of the identifier. It also
permits hosts to be in muliple ILA domains (each locator can map
to a different SIR address). A check is also added to disallow
translating non-ILA addresses (check of type in identifier).
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add structures for identifiers, locators, and an ila address which
is composed of a locator and identifier and in6_addr can be cast to
it. This includes a three bit type field and enums for the types defined
in ILA I-D.
In ILA lwt don't allow user to set a translation for a non-ILA
address (type of identifier is zero meaning it is an IID). This also
requires that the destination prefix is at least 65 bytes (64
bit locator and first byte of identifier).
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
We should call consume_skb(skb) when skb is properly consumed,
or kfree_skb(skb) when skb must be dropped in error case.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |/
|/|
| |
| |
| | |
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for your net-next
tree, mostly from Florian Westphal to sort out the lack of sufficient
validation in x_tables and connlabel preparation patches to add
nf_tables support. They are:
1) Ensure we don't go over the ruleset blob boundaries in
mark_source_chains().
2) Validate that target jumps land on an existing xt_entry. This extra
sanitization comes with a performance penalty when loading the ruleset.
3) Introduce xt_check_entry_offsets() and use it from {arp,ip,ip6}tables.
4) Get rid of the smallish check_entry() functions in {arp,ip,ip6}tables.
5) Make sure the minimal possible target size in x_tables.
6) Similar to #3, add xt_compat_check_entry_offsets() for compat code.
7) Check that standard target size is valid.
8) More sanitization to ensure that the target_offset field is correct.
9) Add xt_check_entry_match() to validate that matches are well-formed.
10-12) Three patch to reduce the number of parameters in
translate_compat_table() for {arp,ip,ip6}tables by using a container
structure.
13) No need to return value from xt_compat_match_from_user(), so make
it void.
14) Consolidate translate_table() so it can be used by compat code too.
15) Remove obsolete check for compat code, so we keep consistent with
what was already removed in the native layout code (back in 2007).
16) Get rid of target jump validation from mark_source_chains(),
obsoleted by #2.
17) Introduce xt_copy_counters_from_user() to consolidate counter
copying, and use it from {arp,ip,ip6}tables.
18,22) Get rid of unnecessary explicit inlining in ctnetlink for dump
functions.
19) Move nf_connlabel_match() to xt_connlabel.
20) Skip event notification if connlabel did not change.
21) Update of nf_connlabels_get() to make the upcoming nft connlabel
support easier.
23) Remove spinlock to read protocol state field in conntrack.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The three variants use same copy&pasted code, condense this into a
helper and use that.
Make sure info.name is 0-terminated.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Since 'netfilter: x_tables: validate targets of jumps' change we
validate that the target aligns exactly with beginning of a rule,
so offset test is now redundant.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
commit 9e67d5a739327c44885adebb4f3a538050be73e4
("[NETFILTER]: x_tables: remove obsolete overflow check") left the
compat parts alone, but we can kill it there as well.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This looks like refactoring, but its also a bug fix.
Problem is that the compat path (32bit iptables, 64bit kernel) lacks a few
sanity tests that are done in the normal path.
For example, we do not check for underflows and the base chain policies.
While its possible to also add such checks to the compat path, its more
copy&pastry, for instance we cannot reuse check_underflow() helper as
e->target_offset differs in the compat case.
Other problem is that it makes auditing for validation errors harder; two
places need to be checked and kept in sync.
At a high level 32 bit compat works like this:
1- initial pass over blob:
validate match/entry offsets, bounds checking
lookup all matches and targets
do bookkeeping wrt. size delta of 32/64bit structures
assign match/target.u.kernel pointer (points at kernel
implementation, needed to access ->compatsize etc.)
2- allocate memory according to the total bookkeeping size to
contain the translated ruleset
3- second pass over original blob:
for each entry, copy the 32bit representation to the newly allocated
memory. This also does any special match translations (e.g.
adjust 32bit to 64bit longs, etc).
4- check if ruleset is free of loops (chase all jumps)
5-first pass over translated blob:
call the checkentry function of all matches and targets.
The alternative implemented by this patch is to drop steps 3&4 from the
compat process, the translation is changed into an intermediate step
rather than a full 1:1 translate_table replacement.
In the 2nd pass (step #3), change the 64bit ruleset back to a kernel
representation, i.e. put() the kernel pointer and restore ->u.user.name .
This gets us a 64bit ruleset that is in the format generated by a 64bit
iptables userspace -- we can then use translate_table() to get the
'native' sanity checks.
This has two drawbacks:
1. we re-validate all the match and target entry structure sizes even
though compat translation is supposed to never generate bogus offsets.
2. we put and then re-lookup each match and target.
THe upside is that we get all sanity tests and ruleset validations
provided by the normal path and can remove some duplicated compat code.
iptables-restore time of autogenerated ruleset with 300k chains of form
-A CHAIN0001 -m limit --limit 1/s -j CHAIN0002
-A CHAIN0002 -m limit --limit 1/s -j CHAIN0003
shows no noticeable differences in restore times:
old: 0m30.796s
new: 0m31.521s
64bit: 0m25.674s
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Always returned 0.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | |
| | |
| | |
| | |
| | | |
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
We're currently asserting that targetoff + targetsize <= nextoff.
Extend it to also check that targetoff is >= sizeof(xt_entry).
Since this is generic code, add an argument pointing to the start of the
match/target, we can then derive the base structure size from the delta.
We also need the e->elems pointer in a followup change to validate matches.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
32bit rulesets have different layout and alignment requirements, so once
more integrity checks get added to xt_check_entry_offsets it will reject
well-formed 32bit rulesets.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Once we add more sanity testing to xt_check_entry_offsets it
becomes relvant if we're expecting a 32bit 'config_compat' blob
or a normal one.
Since we already have a lot of similar-named functions (check_entry,
compat_check_entry, find_and_check_entry, etc.) and the current
incarnation is short just fold its contents into the callers.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Currently arp/ip and ip6tables each implement a short helper to check that
the target offset is large enough to hold one xt_entry_target struct and
that t->u.target_size fits within the current rule.
Unfortunately these checks are not sufficient.
To avoid adding new tests to all of ip/ip6/arptables move the current
checks into a helper, then extend this helper in followup patches.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When we see a jump also check that the offset gets us to beginning of
a rule (an ipt_entry).
The extra overhead is negible, even with absurd cases.
300k custom rules, 300k jumps to 'next' user chain:
[ plus one jump from INPUT to first userchain ]:
Before:
real 0m24.874s
user 0m7.532s
sys 0m16.076s
After:
real 0m27.464s
user 0m7.436s
sys 0m18.840s
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Ben Hawkes says:
In the mark_source_chains function (net/ipv4/netfilter/ip_tables.c) it
is possible for a user-supplied ipt_entry structure to have a large
next_offset field. This field is not bounds checked prior to writing a
counter value at the supplied offset.
Base chains enforce absolute verdict.
User defined chains are supposed to end with an unconditional return,
xtables userspace adds them automatically.
But if such return is missing we will move to non-existent next rule.
Reported-by: Ben Hawkes <hawkes@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|\ \ \
| | |/
| |/|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Conflicts were two cases of simple overlapping changes,
nothing serious.
In the UDP case, we need to add a hlist_add_tail_rcu()
to linux/rculist.h, because we've moved UDP socket handling
away from using nulls lists.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch adds a release_cb for UDPv6. It does a route lookup
and updates sk->sk_dst_cache if it is needed. It picks up the
left-over job from ip6_sk_update_pmtu() if the sk was owned
by user during the pmtu update.
It takes a rcu_read_lock to protect the __sk_dst_get() operations
because another thread may do ip6_dst_store() without taking the
sk lock (e.g. sendmsg).
Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reported-by: Wei Wang <weiwan@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
There is a case in connected UDP socket such that
getsockopt(IPV6_MTU) will return a stale MTU value. The reproducible
sequence could be the following:
1. Create a connected UDP socket
2. Send some datagrams out
3. Receive a ICMPV6_PKT_TOOBIG
4. No new outgoing datagrams to trigger the sk_dst_check()
logic to update the sk->sk_dst_cache.
5. getsockopt(IPV6_MTU) returns the mtu from the invalid
sk->sk_dst_cache instead of the newly created RTF_CACHE clone.
This patch updates the sk->sk_dst_cache for a connected datagram sk
during pmtu-update code path.
Note that the sk->sk_v6_daddr is used to do the route lookup
instead of skb->data (i.e. iph). It is because a UDP socket can become
connected after sending out some datagrams in un-connected state. or
It can be connected multiple times to different destinations. Hence,
iph may not be related to where sk is currently connected to.
It is done under '!sock_owned_by_user(sk)' condition because
the user may make another ip6_datagram_connect() (i.e changing
the sk->sk_v6_daddr) while dst lookup is happening in the pmtu-update
code path.
For the sock_owned_by_user(sk) == true case, the next patch will
introduce a release_cb() which will update the sk->sk_dst_cache.
Test:
Server (Connected UDP Socket):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Route Details:
[root@arch-fb-vm1 ~]# ip -6 r show | egrep '2fac'
2fac::/64 dev eth0 proto kernel metric 256 pref medium
2fac:face::/64 via 2fac::face dev eth0 metric 1024 pref medium
A simple python code to create a connected UDP socket:
import socket
import errno
HOST = '2fac::1'
PORT = 8080
s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
s.bind((HOST, PORT))
s.connect(('2fac:face::face', 53))
print("connected")
while True:
try:
data = s.recv(1024)
except socket.error as se:
if se.errno == errno.EMSGSIZE:
pmtu = s.getsockopt(41, 24)
print("PMTU:%d" % pmtu)
break
s.close()
Python program output after getting a ICMPV6_PKT_TOOBIG:
[root@arch-fb-vm1 ~]# python2 ~/devshare/kernel/tasks/fib6/udp-connect-53-8080.py
connected
PMTU:1300
Cache routes after recieving TOOBIG:
[root@arch-fb-vm1 ~]# ip -6 r show table cache
2fac:face::face via 2fac::face dev eth0 metric 0
cache expires 463sec mtu 1300 pref medium
Client (Send the ICMPV6_PKT_TOOBIG):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
scapy is used to generate the TOOBIG message. Here is the scapy script I have
used:
>>> p=Ether(src='da:75:4d:36:ac:32', dst='52:54:00:12:34:66', type=0x86dd)/IPv6(src='2fac::face', dst='2fac::1')/ICMPv6PacketTooBig(mtu=1300)/IPv6(src='2fac::
1',dst='2fac:face::face', nh='UDP')/UDP(sport=8080,dport=53)
>>> sendp(p, iface='qemubr0')
Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reported-by: Wei Wang <weiwan@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch moves the route lookup and update codes for connected
datagram sk to a newly created function ip6_datagram_dst_update()
It will be reused during the pmtu update in the later patch.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|