| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 77c1c03c5b8ef28e55bb0aff29b1e006037ca645 ]
We must call security_release_secctx to free the memory returned by
security_secid_to_secctx, otherwise memory may be leaked forever.
Fixes: ef493bd930ae ("netfilter: nfnetlink_queue: add security context information")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 83d90219a5df8d950855ce73229a97b63605c317 ]
The nf_ct_helper_hash table is protected by nf_ct_helper_mutex, while
nfct_helper operation is protected by nfnl_lock(NFNL_SUBSYS_CTHELPER).
So it's possible that one CPU is walking the nf_ct_helper_hash for
cthelper add/get/del, another cpu is doing nf_conntrack_helpers_unregister
at the same time. This is dangrous, and may cause use after free error.
Note, delete operation will flush all cthelpers added via nfnetlink, so
using rcu to do protect is not easy.
Now introduce a dummy list to record all the cthelpers added via
nfnetlink, then we can walk the dummy list instead of walking the
nf_ct_helper_hash. Also, keep nfnl_cthelper_dump_table unchanged, it
may be invoked without nfnl_lock(NFNL_SUBSYS_CTHELPER) held.
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit f83bf8da1135ca635aac8f062cad3f001fcf3a26 ]
We have memory leaks of nf_conntrack_helper & expect_policy.
Signed-off-by: Jeffy Chen <jeffy.chen@rock-chips.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 2c422257550f123049552b39f7af6e3428a60f43 ]
We only allow runtime updates of expectation policies for timeout and
maximum number of expectations, otherwise reject the update.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit c5504f724c86ee925e7ffb80aa342cfd57959b13 ]
Information about ipvs in different network namespace can be seen via procfs.
How to reproduce:
# ip netns add ns01
# ip netns add ns02
# ip netns exec ns01 ip a add dev lo 127.0.0.1/8
# ip netns exec ns02 ip a add dev lo 127.0.0.1/8
# ip netns exec ns01 ipvsadm -A -t 10.1.1.1:80
# ip netns exec ns02 ipvsadm -A -t 10.1.1.2:80
The ipvsadm displays information about its own network namespace only.
# ip netns exec ns01 ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 10.1.1.1:80 wlc
# ip netns exec ns02 ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 10.1.1.2:80 wlc
But I can see information about other network namespace via procfs.
# ip netns exec ns01 cat /proc/net/ip_vs
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 0A010101:0050 wlc
TCP 0A010102:0050 wlc
# ip netns exec ns02 cat /proc/net/ip_vs
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 0A010102:0050 wlc
Signed-off-by: KUWAZAWA Takuya <albatross0@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[This fix is only needed for v4.9 stable since v4.10+ does not have the issue]
A verdict of NF_STOLEN after NF_QUEUE will cause an incorrect return value
and a potential kernel panic via double free of skb's
This was broken by commit 7034b566a4e7 ("netfilter: fix nf_queue handling")
and subsequently fixed in v4.10 by commit c63cbc460419 ("netfilter:
use switch() to handle verdict cases from nf_hook_slow()"). However that
commit cannot be cleanly cherry-picked to v4.9
Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 3e38df136e453aa69eb4472108ebce2fb00b1ba6 ]
BUG: KASAN: slab-out-of-bounds in nf_tables_rule_destroy+0xf1/0x130 at addr ffff88006a4c35c8
Read of size 8 by task nft/1607
When we've destroyed last valid expr, nft_expr_next() returns an invalid expr.
We must not dereference it unless it passes != nft_expr_last() check.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit c2e756ff9e699865d294cdc112acfc36419cf5cc ]
Using smp_processor_id() causes splats with PREEMPT_RCU:
[19379.552780] BUG: using smp_processor_id() in preemptible [00000000] code: ping/32389
[19379.552793] caller is debug_smp_processor_id+0x17/0x19
[...]
[19379.552823] Call Trace:
[19379.552832] [<ffffffff81274e9e>] dump_stack+0x67/0x90
[19379.552837] [<ffffffff8129a4d4>] check_preemption_disabled+0xe5/0xf5
[19379.552842] [<ffffffff8129a4fb>] debug_smp_processor_id+0x17/0x19
[19379.552849] [<ffffffffa07c42dd>] nft_queue_eval+0x35/0x20c [nft_queue]
No need to disable preemption since we only fetch the numeric value, so
let's use raw_smp_processor_id() instead.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit e1bf1687740ce1a3598a1c5e452b852ff2190682 upstream.
This reverts commit 870190a9ec9075205c0fa795a09fa931694a3ff1.
It was not a good idea. The custom hash table was a much better
fit for this purpose.
A fast lookup is not essential, in fact for most cases there is no lookup
at all because original tuple is not taken and can be used as-is.
What needs to be fast is insertion and deletion.
rhlist removal however requires a rhlist walk.
We can have thousands of entries in such a list if source port/addresses
are reused for multiple flows, if this happens removal requests are so
expensive that deletions of a few thousand flows can take several
seconds(!).
The advantages that we got from rhashtable are:
1) table auto-sizing
2) multiple locks
1) would be nice to have, but it is not essential as we have at
most one lookup per new flow, so even a million flows in the bysource
table are not a problem compared to current deletion cost.
2) is easy to add to custom hash table.
I tried to add hlist_node to rhlist to speed up rhltable_remove but this
isn't doable without changing semantics. rhltable_remove_fast will
check that the to-be-deleted object is part of the table and that
requires a list walk that we want to avoid.
Furthermore, using hlist_node increases size of struct rhlist_head, which
in turn increases nf_conn size.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=196821
Reported-by: Ivan Babrou <ibobrik@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 6e699867f84c0f358fed233fe6162173aca28e04 upstream.
successful insert into the bysource hash sets IPS_SRC_NAT_DONE status bit
so we can check that instead of presence of nat extension which requires
extra deref.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit f169fd695b192dd7b23aff8e69d25a1bc881bbfa ]
After adding the following nft rule, then ping 224.0.0.1:
# nft add rule netdev t c pkttype host counter
The warning complain message will be printed out again and again:
WARNING: CPU: 0 PID: 10182 at net/netfilter/nft_meta.c:163 \
nft_meta_get_eval+0x3fe/0x460 [nft_meta]
[...]
Call Trace:
<IRQ>
dump_stack+0x85/0xc2
__warn+0xcb/0xf0
warn_slowpath_null+0x1d/0x20
nft_meta_get_eval+0x3fe/0x460 [nft_meta]
nft_do_chain+0xff/0x5e0 [nf_tables]
So we should deal with PACKET_LOOPBACK in netdev family too. For ipv4,
convert it to PACKET_BROADCAST/MULTICAST according to the destination
address's type; For ipv6, convert it to PACKET_MULTICAST directly.
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 4b86c459c7bee3acaf92f0e2b4c6ac803eaa1a58 ]
Commit 4dee62b1b9b4 ("netfilter: nf_ct_expect: nf_ct_expect_insert()
returns void") inadvertently changed the successful return value of
nf_ct_expect_related_report() from 0 to 1 due to
__nf_ct_expect_check() returning 1 on success. Prevent this
regression in the future by changing the return value of
__nf_ct_expect_check() to 0 on success.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit ae5c682113f9f94cc5e76f92cf041ee624c173ee ]
The helper->expect_class_max must be set to the total number of
expect_policy minus 1, since we will use the statement "if (class >
helper->expect_class_max)" to validate the CTA_EXPECT_CLASS attr in
ctnetlink_alloc_expect.
So for compatibility, set the helper->expect_class_max to the
NFCTH_POLICY_SET_NUM attr's value minus 1.
Also: it's invalid when the NFCTH_POLICY_SET_NUM attr's value is zero.
1. this will result "expect_policy = kzalloc(0, GFP_KERNEL);";
2. we cannot set the helper->expect_class_max to a proper value.
So if nla_get_be32(tb[NFCTH_POLICY_SET_NUM]) is zero, report -EINVAL to
the userspace.
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 3b7dabf029478bb80507a6c4500ca94132a2bc0b ]
Otherwise, another CPU may access the invalid pointer. For example:
CPU0 CPU1
- rcu_read_lock();
- pfunc = _hook_;
_hook_ = NULL; -
mod unload -
- pfunc(); // invalid, panic
- rcu_read_unlock();
So we must call synchronize_rcu() to wait the rcu reader to finish.
Also note, in nf_nat_snmp_basic_fini, synchronize_rcu() will be invoked
by later nf_conntrack_helper_unregister, but I'm inclined to add a
explicit synchronize_rcu after set the nf_nat_snmp_hook to NULL. Depend
on such obscure assumptions is not a good idea.
Last, in nfnetlink_cttimeout, we use kfree_rcu to free the time object,
so in cttimeout_exit, invoking rcu_barrier() is not necessary at all,
remove it too.
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 3ef0c7a730de0bae03d86c19570af764fa3c4445 upstream.
As we want to remove spin_unlock_wait() and replace it with explicit
spin_lock()/spin_unlock() calls, we can use this to simplify the
locking.
In addition:
- Reading nf_conntrack_locks_all needs ACQUIRE memory ordering.
- The new code avoids the backwards loop.
Only slightly tested, I did not manage to trigger calls to
nf_conntrack_all_lock().
V2: With improved comments, to clearly show how the barriers
pair.
Fixes: b16c29191dc8 ("netfilter: nf_conntrack: use safer way to lock all buckets")
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 97772bcd56efa21d9d8976db6f205574ea602f51 upstream.
When doing initial conversion to rhashtable I replaced the bucket
walk with a single rhashtable_lookup_fast().
When moving to rhlist I failed to properly walk the list of identical
tuples, but that is what is needed for this to work correctly.
The table contains the original tuples, so the reply tuples are all
distinct.
We currently decide that mapping is (not) in range only based on the
first entry, but in case its not we need to try the reply tuple of the
next entry until we either find an in-range mapping or we checked
all the entries.
This bug makes nat core attempt collision resolution while it might be
able to use the mapping as-is.
Fixes: 870190a9ec90 ("netfilter: nat: convert nat bysrc hash to rhashtable")
Reported-by: Jaco Kroon <jaco@uls.co.za>
Tested-by: Jaco Kroon <jaco@uls.co.za>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 9c3f3794926a997b1cab6c42480ff300efa2d162 upstream.
If one cpu is doing nf_ct_extend_unregister while another cpu is doing
__nf_ct_ext_add_length, then we may hit BUG_ON(t == NULL). Moreover,
there's no synchronize_rcu invocation after set nf_ct_ext_types[id] to
NULL, so it's possible that we may access invalid pointer.
But actually, most of the ct extends are built-in, so the problem listed
above will not happen. However, there are two exceptions: NF_CT_EXT_NAT
and NF_CT_EXT_SYNPROXY.
For _EXT_NAT, the panic will not happen, since adding the nat extend and
unregistering the nat extend are located in the same file(nf_nat_core.c),
this means that after the nat module is removed, we cannot add the nat
extend too.
For _EXT_SYNPROXY, synproxy extend may be added by init_conntrack, while
synproxy extend unregister will be done by synproxy_core_exit. So after
nf_synproxy_core.ko is removed, we may still try to add the synproxy
extend, then kernel panic may happen.
I know it's very hard to reproduce this issue, but I can play a tricky
game to make it happen very easily :)
Step 1. Enable SYNPROXY for tcp dport 1234 at FORWARD hook:
# iptables -I FORWARD -p tcp --dport 1234 -j SYNPROXY
Step 2. Queue the syn packet to the userspace at raw table OUTPUT hook.
Also note, in the userspace we only add a 20s' delay, then
reinject the syn packet to the kernel:
# iptables -t raw -I OUTPUT -p tcp --syn -j NFQUEUE --queue-num 1
Step 3. Using "nc 2.2.2.2 1234" to connect the server.
Step 4. Now remove the nf_synproxy_core.ko quickly:
# iptables -F FORWARD
# rmmod ipt_SYNPROXY
# rmmod nf_synproxy_core
Step 5. After 20s' delay, the syn packet is reinjected to the kernel.
Now you will see the panic like this:
kernel BUG at net/netfilter/nf_conntrack_extend.c:91!
Call Trace:
? __nf_ct_ext_add_length+0x53/0x3c0 [nf_conntrack]
init_conntrack+0x12b/0x600 [nf_conntrack]
nf_conntrack_in+0x4cc/0x580 [nf_conntrack]
ipv4_conntrack_local+0x48/0x50 [nf_conntrack_ipv4]
nf_reinject+0x104/0x270
nfqnl_recv_verdict+0x3e1/0x5f9 [nfnetlink_queue]
? nfqnl_recv_verdict+0x5/0x5f9 [nfnetlink_queue]
? nla_parse+0xa0/0x100
nfnetlink_rcv_msg+0x175/0x6a9 [nfnetlink]
[...]
One possible solution is to make NF_CT_EXT_SYNPROXY extend built-in, i.e.
introduce nf_conntrack_synproxy.c and only do ct extend register and
unregister in it, similar to nf_conntrack_timeout.c.
But having such a obscure restriction of nf_ct_extend_unregister is not a
good idea, so we should invoke synchronize_rcu after set nf_ct_ext_types
to NULL, and check the NULL pointer when do __nf_ct_ext_add_length. Then
it will be easier if we add new ct extend in the future.
Last, we use kfree_rcu to free nf_ct_ext, so rcu_barrier() is unnecessary
anymore, remove it too.
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 3c5ab3f395d66a9e4e937fcfdf6ebc63894f028b upstream.
We do not check if packet from real server is for NAT
connection before performing SNAT. This causes problems
for setups that use DR/TUN and allow local clients to
access the real server directly, for example:
- local client in director creates IPVS-DR/TUN connection
CIP->VIP and the request packets are routed to RIP.
Talks are finished but IPVS connection is not expired yet.
- second local client creates non-IPVS connection CIP->RIP
with same reply tuple RIP->CIP and when replies are received
on LOCAL_IN we wrongly assign them for the first client
connection because RIP->CIP matches the reply direction.
As result, IPVS SNATs replies for non-IPVS connections.
The problem is more visible to local UDP clients but in rare
cases it can happen also for TCP or remote clients when the
real server sends the reply traffic via the director.
So, better to be more precise for the reply traffic.
As replies are not expected for DR/TUN connections, better
to not touch them.
Reported-by: Nick Moriarty <nick.moriarty@york.ac.uk>
Tested-by: Nick Moriarty <nick.moriarty@york.ac.uk>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 87e94dbc210a720a34be5c1174faee5c84be963e upstream.
This patch fixes the creation of connection tracking entry from
netlink when synproxy is used. It was missing the addition of
the synproxy extension.
This was causing kernel crashes when a conntrack entry created by
conntrackd was used after the switch of traffic from active node
to the passive node.
Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 2638fd0f92d4397884fd991d8f4925cb3f081901 upstream.
Denys provided an awesome KASAN report pointing to an use
after free in xt_TCPMSS
I have provided three patches to fix this issue, either in xt_TCPMSS or
in xt_tcpudp.c. It seems xt_TCPMSS patch has the smallest possible
impact.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 5ce6b04ce96896e8a79e6f60740ced911eaac7a4 ]
First, log prefix will be truncated to NF_LOG_PREFIXLEN-1, i.e. 127,
at nf_log_packet(), so the extra part is useless.
Second, after adding a log rule with a very very long prefix, we will
fail to dump the nft rules after this _special_ one, but acctually,
they do exist. For example:
# name_65000=$(printf "%0.sQ" {1..65000})
# nft add rule filter output log prefix "$name_65000"
# nft add rule filter output counter
# nft add rule filter output counter
# nft list chain filter output
table ip filter {
chain output {
type filter hook output priority 0; policy accept;
}
}
So now, restrict the log prefix length to NF_LOG_PREFIXLEN-1.
Fixes: 96518518cc41 ("netfilter: add nftables")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 35d0ac9070ef619e3bf44324375878a1c540387b ]
If the element exists and no NLM_F_EXCL is specified, do not bump
set->nelems, otherwise we leak one set element slot. This problem
amplifies if the set is full since the abort path always decrements the
counter for the -ENFILE case too, giving one spare extra slot.
Fix this by moving set->nelems update to nft_add_set_elem() after
successful element insertion. Moreover, remove the element if the set is
full so there is no need to rely on the abort path to undo things
anymore.
Fixes: c016c7e45ddf ("netfilter: nf_tables: honor NLM_F_EXCL flag in set element insertion")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit da2f27e9e615d1c799c9582b15262458da61fddc upstream.
In commit 82de0be6862cd ("netfilter: Add helper array
register/unregister functions"),
struct nf_conntrack_helper sip[MAX_PORTS][4] was changed to
sip[MAX_PORTS * 4], so the memory init should have been changed to
memset(&sip[4 * i], 0, 4 * sizeof(sip[i]));
But as the sip[] table is allocated in the BSS, it is already set to 0
Fixes: 82de0be6862cd ("netfilter: Add helper array register/unregister functions")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit d2df92e98a34a5619dadd29c6291113c009181e7 upstream.
The existing code selects no next branch to be inspected when
re-inserting an inactive element into the rb-tree, looping endlessly.
This patch restricts the check for active elements to the EEXIST case
only.
Fixes: e701001e7cbe ("netfilter: nft_rbtree: allow adjacent intervals with dynamic updates")
Reported-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Tested-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit e5072053b09642b8ff417d47da05b84720aea3ee upstream.
This further refines the changes made to conntrack gc_worker in
commit e0df8cae6c16 ("netfilter: conntrack: refine gc worker heuristics").
The main idea of that change was to reduce the scan interval when evictions
take place.
However, on the reporters' setup, there are 1-2 million conntrack entries
in total and roughly 8k new (and closing) connections per second.
In this case we'll always evict at least one entry per gc cycle and scan
interval is always at 1 jiffy because of this test:
} else if (expired_count) {
gc_work->next_gc_run /= 2U;
next_run = msecs_to_jiffies(1);
being true almost all the time.
Given we scan ~10k entries per run its clearly wrong to reduce interval
based on nonzero eviction count, it will only waste cpu cycles since a vast
majorities of conntracks are not timed out.
Thus only look at the ratio (scanned entries vs. evicted entries) to make
a decision on whether to reduce or not.
Because evictor is supposed to only kick in when system turns idle after
a busy period, pick a high ratio -- this makes it 50%. We thus keep
the idea of increasing scan rate when its likely that table contains many
expired entries.
In order to not let timed-out entries hang around for too long
(important when using event logging, in which case we want to timely
destroy events), we now scan the full table within at most
GC_MAX_SCAN_JIFFIES (16 seconds) even in worst-case scenario where all
timed-out entries sit in same slot.
I tested this with a vm under synflood (with
sysctl net.netfilter.nf_conntrack_tcp_timeout_syn_recv=3).
While flood is ongoing, interval now stays at its max rate
(GC_MAX_SCAN_JIFFIES / GC_MAX_BUCKETS_DIV -> 125ms).
With feedback from Nicolas Dichtel.
Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Fixes: b87a2f9199ea82eaadc ("netfilter: conntrack: add gc worker to remove timed-out entries")
Signed-off-by: Florian Westphal <fw@strlen.de>
Tested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Tested-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 524b698db06b9b6da7192e749f637904e2f62d7b upstream.
Instead of breaking loop and instant resched, don't bother checking
this in first place (the loop calls cond_resched for every bucket anyway).
Suggested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit dfe75ff8ca74f54b0fa5a326a1aa9afa485ed802 upstream.
Commit 3bb398d925 ("netfilter: nf_ct_helper: disable automatic helper
assignment") is causing behavior regressions in firewalls, as traffic
handled by conntrack helpers is now by default not passed through even
though it was before due to missing CT targets (which were not necessary
before this commit).
The default had to be switched off due to security reasons [1] [2] and
therefore should stay the way it is, but let's be friendly to firewall
admins and issue a warning the first time we're in situation where packet
would be likely passed through with the old default but we're likely going
to drop it on the floor now.
Rewrite the code a little bit as suggested by Linus, so that we avoid
spaghettiing the code even more -- namely the whole decision making
process regarding helper selection (either automatic or not) is being
separated, so that the whole logic can be simplified and code (condition)
duplication reduced.
[1] https://cansecwest.com/csw12/conntrack-attack.pdf
[2] https://home.regit.org/netfilter-en/secure-use-of-helpers/
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
| |
Otherwise, kernel panic will happen if the user does not specify
the related attributes.
Fixes: 0f3cd9b36977 ("netfilter: nf_tables: add range expression")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As Liping Zhang reports, after commit a8b1e36d0d1d ("netfilter: nft_dynset:
fix element timeout for HZ != 1000"), priv->timeout was stored in jiffies,
while set->timeout was stored in milliseconds. This is inconsistent and
incorrect.
Firstly, we already call msecs_to_jiffies in nft_set_elem_init, so
priv->timeout will be converted to jiffies twice.
Secondly, if the user did not specify the NFTA_DYNSET_TIMEOUT attr,
set->timeout will be used, but we forget to call msecs_to_jiffies
when do update elements.
Fix this by using jiffies internally for traditional sets and doing the
conversions to/from msec when interacting with userspace - as dynset
already does.
This is preferable to doing the conversions, when elements are inserted or
updated, because this can happen very frequently on busy dynsets.
Fixes: a8b1e36d0d1d ("netfilter: nft_dynset: fix element timeout for HZ != 1000")
Reported-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Anders K. Pedersen <akp@cohaesio.com>
Acked-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I got offlist bug report about failing connections and high cpu usage.
This happens because we hit 'elasticity' checks in rhashtable that
refuses bucket list exceeding 16 entries.
The nat bysrc hash unfortunately needs to insert distinct objects that
share same key and are identical (have same source tuple), this cannot
be avoided.
Switch to the rhlist interface which is designed for this.
The nulls_base is removed here, I don't think its needed:
A (unlikely) false positive results in unneeded port clash resolution,
a false negative results in packet drop during conntrack confirmation,
when we try to insert the duplicate into main conntrack hash table.
Tested by adding multiple ip addresses to host, then adding
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
... and then creating multiple connections, from same source port but
different addresses:
for i in $(seq 2000 2032);do nc -p 1234 192.168.7.1 $i > /dev/null & done
(all of these then get hashed to same bysource slot)
Then, to test that nat conflict resultion is working:
nc -s 10.0.0.1 -p 1234 192.168.7.1 2000
nc -s 10.0.0.2 -p 1234 192.168.7.1 2000
tcp .. src=10.0.0.1 dst=192.168.7.1 sport=1234 dport=2000 src=192.168.7.1 dst=192.168.7.10 sport=2000 dport=1024 [ASSURED]
tcp .. src=10.0.0.2 dst=192.168.7.1 sport=1234 dport=2000 src=192.168.7.1 dst=192.168.7.10 sport=2000 dport=1025 [ASSURED]
tcp .. src=192.168.7.10 dst=192.168.7.1 sport=1234 dport=2000 src=192.168.7.1 dst=192.168.7.10 sport=2000 dport=1234 [ASSURED]
tcp .. src=192.168.7.10 dst=192.168.7.1 sport=1234 dport=2001 src=192.168.7.1 dst=192.168.7.10 sport=2001 dport=1234 [ASSURED]
[..]
-> nat altered source ports to 1024 and 1025, respectively.
This can also be confirmed on destination host which shows
ESTAB 0 0 192.168.7.1:2000 192.168.7.10:1024
ESTAB 0 0 192.168.7.1:2000 192.168.7.10:1025
ESTAB 0 0 192.168.7.1:2000 192.168.7.10:1234
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Fixes: 870190a9ec907 ("netfilter: nat: convert nat bysrc hash to rhashtable")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The comparator works like memcmp, i.e. 0 means objects are equal.
In other words, when objects are distinct they are treated as identical,
when they are distinct they are allegedly the same.
The first case is rare (distinct objects are unlikely to get hashed to
same bucket).
The second case results in unneeded port conflict resolutions attempts.
Fixes: 870190a9ec907 ("netfilter: nat: convert nat bysrc hash to rhashtable")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use the function nft_parse_u32_check() to fetch the value and validate
the u32 attribute into the hash len u8 field.
This patch revisits 4da449ae1df9 ("netfilter: nft_exthdr: Add size check
on u8 nft_exthdr attributes").
Fixes: cb1b69b0b15b ("netfilter: nf_tables: add hash expression")
Signed-off-by: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Dalegaard says:
The following ruleset, when loaded with 'nft -f bad.txt'
----snip----
flush ruleset
table ip inlinenat {
map sourcemap {
type ipv4_addr : verdict;
}
chain postrouting {
ip saddr vmap @sourcemap accept
}
}
add chain inlinenat test
add element inlinenat sourcemap { 100.123.10.2 : jump test }
----snip----
results in a kernel oops:
BUG: unable to handle kernel paging request at 0000000000001344
IP: [<ffffffffa07bf704>] nf_tables_check_loops+0x114/0x1f0 [nf_tables]
[...]
Call Trace:
[<ffffffffa07c2aae>] ? nft_data_init+0x13e/0x1a0 [nf_tables]
[<ffffffffa07c1950>] nft_validate_register_store+0x60/0xb0 [nf_tables]
[<ffffffffa07c74b5>] nft_add_set_elem+0x545/0x5e0 [nf_tables]
[<ffffffffa07bfdd0>] ? nft_table_lookup+0x30/0x60 [nf_tables]
[<ffffffff8132c630>] ? nla_strcmp+0x40/0x50
[<ffffffffa07c766e>] nf_tables_newsetelem+0x11e/0x210 [nf_tables]
[<ffffffff8132c400>] ? nla_validate+0x60/0x80
[<ffffffffa030d9b4>] nfnetlink_rcv+0x354/0x5a7 [nfnetlink]
Because we forget to fill the net pointer in bind_ctx, so dereferencing
it may cause kernel crash.
Reported-by: Dalegaard <dalegaard@gmail.com>
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Nicolas Dichtel says:
After commit b87a2f9199ea ("netfilter: conntrack: add gc worker to
remove timed-out entries"), netlink conntrack deletion events may be
sent with a huge delay.
Nicolas further points at this line:
goal = min(nf_conntrack_htable_size / GC_MAX_BUCKETS_DIV, GC_MAX_BUCKETS);
and indeed, this isn't optimal at all. Rationale here was to ensure that
we don't block other work items for too long, even if
nf_conntrack_htable_size is huge. But in order to have some guarantee
about maximum time period where a scan of the full conntrack table
completes we should always use a fixed slice size, so that once every
N scans the full table has been examined at least once.
We also need to balance this vs. the case where the system is either idle
(i.e., conntrack table (almost) empty) or very busy (i.e. eviction happens
from packet path).
So, after some discussion with Nicolas:
1. want hard guarantee that we scan entire table at least once every X s
-> need to scan fraction of table (get rid of upper bound)
2. don't want to eat cycles on idle or very busy system
-> increase interval if we did not evict any entries
3. don't want to block other worker items for too long
-> make fraction really small, and prefer small scan interval instead
4. Want reasonable short time where we detect timed-out entry when
system went idle after a burst of traffic, while not doing scans
all the time.
-> Store next gc scan in worker, increasing delays when no eviction
happened and shrinking delay when we see timed out entries.
The old gc interval is turned into a max number, scans can now happen
every jiffy if stale entries are present.
Longest possible time period until an entry is evicted is now 2 minutes
in worst case (entry expires right after it was deemed 'not expired').
Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Thomas reports its not possible to attach the H.245 helper:
iptables -t raw -A PREROUTING -p udp -j CT --helper H.245
iptables: No chain/target/match by that name.
xt_CT: No such helper "H.245"
This is because H.245 registers as NFPROTO_UNSPEC, but the CT target
passes NFPROTO_IPV4/IPV6 to nf_conntrack_helper_try_module_get.
We should treat UNSPEC as wildcard and ignore the l3num instead.
Reported-by: Thomas Woerner <twoerner@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The (percpu) untracked conntrack entries can end up with nonzero connmarks.
The 'untracked' conntrack objects are merely a way to distinguish INVALID
(i.e. protocol connection tracker says payload doesn't meet some
requirements or packet was never seen by the connection tracking code)
from packets that are intentionally not tracked (some icmpv6 types such as
neigh solicitation, or by using 'iptables -j CT --notrack' option).
Untracked conntrack objects are implementation detail, we might as well use
invalid magic address instead to tell INVALID and UNTRACKED apart.
Check skb->nfct for untracked dummy and behave as if skb->nfct is NULL.
Reported-by: XU Tianwen <evan.xu.tianwen@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
family.maxattr is the max index for policy[], the size of
ops[] is determined with ARRAY_SIZE().
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
| |
When the memory is exhausted, then we will fail to add the NFT_MSG_NEWSET
transaction. In such case, we should destroy the set before we free it.
Fixes: 958bee14d071 ("netfilter: nf_tables: use new transaction infrastructure to handle sets")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Building the ip_vs_sync code with CONFIG_OPTIMIZE_INLINING on x86
confuses the compiler to the point where it produces a rather
dubious warning message:
net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘opt.init_seq’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
struct ip_vs_sync_conn_options opt;
^~~
net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘opt.delta’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘opt.previous_delta’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘*((void *)&opt+12).init_seq’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘*((void *)&opt+12).delta’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘*((void *)&opt+12).previous_delta’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
The problem appears to be a combination of a number of factors, including
the __builtin_bswap32 compiler builtin being slightly odd, having a large
amount of code inlined into a single function, and the way that some
functions only get partially inlined here.
I've spent way too much time trying to work out a way to improve the
code, but the best I've come up with is to add an explicit memset
right before the ip_vs_seq structure is first initialized here. When
the compiler works correctly, this has absolutely no effect, but in the
case that produces the warning, the warning disappears.
In the process of analysing this warning, I also noticed that
we use memcpy to copy the larger ip_vs_sync_conn_options structure
over two members of the ip_vs_conn structure. This works because
the layout is identical, but seems error-prone, so I'm changing
this in the process to directly copy the two members. This change
seemed to have no effect on the object code or the warning, but
it deals with the same data, so I kept the two changes together.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
nft_parse_u32_check
Commit 36b701fae12ac ("netfilter: nf_tables: validate maximum value of
u32 netlink attributes") introduced nft_parse_u32_check with a return
value of "unsigned int", yet on error it returns "-ERANGE".
This patch corrects the mismatch by changing the return value to "int",
which happens to match the actual users of nft_parse_u32_check already.
Found by Coverity, CID 1373930.
Note that commit 21a9e0f1568ea ("netfilter: nft_exthdr: fix error
handling in nft_exthdr_init()) attempted to address the issue, but
did not address the return type of nft_parse_u32_check.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Cc: Laura Garcia Liebana <nevola@gmail.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 36b701fae12ac ("netfilter: nf_tables: validate maximum value...")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
on SIP requests, so a fragmented TCP SIP packet from an allow header starting with
INVITE,NOTIFY,OPTIONS,REFER,REGISTER,UPDATE,SUBSCRIBE
Content-Length: 0
will not bet interpreted as an INVITE request. Also Request-URI must start with an alphabetic character.
Confirm with RFC 3261
Request-Line = Method SP Request-URI SP SIP-Version CRLF
Fixes: 30f33e6dee80 ("[NETFILTER]: nf_conntrack_sip: support method specific request/response handling")
Signed-off-by: Ulrich Weber <ulrich.weber@riverbed.com>
Acked-by: Marco Angaroni <marcoangaroni@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Packets may race when create the new element in nft_hash_update:
CPU0 CPU1
lookup_fast - fail lookup_fast - fail
new - ok new - ok
insert - ok insert - fail(EEXIST)
So when race happened, we reuse the existing element. Otherwise,
these *racing* packets will not be handled properly.
Fixes: 22fe54d5fefc ("netfilter: nf_tables: add support for dynamic set updates")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When nft_expr_clone failed, a series of problems will happen:
1. module refcnt will leak, we call __module_get at the beginning but
we forget to put it back if ops->clone returns fail
2. memory will be leaked, if clone fail, we just return NULL and forget
to free the alloced element
3. set->nelems will become incorrect when set->size is specified. If
clone fail, we should decrease the set->nelems
Now this patch fixes these problems. And fortunately, clone fail will
only happen on counter expression when memory is exhausted.
Fixes: 086f332167d6 ("netfilter: nf_tables: add clone interface to expression operations")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When CONFIG_NFT_SET_HASH is not enabled and I input the following rule:
"nft add rule filter output flow table test {ip daddr counter }", kernel
panic happened on my system:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [< (null)>] (null)
[...]
Call Trace:
[<ffffffffa0590466>] ? nft_dynset_eval+0x56/0x100 [nf_tables]
[<ffffffffa05851bb>] nft_do_chain+0xfb/0x4e0 [nf_tables]
[<ffffffffa0432f01>] ? nf_conntrack_tuple_taken+0x61/0x210 [nf_conntrack]
[<ffffffffa0459ea6>] ? get_unique_tuple+0x136/0x560 [nf_nat]
[<ffffffffa043bca1>] ? __nf_ct_ext_add_length+0x111/0x130 [nf_conntrack]
[<ffffffffa045a357>] ? nf_nat_setup_info+0x87/0x3b0 [nf_nat]
[<ffffffff81761e27>] ? ipt_do_table+0x327/0x610
[<ffffffffa045a6d7>] ? __nf_nat_alloc_null_binding+0x57/0x80 [nf_nat]
[<ffffffffa059f21f>] nft_ipv4_output+0xaf/0xd0 [nf_tables_ipv4]
[<ffffffff81702515>] nf_iterate+0x55/0x60
[<ffffffff81702593>] nf_hook_slow+0x73/0xd0
Because in rbtree type set, ops->update is not implemented. So just keep
it simple, in such case, report -EOPNOTSUPP to the user space.
Fixes: 22fe54d5fefc ("netfilter: nf_tables: add support for dynamic set updates")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
nf_queue handling is broken since e3b37f11e6e4 ("netfilter: replace
list_head with single linked list") for two reasons:
1) If the bypass flag is set on, there are no userspace listeners and
we still have more hook entries to iterate over, then jump to the
next hook. Otherwise accept the packet. On nf_reinject() path, the
okfn() needs to be invoked.
2) We should not re-enter the same hook on packet reinjection. If the
packet is accepted, we have to skip the current hook from where the
packet was enqueued, otherwise the packets gets enqueued over and
over again.
This restores the previous list_for_each_entry_continue() behaviour
happening from nf_iterate() that was dealing with these two cases.
This patch introduces a new nf_queue() wrapper function so this fix
becomes simpler.
Fixes: e3b37f11e6e4 ("netfilter: replace list_head with single linked list")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
| |
When the maximum evictions number is reached, do not wait 5 seconds before
the next run.
CC: Florian Westphal <fw@strlen.de>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Markus Trippelsdorf reports:
WARNING: kmemcheck: Caught 64-bit read from uninitialized memory (ffff88001e605480)
4055601e0088ffff000000000000000090686d81ffffffff0000000000000000
u u u u u u u u u u u u u u u u i i i i i i i i u u u u u u u u
^
|RIP: 0010:[<ffffffff8166e561>] [<ffffffff8166e561>] nf_register_net_hook+0x51/0x160
[..]
[<ffffffff8166e561>] nf_register_net_hook+0x51/0x160
[<ffffffff8166eaaf>] nf_register_net_hooks+0x3f/0xa0
[<ffffffff816d6715>] ipt_register_table+0xe5/0x110
[..]
This warning is harmless; we copy 'uninitialized' data from the hook ops
but it will not be used.
Long term the structures keeping run-time data should be disentangled
from those only containing config-time data (such as where in the list
to insert a hook), but thats -next material.
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Aaron Conole <aconole@bytheb.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The newly added nft_range_eval() function handles the two possible
nft range operations, but as the compiler warning points out,
any unexpected value would lead to the 'mismatch' variable being
used without being initialized:
net/netfilter/nft_range.c: In function 'nft_range_eval':
net/netfilter/nft_range.c:45:5: error: 'mismatch' may be used uninitialized in this function [-Werror=maybe-uninitialized]
This removes the variable in question and instead moves the
condition into the switch itself, which is potentially more
efficient than adding a bogus 'default' clause as in my
first approach, and is nicer than using the 'uninitialized_var'
macro.
Fixes: 0f3cd9b36977 ("netfilter: nf_tables: add range expression")
Link: http://patchwork.ozlabs.org/patch/677114/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
| |
Use nft_parse_u32_check() to make sure we don't get a value over the
unsigned 8-bit integer. Moreover, make sure this value doesn't go over
the two supported range comparison modes.
Fixes: 9286c2eb1fda ("netfilter: nft_range: validate operation netlink attribute")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
| |
"err" needs to be signed for the error handling to work.
Fixes: 36b701fae12a ('netfilter: nf_tables: validate maximum value of u32 netlink attributes')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|