summaryrefslogtreecommitdiffstats
path: root/net/bridge
Commit message (Collapse)AuthorAgeFilesLines
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller2019-09-152-1/+5
|\ | | | | | | | | | | Minor overlapping changes in the btusb and ixgbe drivers. Signed-off-by: David S. Miller <davem@davemloft.net>
| * bridge/mdb: remove wrong use of NLM_F_MULTINicolas Dichtel2019-09-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | NLM_F_MULTI must be used only when a NLMSG_DONE message is sent at the end. In fact, NLMSG_DONE is sent only at the end of a dump. Libraries like libnl will wait forever for NLMSG_DONE. Fixes: 949f1e39a617 ("bridge: mdb: notify on router port add and del") CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * netfilter: bridge: Drops IPv6 packets if IPv6 module is not loadedLeonardo Bras2019-09-021-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | A kernel panic can happen if a host has disabled IPv6 on boot and have to process guest packets (coming from a bridge) using it's ip6tables. IPv6 packets need to be dropped if the IPv6 module is not loaded, and the host ip6tables will be used. Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | netfilter: move nf_bridge_frag_data struct definition to a more appropriate ↵Jeremy Sowden2019-09-131-7/+7
| | | | | | | | | | | | | | | | | | | | | | header. There is a struct definition function in nf_conntrack_bridge.h which is not specific to conntrack and is used elswhere in netfilter. Move it into netfilter_bridge.h. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | netfilter: update include directives.Jeremy Sowden2019-09-131-1/+0
| | | | | | | | | | | | | | | | Include some headers in files which require them, and remove others which are not required. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | netfilter: inline xt_hashlimit, ebt_802_3 and xt_physdev headersJeremy Sowden2019-09-131-1/+7
| | | | | | | | | | | | | | | | Three netfilter headers are only included once. Inline their contents at those sites and remove them. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller2019-09-021-1/+1
|\| | | | | | | | | | | | | r8152 conflicts are the NAPI fixes in 'net' overlapping with some tasklet stuff in net-next Signed-off-by: David S. Miller <davem@davemloft.net>
| * netfilter: nft_meta_bridge: Fix get NFT_META_BRI_IIFVPROTO in network byteorderwenxu2019-08-301-1/+1
| | | | | | | | | | | | | | | | | | Get the vlan_proto of ingress bridge in network byteorder as userspace expects. Otherwise this is inconsistent with NFT_META_PROTOCOL. Fixes: 2a3a93ef0ba5 ("netfilter: nft_meta_bridge: Add NFT_META_BRI_IIFVPROTO support") Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | net: bridge: Populate the pvid flag in br_vlan_get_infoVladimir Oltean2019-08-311-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently this simplified code snippet fails: br_vlan_get_pvid(netdev, &pvid); br_vlan_get_info(netdev, pvid, &vinfo); ASSERT(!(vinfo.flags & BRIDGE_VLAN_INFO_PVID)); It is intuitive that the pvid of a netdevice should have the BRIDGE_VLAN_INFO_PVID flag set. However I can't seem to pinpoint a commit where this behavior was introduced. It seems like it's been like that since forever. At a first glance it would make more sense to just handle the BRIDGE_VLAN_INFO_PVID flag in __vlan_add_flags. However, as Nikolay explains: There are a few reasons why we don't do it, most importantly because we need to have only one visible pvid at any single time, even if it's stale - it must be just one. Right now that rule will not be violated by this change, but people will try using this flag and could see two pvids simultaneously. You can see that the pvid code is even using memory barriers to propagate the new value faster and everywhere the pvid is read only once. That is the reason the flag is set dynamically when dumping entries, too. A second (weaker) argument against would be given the above we don't want another way to do the same thing, specifically if it can provide us with two pvids (e.g. if walking the vlan list) or if it can provide us with a pvid different from the one set in the vg. [Obviously, I'm talking about RCU pvid/vlan use cases similar to the dumps. The locked cases are fine. I would like to avoid explaining why this shouldn't be relied upon without locking] So instead of introducing the above change and making sure of the pvid uniqueness under RCU, simply dynamically populate the pvid flag in br_vlan_get_info(). Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller2019-08-271-4/+4
|\| | | | | | | | | | | | | Minor conflict in r8169, bug fix had two versions in net and net-next, take the net-next hunks. Signed-off-by: David S. Miller <davem@davemloft.net>
| * netfilter: ebtables: Fix argument order to ADD_COUNTERTodd Seidelmann2019-08-191-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | The ordering of arguments to the x_tables ADD_COUNTER macro appears to be wrong in ebtables (cf. ip_tables.c, ip6_tables.c, and arp_tables.c). This causes data corruption in the ebtables userspace tools because they get incorrect packet & byte counts from the kernel. Fixes: d72133e628803 ("netfilter: ebtables: use ADD_COUNTER macro") Signed-off-by: Todd Seidelmann <tseidelmann@linode.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | net: bridge: mdb: allow add/delete for host-joined groupsNikolay Aleksandrov2019-08-173-30/+80
| | | | | | | | | | | | | | | | | | | | | | | | | | Currently this is needed only for user-space compatibility, so similar object adds/deletes as the dumped ones would succeed. Later it can be used for L2 mcast MAC add/delete. v3: fix compiler warning (DaveM) v2: don't send a notification when used from user-space, arm the group timer if no ports are left after host entry del Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: bridge: mdb: dump host-joined entries as wellNikolay Aleksandrov2019-08-171-10/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we dump only the port mdb entries but we can have host-joined entries on the bridge itself and they should be treated as normal temp mdbs, they're already notified: $ bridge monitor all [MDB]dev br0 port br0 grp ff02::8 temp The group will not be shown in the bridge mdb output, but it takes 1 slot and it's timing out. If it's only host-joined then the mdb show output can even be empty. After this patch we show the host-joined groups: $ bridge mdb show dev br0 port br0 grp ff02::8 temp Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: bridge: mdb: factor out mdb fillingNikolay Aleksandrov2019-08-171-31/+37
| | | | | | | | | | | | | | | | We have to factor out the mdb fill portion in order to re-use it later for the bridge mdb entries. No functional changes intended. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: bridge: mdb: move vlan commentsNikolay Aleksandrov2019-08-171-6/+6
| | | | | | | | | | | | | | | | Trivial patch to move the vlan comments in their proper places above the vid 0 checks. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller2019-08-066-38/+50
|\| | | | | | | | | | | Just minor overlapping changes in the conflicts here. Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: bridge: move default pvid init/deinit to NETDEV_REGISTER/UNREGISTERNikolay Aleksandrov2019-08-053-23/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Most of the bridge device's vlan init bugs come from the fact that its default pvid is created at the wrong time, way too early in ndo_init() before the device is even assigned an ifindex. It introduces a bug when the bridge's dev_addr is added as fdb during the initial default pvid creation the notification has ifindex/NDA_MASTER both equal to 0 (see example below) which really makes no sense for user-space[0] and is wrong. Usually user-space software would ignore such entries, but they are actually valid and will eventually have all necessary attributes. It makes much more sense to send a notification *after* the device has registered and has a proper ifindex allocated rather than before when there's a chance that the registration might still fail or to receive it with ifindex/NDA_MASTER == 0. Note that we can remove the fdb flush from br_vlan_flush() since that case can no longer happen. At NETDEV_REGISTER br->default_pvid is always == 1 as it's initialized by br_vlan_init() before that and at NETDEV_UNREGISTER it can be anything depending why it was called (if called due to NETDEV_REGISTER error it'll still be == 1, otherwise it could be any value changed during the device life time). For the demonstration below a small change to iproute2 for printing all fdb notifications is added, because it contained a workaround not to show entries with ifindex == 0. Command executed while monitoring: $ ip l add br0 type bridge Before (both ifindex and master == 0): $ bridge monitor fdb 36:7e:8a:b3:56:ba dev * vlan 1 master * permanent After (proper br0 ifindex): $ bridge monitor fdb e6:2a:ae:7a:b7:48 dev br0 vlan 1 master br0 permanent v4: move only the default pvid init/deinit to NETDEV_REGISTER/UNREGISTER v3: send the correct v2 patch with all changes (stub should return 0) v2: on error in br_vlan_init set br->vlgrp to NULL and return 0 in the br_vlan_bridge_event stub when bridge vlans are disabled [0] https://bugzilla.kernel.org/show_bug.cgi?id=204389 Reported-by: michael-dev <michael-dev@fami-braun.de> Fixes: 5be5a2df40f0 ("bridge: Add filtering support for default_pvid") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: bridge: mcast: don't delete permanent entries when fast leave is enabledNikolay Aleksandrov2019-07-311-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When permanent entries were introduced by the commit below, they were exempt from timing out and thus igmp leave wouldn't affect them unless fast leave was enabled on the port which was added before permanent entries existed. It shouldn't matter if fast leave is enabled or not if the user added a permanent entry it shouldn't be deleted on igmp leave. Before: $ echo 1 > /sys/class/net/eth4/brport/multicast_fast_leave $ bridge mdb add dev br0 port eth4 grp 229.1.1.1 permanent $ bridge mdb show dev br0 port eth4 grp 229.1.1.1 permanent < join and leave 229.1.1.1 on eth4 > $ bridge mdb show $ After: $ echo 1 > /sys/class/net/eth4/brport/multicast_fast_leave $ bridge mdb add dev br0 port eth4 grp 229.1.1.1 permanent $ bridge mdb show dev br0 port eth4 grp 229.1.1.1 permanent < join and leave 229.1.1.1 on eth4 > $ bridge mdb show dev br0 port eth4 grp 229.1.1.1 permanent Fixes: ccb1c31a7a87 ("bridge: add flags to distinguish permanent mdb entires") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller2019-07-312-20/+22
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== netfilter fixes for net The following patchset contains Netfilter fixes for your net tree: 1) memleak in ebtables from the error path for the 32/64 compat layer, from Florian Westphal. 2) Fix inverted meta ifname/ifidx matching when no interface is set on either from the input/output path, from Phil Sutter. 3) Remove goto label in nft_meta_bridge, also from Phil. 4) Missing include guard in xt_connlabel, from Masahiro Yamada. 5) Two patch to fix ipset destination MAC matching coming from Stephano Brivio, via Jozsef Kadlecsik. 6) Fix set rename and listing concurrency problem, from Shijie Luo. Patch also coming via Jozsef Kadlecsik. 7) ebtables 32/64 compat missing base chain policy in rule count, from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * netfilter: ebtables: also count base chain policiesFlorian Westphal2019-07-301-11/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ebtables doesn't include the base chain policies in the rule count, so we need to add them manually when we call into the x_tables core to allocate space for the comapt offset table. This lead syzbot to trigger: WARNING: CPU: 1 PID: 9012 at net/netfilter/x_tables.c:649 xt_compat_add_offset.cold+0x11/0x36 net/netfilter/x_tables.c:649 Reported-by: syzbot+276ddebab3382bbf72db@syzkaller.appspotmail.com Fixes: 2035f3ff8eaa ("netfilter: ebtables: compat: un-break 32bit setsockopt when no rules are present") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * netfilter: nft_meta_bridge: Eliminate 'out' labelPhil Sutter2019-07-251-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | The label is used just once and the code it points at is not reused, no point in keeping it. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * netfilter: nf_tables: Make nft_meta expression more robustPhil Sutter2019-07-251-5/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nft_meta_get_eval()'s tendency to bail out setting NFT_BREAK verdict in situations where required data is missing leads to unexpected behaviour with inverted checks like so: | meta iifname != eth0 accept This rule will never match if there is no input interface (or it is not known) which is not intuitive and, what's worse, breaks consistency of iptables-nft with iptables-legacy. Fix this by falling back to placing a value in dreg which never matches (avoiding accidental matches), i.e. zero for interface index and an empty string for interface name. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * netfilter: ebtables: fix a memory leak bug in compatWenwen Wang2019-07-211-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In compat_do_replace(), a temporary buffer is allocated through vmalloc() to hold entries copied from the user space. The buffer address is firstly saved to 'newinfo->entries', and later on assigned to 'entries_tmp'. Then the entries in this temporary buffer is copied to the internal kernel structure through compat_copy_entries(). If this copy process fails, compat_do_replace() should be terminated. However, the allocated temporary buffer is not freed on this path, leading to a memory leak. To fix the bug, free the buffer before returning from compat_do_replace(). Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | net: bridge: delete local fdb on device init failureNikolay Aleksandrov2019-07-291-0/+5
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On initialization failure we have to delete the local fdb which was inserted due to the default pvid creation. This problem has been present since the inception of default_pvid. Note that currently there are 2 cases: 1) in br_dev_init() when br_multicast_init() fails 2) if register_netdevice() fails after calling ndo_init() This patch takes care of both since br_vlan_flush() is called on both occasions. Also the new fdb delete would be a no-op on normal bridge device destruction since the local fdb would've been already flushed by br_dev_delete(). This is not an issue for ports since nbp_vlan_init() is called last when adding a port thus nothing can fail after it. Reported-by: syzbot+88533dc8b582309bf3ee@syzkaller.appspotmail.com Fixes: 5be5a2df40f0 ("bridge: Add filtering support for default_pvid") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* / net: bridge: mcast: add delete due to fast-leave mdb flagNikolay Aleksandrov2019-07-313-1/+4
|/ | | | | | | | | | | | | In user-space there's no way to distinguish why an mdb entry was deleted and that is a problem for daemons which would like to keep the mdb in sync with remote ends (e.g. mlag) but would also like to converge faster. In almost all cases we'd like to age-out the remote entry for performance and convergence reasons except when fast-leave is enabled. In that case we want explicit immediate remote delete, thus add mdb flag which is set only when the entry is being deleted due to fast-leave. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netfilter: bridge: make NF_TABLES_BRIDGE tristateArnd Bergmann2019-07-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | The new nft_meta_bridge code fails to link as built-in when NF_TABLES is a loadable module. net/bridge/netfilter/nft_meta_bridge.o: In function `nft_meta_bridge_get_eval': nft_meta_bridge.c:(.text+0x1e8): undefined reference to `nft_meta_get_eval' net/bridge/netfilter/nft_meta_bridge.o: In function `nft_meta_bridge_get_init': nft_meta_bridge.c:(.text+0x468): undefined reference to `nft_meta_get_init' nft_meta_bridge.c:(.text+0x49c): undefined reference to `nft_parse_register' nft_meta_bridge.c:(.text+0x4cc): undefined reference to `nft_validate_register_store' net/bridge/netfilter/nft_meta_bridge.o: In function `nft_meta_bridge_module_exit': nft_meta_bridge.c:(.exit.text+0x14): undefined reference to `nft_unregister_expr' net/bridge/netfilter/nft_meta_bridge.o: In function `nft_meta_bridge_module_init': nft_meta_bridge.c:(.init.text+0x14): undefined reference to `nft_register_expr' net/bridge/netfilter/nft_meta_bridge.o:(.rodata+0x60): undefined reference to `nft_meta_get_dump' net/bridge/netfilter/nft_meta_bridge.o:(.rodata+0x88): undefined reference to `nft_meta_set_eval' This can happen because the NF_TABLES_BRIDGE dependency itself is just a 'bool'. Make the symbol a 'tristate' instead so Kconfig can propagate the dependencies correctly. Fixes: 30e103fe24de ("netfilter: nft_meta: move bridge meta keys into nft_meta_bridge") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: bridge: NF_CONNTRACK_BRIDGE does not depend on NF_TABLES_BRIDGEPablo Neira Ayuso2019-07-181-2/+2
| | | | | | | Place NF_CONNTRACK_BRIDGE away from the NF_TABLES_BRIDGE dependency. Fixes: 3c171f496ef5 ("netfilter: bridge: add connection tracking system") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds2019-07-1115-112/+832
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking updates from David Miller: "Some highlights from this development cycle: 1) Big refactoring of ipv6 route and neigh handling to support nexthop objects configurable as units from userspace. From David Ahern. 2) Convert explored_states in BPF verifier into a hash table, significantly decreased state held for programs with bpf2bpf calls, from Alexei Starovoitov. 3) Implement bpf_send_signal() helper, from Yonghong Song. 4) Various classifier enhancements to mvpp2 driver, from Maxime Chevallier. 5) Add aRFS support to hns3 driver, from Jian Shen. 6) Fix use after free in inet frags by allocating fqdirs dynamically and reworking how rhashtable dismantle occurs, from Eric Dumazet. 7) Add act_ctinfo packet classifier action, from Kevin Darbyshire-Bryant. 8) Add TFO key backup infrastructure, from Jason Baron. 9) Remove several old and unused ISDN drivers, from Arnd Bergmann. 10) Add devlink notifications for flash update status to mlxsw driver, from Jiri Pirko. 11) Lots of kTLS offload infrastructure fixes, from Jakub Kicinski. 12) Add support for mv88e6250 DSA chips, from Rasmus Villemoes. 13) Various enhancements to ipv6 flow label handling, from Eric Dumazet and Willem de Bruijn. 14) Support TLS offload in nfp driver, from Jakub Kicinski, Dirk van der Merwe, and others. 15) Various improvements to axienet driver including converting it to phylink, from Robert Hancock. 16) Add PTP support to sja1105 DSA driver, from Vladimir Oltean. 17) Add mqprio qdisc offload support to dpaa2-eth, from Ioana Radulescu. 18) Add devlink health reporting to mlx5, from Moshe Shemesh. 19) Convert stmmac over to phylink, from Jose Abreu. 20) Add PTP PHC (Physical Hardware Clock) support to mlxsw, from Shalom Toledo. 21) Add nftables SYNPROXY support, from Fernando Fernandez Mancera. 22) Convert tcp_fastopen over to use SipHash, from Ard Biesheuvel. 23) Track spill/fill of constants in BPF verifier, from Alexei Starovoitov. 24) Support bounded loops in BPF, from Alexei Starovoitov. 25) Various page_pool API fixes and improvements, from Jesper Dangaard Brouer. 26) Just like ipv4, support ref-countless ipv6 route handling. From Wei Wang. 27) Support VLAN offloading in aquantia driver, from Igor Russkikh. 28) Add AF_XDP zero-copy support to mlx5, from Maxim Mikityanskiy. 29) Add flower GRE encap/decap support to nfp driver, from Pieter Jansen van Vuuren. 30) Protect against stack overflow when using act_mirred, from John Hurley. 31) Allow devmap map lookups from eBPF, from Toke Høiland-Jørgensen. 32) Use page_pool API in netsec driver, Ilias Apalodimas. 33) Add Google gve network driver, from Catherine Sullivan. 34) More indirect call avoidance, from Paolo Abeni. 35) Add kTLS TX HW offload support to mlx5, from Tariq Toukan. 36) Add XDP_REDIRECT support to bnxt_en, from Andy Gospodarek. 37) Add MPLS manipulation actions to TC, from John Hurley. 38) Add sending a packet to connection tracking from TC actions, and then allow flower classifier matching on conntrack state. From Paul Blakey. 39) Netfilter hw offload support, from Pablo Neira Ayuso" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2080 commits) net/mlx5e: Return in default case statement in tx_post_resync_params mlx5: Return -EINVAL when WARN_ON_ONCE triggers in mlx5e_tls_resync(). net: dsa: add support for BRIDGE_MROUTER attribute pkt_sched: Include const.h net: netsec: remove static declaration for netsec_set_tx_de() net: netsec: remove superfluous if statement netfilter: nf_tables: add hardware offload support net: flow_offload: rename tc_cls_flower_offload to flow_cls_offload net: flow_offload: add flow_block_cb_is_busy() and use it net: sched: remove tcf block API drivers: net: use flow block API net: sched: use flow block API net: flow_offload: add flow_block_cb_{priv, incref, decref}() net: flow_offload: add list handling functions net: flow_offload: add flow_block_cb_alloc() and flow_block_cb_free() net: flow_offload: rename TCF_BLOCK_BINDER_TYPE_* to FLOW_BLOCK_BINDER_TYPE_* net: flow_offload: rename TC_BLOCK_{UN}BIND to FLOW_BLOCK_{UN}BIND net: flow_offload: add flow_block_cb_setup_simple() net: hisilicon: Add an tx_desc to adapt HI13X1_GMAC net: hisilicon: Add an rx_desc to adapt HI13X1_GMAC ...
| * Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2019-07-083-17/+17
| |\ | | | | | | | | | | | | | | | Two cases of overlapping changes, nothing fancy. Signed-off-by: David S. Miller <davem@davemloft.net>
| | * net: bridge: stp: don't cache eth dest pointer before skb pullNikolay Aleksandrov2019-07-021-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | Don't cache eth dest pointer before calling pskb_may_pull. Fixes: cf0f02d04a83 ("[BRIDGE]: use llc for receiving STP packets") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * net: bridge: don't cache ether dest pointer on inputNikolay Aleksandrov2019-07-021-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We would cache ether dst pointer on input in br_handle_frame_finish but after the neigh suppress code that could lead to a stale pointer since both ipv4 and ipv6 suppress code do pskb_may_pull. This means we have to always reload it after the suppress code so there's no point in having it cached just retrieve it directly. Fixes: 057658cb33fbf ("bridge: suppress arp pkts on BR_NEIGH_SUPPRESS ports") Fixes: ed842faeb2bd ("bridge: suppress nd pkts on BR_NEIGH_SUPPRESS ports") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * net: bridge: mcast: fix stale ipv6 hdr pointer when handling v6 queryNikolay Aleksandrov2019-07-021-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We get a pointer to the ipv6 hdr in br_ip6_multicast_query but we may call pskb_may_pull afterwards and end up using a stale pointer. So use the header directly, it's just 1 place where it's needed. Fixes: 08b202b67264 ("bridge br_multicast: IPv6 MLD support.") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Tested-by: Martin Weinelt <martin@linuxlounge.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * net: bridge: mcast: fix stale nsrcs pointer in igmp3/mld2 report handlingNikolay Aleksandrov2019-07-021-8/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We take a pointer to grec prior to calling pskb_may_pull and use it afterwards to get nsrcs so record nsrcs before the pull when handling igmp3 and we get a pointer to nsrcs and call pskb_may_pull when handling mld2 which again could lead to reading 2 bytes out-of-bounds. ================================================================== BUG: KASAN: use-after-free in br_multicast_rcv+0x480c/0x4ad0 [bridge] Read of size 2 at addr ffff8880421302b4 by task ksoftirqd/1/16 CPU: 1 PID: 16 Comm: ksoftirqd/1 Tainted: G OE 5.2.0-rc6+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace: dump_stack+0x71/0xab print_address_description+0x6a/0x280 ? br_multicast_rcv+0x480c/0x4ad0 [bridge] __kasan_report+0x152/0x1aa ? br_multicast_rcv+0x480c/0x4ad0 [bridge] ? br_multicast_rcv+0x480c/0x4ad0 [bridge] kasan_report+0xe/0x20 br_multicast_rcv+0x480c/0x4ad0 [bridge] ? br_multicast_disable_port+0x150/0x150 [bridge] ? ktime_get_with_offset+0xb4/0x150 ? __kasan_kmalloc.constprop.6+0xa6/0xf0 ? __netif_receive_skb+0x1b0/0x1b0 ? br_fdb_update+0x10e/0x6e0 [bridge] ? br_handle_frame_finish+0x3c6/0x11d0 [bridge] br_handle_frame_finish+0x3c6/0x11d0 [bridge] ? br_pass_frame_up+0x3a0/0x3a0 [bridge] ? virtnet_probe+0x1c80/0x1c80 [virtio_net] br_handle_frame+0x731/0xd90 [bridge] ? select_idle_sibling+0x25/0x7d0 ? br_handle_frame_finish+0x11d0/0x11d0 [bridge] __netif_receive_skb_core+0xced/0x2d70 ? virtqueue_get_buf_ctx+0x230/0x1130 [virtio_ring] ? do_xdp_generic+0x20/0x20 ? virtqueue_napi_complete+0x39/0x70 [virtio_net] ? virtnet_poll+0x94d/0xc78 [virtio_net] ? receive_buf+0x5120/0x5120 [virtio_net] ? __netif_receive_skb_one_core+0x97/0x1d0 __netif_receive_skb_one_core+0x97/0x1d0 ? __netif_receive_skb_core+0x2d70/0x2d70 ? _raw_write_trylock+0x100/0x100 ? __queue_work+0x41e/0xbe0 process_backlog+0x19c/0x650 ? _raw_read_lock_irq+0x40/0x40 net_rx_action+0x71e/0xbc0 ? __switch_to_asm+0x40/0x70 ? napi_complete_done+0x360/0x360 ? __switch_to_asm+0x34/0x70 ? __switch_to_asm+0x40/0x70 ? __schedule+0x85e/0x14d0 __do_softirq+0x1db/0x5f9 ? takeover_tasklets+0x5f0/0x5f0 run_ksoftirqd+0x26/0x40 smpboot_thread_fn+0x443/0x680 ? sort_range+0x20/0x20 ? schedule+0x94/0x210 ? __kthread_parkme+0x78/0xf0 ? sort_range+0x20/0x20 kthread+0x2ae/0x3a0 ? kthread_create_worker_on_cpu+0xc0/0xc0 ret_from_fork+0x35/0x40 The buggy address belongs to the page: page:ffffea0001084c00 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0 flags: 0xffffc000000000() raw: 00ffffc000000000 ffffea0000cfca08 ffffea0001098608 0000000000000000 raw: 0000000000000000 0000000000000003 00000000ffffff7f 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888042130180: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff888042130200: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > ffff888042130280: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff888042130300: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff888042130380: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ================================================================== Disabling lock debugging due to kernel taint Fixes: bc8c20acaea1 ("bridge: multicast: treat igmpv3 report with INCLUDE and no sources as a leave") Reported-by: Martin Weinelt <martin@linuxlounge.net> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Tested-by: Martin Weinelt <martin@linuxlounge.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | netfilter: nft_meta_bridge: Add NFT_META_BRI_IIFVPROTO supportwenxu2019-07-051-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch allows you to match on bridge vlan protocol, eg. nft add rule bridge firewall zones counter meta ibrvproto 0x8100 Signed-off-by: wenxu <wenxu@ucloud.cn> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | bridge: add br_vlan_get_proto()wenxu2019-07-051-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | This new function allows you to fetch the bridge port vlan protocol. Signed-off-by: wenxu <wenxu@ucloud.cn> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: nft_meta_bridge: add NFT_META_BRI_IIFPVID supportwenxu2019-07-051-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch allows you to match on the bridge port pvid, eg. nft add rule bridge firewall zones counter meta ibrpvid 10 Signed-off-by: wenxu <wenxu@ucloud.cn> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | bridge: add br_vlan_get_pvid_rcu()Pablo Neira Ayuso2019-07-051-4/+15
| | | | | | | | | | | | | | | | | | | | | This new function allows you to fetch bridge pvid from packet path. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
| * | netfilter: nft_meta_bridge: Remove the br_private.h headerwenxu2019-07-051-5/+14
| | | | | | | | | | | | | | | | | | | | | | | | nft_bridge_meta should not access the bridge internal API. Signed-off-by: wenxu <wenxu@ucloud.cn> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: nft_meta: move bridge meta keys into nft_meta_bridgewenxu2019-07-053-0/+134
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Separate bridge meta key from nft_meta to meta_bridge to avoid a dependency between the bridge module and nft_meta when using the bridge API available through include/linux/if_bridge.h Signed-off-by: wenxu <wenxu@ucloud.cn> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: nf_queue: remove unused hook entries pointerFlorian Westphal2019-07-041-1/+1
| | | | | | | | | | | | | | | | | | | | | Its not used anywhere, so remove this. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextPablo Neira Ayuso2019-06-2527-134/+27
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Resolve conflict between d2912cb15bdd ("treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500") removing the GPL disclaimer and fe03d4745675 ("Update my email address") which updates Jozsef Kadlecsik's email. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2019-06-222-8/+2
| | |\| | | | | | | | | | | | | | | | | | | | | Minor SPDX change conflict. Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2019-06-0725-126/+25
| | |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some ISDN files that got removed in net-next had some changes done in mainline, take the removals. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | netfilter: bridge: Fix non-untagged fragment packetwenxu2019-06-211-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ip netns exec ns1 ip a a dev eth0 10.0.0.7/24 ip netns exec ns2 ip link a link eth0 name vlan type vlan id 200 ip netns exec ns2 ip a a dev vlan 10.0.0.8/24 ip l add dev br0 type bridge vlan_filtering 1 brctl addif br0 veth1 brctl addif br0 veth2 bridge vlan add dev veth1 vid 200 pvid untagged bridge vlan add dev veth2 vid 200 A two fragment packet sent from ns2 contains the vlan tag 200. In the bridge conntrack, this packet will defrag to one skb with fraglist. When the packet is forwarded to ns1 through veth1, the first skb vlan tag will be cleared by the "untagged" flags. But the vlan tag in the second skb is still tagged, so the second fragment ends up with tag 200 to ns1. So if the first fragment packet doesn't contain the vlan tag, all of the remain should not contain vlan tag. Fixes: 3c171f496ef5 ("netfilter: bridge: add connection tracking system") Signed-off-by: wenxu <wenxu@ucloud.cn> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | | netfilter: bridge: prevent UAF in brnf_exit_net()Christian Brauner2019-06-201-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prevent a UAF in brnf_exit_net(). When unregister_net_sysctl_table() is called the ctl_hdr pointer will obviously be freed and so accessing it righter after is invalid. Fix this by stashing a pointer to the table we want to free before we unregister the sysctl header. Note that syzkaller falsely chased this down to the drm tree so the Fixes tag that syzkaller requested would be wrong. This commit uses a different but the correct Fixes tag. /* Splat */ BUG: KASAN: use-after-free in br_netfilter_sysctl_exit_net net/bridge/br_netfilter_hooks.c:1121 [inline] BUG: KASAN: use-after-free in brnf_exit_net+0x38c/0x3a0 net/bridge/br_netfilter_hooks.c:1141 Read of size 8 at addr ffff8880a4078d60 by task kworker/u4:4/8749 CPU: 0 PID: 8749 Comm: kworker/u4:4 Not tainted 5.2.0-rc5-next-20190618 #17 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: netns cleanup_net Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 print_address_description.cold+0xd4/0x306 mm/kasan/report.c:351 __kasan_report.cold+0x1b/0x36 mm/kasan/report.c:482 kasan_report+0x12/0x20 mm/kasan/common.c:614 __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132 br_netfilter_sysctl_exit_net net/bridge/br_netfilter_hooks.c:1121 [inline] brnf_exit_net+0x38c/0x3a0 net/bridge/br_netfilter_hooks.c:1141 ops_exit_list.isra.0+0xaa/0x150 net/core/net_namespace.c:154 cleanup_net+0x3fb/0x960 net/core/net_namespace.c:553 process_one_work+0x989/0x1790 kernel/workqueue.c:2269 worker_thread+0x98/0xe40 kernel/workqueue.c:2415 kthread+0x354/0x420 kernel/kthread.c:255 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352 Allocated by task 11374: save_stack+0x23/0x90 mm/kasan/common.c:71 set_track mm/kasan/common.c:79 [inline] __kasan_kmalloc mm/kasan/common.c:489 [inline] __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:503 __do_kmalloc mm/slab.c:3645 [inline] __kmalloc+0x15c/0x740 mm/slab.c:3654 kmalloc include/linux/slab.h:552 [inline] kzalloc include/linux/slab.h:743 [inline] __register_sysctl_table+0xc7/0xef0 fs/proc/proc_sysctl.c:1327 register_net_sysctl+0x29/0x30 net/sysctl_net.c:121 br_netfilter_sysctl_init_net net/bridge/br_netfilter_hooks.c:1105 [inline] brnf_init_net+0x379/0x6a0 net/bridge/br_netfilter_hooks.c:1126 ops_init+0xb3/0x410 net/core/net_namespace.c:130 setup_net+0x2d3/0x740 net/core/net_namespace.c:316 copy_net_ns+0x1df/0x340 net/core/net_namespace.c:439 create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:103 unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:202 ksys_unshare+0x444/0x980 kernel/fork.c:2822 __do_sys_unshare kernel/fork.c:2890 [inline] __se_sys_unshare kernel/fork.c:2888 [inline] __x64_sys_unshare+0x31/0x40 kernel/fork.c:2888 do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 9: save_stack+0x23/0x90 mm/kasan/common.c:71 set_track mm/kasan/common.c:79 [inline] __kasan_slab_free+0x102/0x150 mm/kasan/common.c:451 kasan_slab_free+0xe/0x10 mm/kasan/common.c:459 __cache_free mm/slab.c:3417 [inline] kfree+0x10a/0x2c0 mm/slab.c:3746 __rcu_reclaim kernel/rcu/rcu.h:215 [inline] rcu_do_batch kernel/rcu/tree.c:2092 [inline] invoke_rcu_callbacks kernel/rcu/tree.c:2310 [inline] rcu_core+0xcc7/0x1500 kernel/rcu/tree.c:2291 __do_softirq+0x25c/0x94c kernel/softirq.c:292 The buggy address belongs to the object at ffff8880a4078d40 which belongs to the cache kmalloc-512 of size 512 The buggy address is located 32 bytes inside of 512-byte region [ffff8880a4078d40, ffff8880a4078f40) The buggy address belongs to the page: page:ffffea0002901e00 refcount:1 mapcount:0 mapping:ffff8880aa400a80 index:0xffff8880a40785c0 flags: 0x1fffc0000000200(slab) raw: 01fffc0000000200 ffffea0001d636c8 ffffea0001b07308 ffff8880aa400a80 raw: ffff8880a40785c0 ffff8880a40780c0 0000000100000004 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8880a4078c00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880a4078c80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc > ffff8880a4078d00: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb ^ ffff8880a4078d80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880a4078e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Reported-by: syzbot+43a3fa52c0d9c5c94f41@syzkaller.appspotmail.com Fixes: 22567590b2e6 ("netfilter: bridge: namespace bridge netfilter sysctls") Signed-off-by: Christian Brauner <christian@brauner.io> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | | netfilter: bridge: namespace bridge netfilter sysctlsChristian Brauner2019-06-171-45/+72
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the /proc/sys/net/bridge folder is only created in the initial network namespace. This patch ensures that the /proc/sys/net/bridge folder is available in each network namespace if the module is loaded and disappears from all network namespaces when the module is unloaded. In doing so the patch makes the sysctls: bridge-nf-call-arptables bridge-nf-call-ip6tables bridge-nf-call-iptables bridge-nf-filter-pppoe-tagged bridge-nf-filter-vlan-tagged bridge-nf-pass-vlan-input-dev apply per network namespace. This unblocks some use-cases where users would like to e.g. not do bridge filtering for bridges in a specific network namespace while doing so for bridges located in another network namespace. The netfilter rules are afaict already per network namespace so it should be safe for users to specify whether bridge devices inside a network namespace are supposed to go through iptables et al. or not. Also, this can already be done per-bridge by setting an option for each individual bridge via Netlink. It should also be possible to do this for all bridges in a network namespace via sysctls. Cc: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | | netfilter: bridge: port sysctls to use brnf_netChristian Brauner2019-06-172-59/+105
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This ports the sysctls to use struct brnf_net. With this patch we make it possible to namespace the br_netfilter module in the following patch. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | netfilter: bridge: convert skb_make_writable to skb_ensure_writableFlorian Westphal2019-05-313-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Back in the day, skb_ensure_writable did not exist. By now, both functions have the same precondition: I. skb_make_writable will test in this order: 1. wlen > skb->len -> error 2. if not cloned and wlen <= headlen -> OK 3. If cloned and wlen bytes of clone writeable -> OK After those checks, skb is either not cloned but needs to pull from nonlinear area, or writing to head would also alter data of another clone. In both cases skb_make_writable will then call __pskb_pull_tail, which will kmalloc a new memory area to use for skb->head. IOW, after successful skb_make_writable call, the requested length is in linear area and can be modified, even if skb was cloned. II. skb_ensure_writable will do this instead: 1. call pskb_may_pull. This handles case 1 above. After this, wlen is in linear area, but skb might be cloned. 2. return if skb is not cloned 3. return if wlen byte of clone are writeable. 4. fully copy the skb. So post-conditions are the same: *len bytes are writeable in linear area without altering any payload data of a clone, all header pointers might have been changed. Only differences are that skb_ensure_writable is in the core, whereas skb_make_writable lives in netfilter core and the inverted return value. skb_make_writable returns 0 on error, whereas skb_ensure_writable returns negative value. For the normal cases performance is similar: A. skb is not cloned and in linear area: pskb_may_pull is inline helper, so neither function copies. B. skb is cloned, write is in linear area and clone is writeable: both funcions return with step 3. This series removes skb_make_writable from the kernel. While at it, pass the needed value instead, its less confusing that way: There is no special-handling of "0-length" argument in either skb_make_writable or skb_ensure_writable. bridge already makes sure ethernet header is in linear area, only purpose of the make_writable() is is to copy skb->head in case of cloned skbs. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | netfilter: nf_conntrack_bridge: add support for IPv6Pablo Neira Ayuso2019-05-301-2/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | br_defrag() and br_fragment() indirections are added in case that IPv6 support comes as a module, to avoid pulling innecessary dependencies in. The new fraglist iterator and fragment transformer APIs are used to implement the refragmentation code. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | netfilter: bridge: add connection tracking systemPablo Neira Ayuso2019-05-305-0/+397
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds basic connection tracking support for the bridge, including initial IPv4 support. This patch register two hooks to deal with the bridge forwarding path, one from the bridge prerouting hook to call nf_conntrack_in(); and another from the bridge postrouting hook to confirm the entry. The conntrack bridge prerouting hook defragments packets before passing them to nf_conntrack_in() to look up for an existing entry, otherwise a new entry is allocated and it is attached to the skbuff. The conntrack bridge postrouting hook confirms new conntrack entries, ie. if this is the first packet seen, then it adds the entry to the hashtable and (if needed) it refragments the skbuff into the original fragments, leaving the geometry as is if possible. Exceptions are linearized skbuffs, eg. skbuffs that are passed up to nfqueue and conntrack helpers, as well as cloned skbuff for the local delivery (eg. tcpdump), also in case of bridge port flooding (cloned skbuff too). The packet defragmentation is done through the ip_defrag() call. This forces us to save the bridge control buffer, reset the IP control buffer area and then restore it after call. This function also bumps the IP fragmentation statistics, it would be probably desiderable to have independent statistics for the bridge defragmentation/refragmentation. The maximum fragment length is stored in the control buffer and it is used to refragment the skbuff from the postrouting path. The new fraglist splitter and fragment transformer APIs are used to implement the bridge refragmentation code. The br_ip_fragment() function drops the packet in case the maximum fragment size seen is larger than the output port MTU. This patchset follows the principle that conntrack should not drop packets, so users can do it through policy via invalid state matching. Like br_netfilter, there is no refragmentation for packets that are passed up for local delivery, ie. prerouting -> input path. There are calls to nf_reset() already in several spots in the stack since time ago already, eg. af_packet, that show that skbuff fraglist handling from the netif_rx path is supported already. The helpers are called from the postrouting hook, before confirmation, from there we may see packet floods to bridge ports. Then, although unlikely, this may result in exercising the helpers many times for each clone. It would be good to explore how to pass all the packets in a list to the conntrack hook to do this handle only once for this case. Thanks to Florian Westphal for handing me over an initial patchset version to add support for conntrack bridge. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>