Commit message (Author, Age, Files, Lines)
...
| * | net: mdio-ipq4019: change defines to upper case (Robert Marko, 2020-09-23, 1 file, -3/+3)
|/ /
    In the commit adding the IPQ4019 MDIO driver, the defines for timeout and sleep partially used lower case. Let's change them to upper case, in line with the rest of the driver's defines.

    Signed-off-by: Robert Marko <robert.marko@sartura.hr>
    Cc: Luka Perkov <luka.perkov@sartura.hr>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'Introduce-mbox-tracepoints-for-Octeontx2' (David S. Miller, 2020-09-23, 9 files, -2/+146)
|\ \
    Subbaraya Sundeep says:

    ====================
    Introduce mbox tracepoints for Octeontx2

    This patchset adds tracepoint support for the mailbox. In Octeontx2, PFs and VFs need to communicate with the AF to allocate and free resources. Once the AF has done all the configuration for a PF/VF, packet I/O can happen on the PF/VF queues. When an interface is brought up, many mailbox messages are sent to the AF to initialize queues. If a VF is brought up, each message is sent to the PF, the PF forwards it to the AF, and the response travels back from the AF to the PF and then to the VF. To aid debugging, tracepoints are added at the places where messages are allocated and sent, and at message interrupts. Below is the trace of one message from VF to AF and the AF response back to the VF:

    ~ # echo 1 > /sys/kernel/tracing/events/rvu/enable
    ~ # ifconfig eth20 up
    [ 279.379559] eth20 NIC Link is UP 10000 Mbps Full duplex
    ~ # cat /sys/kernel/tracing/trace
    ifconfig-171 [000] .... 275.753345: otx2_msg_alloc: [0002:02:00.1] msg:(0x400) size:40
    ifconfig-171 [000] ...1 275.753347: otx2_msg_send: [0002:02:00.1] sent 1 msg(s) of size:48
    <idle>-0 [001] dNh1 275.753356: otx2_msg_interrupt: [0002:02:00.0] mbox interrupt VF(s) to PF (0x1)
    kworker/u9:1-90 [001] ...1 275.753364: otx2_msg_send: [0002:02:00.0] sent 1 msg(s) of size:48
    kworker/u9:1-90 [001] d.h. 275.753367: otx2_msg_interrupt: [0002:01:00.0] mbox interrupt PF(s) to AF (0x2)
    kworker/u9:2-167 [002] .... 275.753535: otx2_msg_process: [0002:01:00.0] msg:(0x400) error:0
    kworker/u9:2-167 [002] ...1 275.753537: otx2_msg_send: [0002:01:00.0] sent 1 msg(s) of size:32
    <idle>-0 [003] d.h1 275.753543: otx2_msg_interrupt: [0002:02:00.0] mbox interrupt AF to PF (0x1)
    <idle>-0 [001] d.h2 275.754376: otx2_msg_interrupt: [0002:02:00.1] mbox interrupt PF to VF (0x1)

    v3 changes:
    Removed EXPORT_TRACEPOINT_SYMBOLS of otx2_msg_send and otx2_msg_check since they are called locally only

    v2 changes:
    Removed otx2_msg_err tracepoint since it is similar to devlink_hwerr and it will be used instead when devlink support is added.
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>
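When reading a longer trace of this kind it can help to split each event line into fields. The following is a minimal Python sketch, assuming the line layout seen in the sample above (task-pid, CPU, flags, timestamp, event name, PCI address, free-form message). The regular expression and the helper name `parse_trace_line` are illustrative and not part of the patchset.

```python
import re

# Assumed layout, taken from the sample trace in the cover letter:
#   TASK-PID [CPU] FLAGS TIMESTAMP: EVENT: [PCI-ADDR] MESSAGE
TRACE_LINE = re.compile(
    r"^\s*(?P<task>\S+)-(?P<pid>\d+)\s+"   # e.g. kworker/u9:1-90
    r"\[(?P<cpu>\d+)\]\s+"                 # e.g. [001]
    r"(?P<flags>\S+)\s+"                   # e.g. d.h1
    r"(?P<ts>\d+\.\d+):\s+"                # e.g. 275.753364:
    r"(?P<event>\w+):\s+"                  # e.g. otx2_msg_send:
    r"\[(?P<dev>[0-9a-fA-F:.]+)\]\s+"      # e.g. [0002:02:00.0]
    r"(?P<msg>.*)$"
)


def parse_trace_line(line):
    """Return a dict of fields, or None if the line is not an event line."""
    match = TRACE_LINE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["pid"] = int(record["pid"])
    record["cpu"] = int(record["cpu"])
    record["ts"] = float(record["ts"])
    return record


sample = [
    "ifconfig-171 [000] .... 275.753345: otx2_msg_alloc: [0002:02:00.1] msg:(0x400) size:40",
    "<idle>-0 [001] dNh1 275.753356: otx2_msg_interrupt: [0002:02:00.0] mbox interrupt VF(s) to PF (0x1)",
    "~ # ifconfig eth20 up",  # shell line, not an event: skipped
]

# Keep only (event, device) pairs to follow a message hop by hop.
hops = [(r["event"], r["dev"]) for r in map(parse_trace_line, sample) if r]
```

Listing only the event name and device address makes the VF to PF to AF path, and the return path, easy to follow in a busy trace.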
| * | octeontx2-pf: Add tracepoints for PF/VF mailbox (Subbaraya Sundeep, 2020-09-23, 3 files, -0/+10)
    With tracepoints support present in the mailbox code this patch adds tracepoints in PF and VF drivers at places where mailbox messages are allocated, sent and at message interrupts.

    Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
    Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| * | octeontx2-af: Introduce tracepoints for mailbox (Subbaraya Sundeep, 2020-09-23, 6 files, -2/+136)
|/ /
    Added tracepoints in the mailbox code so that mailbox operations such as message allocation, message sending and message interrupts are traced. Mailbox errors such as timeouts or wrong responses are traced as well. These will help in debugging mailbox issues.

    Here's an example output showing one of the mailbox messages sent by PF to AF and AF responding to it:

    ~# mount -t tracefs none /sys/kernel/tracing/
    ~# echo 1 > /sys/kernel/tracing/events/rvu/enable
    ~# ifconfig eth0 up
    ~# cat /sys/kernel/tracing/trace
     tracer: nop
                                  _-----=> irqs-off
                                 / _----=> need-resched
                                | / _---=> hardirq/softirq
                                || / _--=> preempt-depth
                                ||| /     delay
               TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
                  | |       |   ||||       |         |
    ifconfig-2382 [002] .... 756.161892: otx2_msg_alloc: [0002:02:00.0] msg:(0x400) size:40
    ifconfig-2382 [002] ...1 756.161895: otx2_msg_send: [0002:02:00.0] sent 1 msg(s) of size:48
    <idle>-0 [000] d.h1 756.161902: otx2_msg_interrupt: [0002:01:00.0] mbox interrupt PF(s) to AF (0x2)
    kworker/u49:0-1165 [000] .... 756.162049: otx2_msg_process: [0002:01:00.0] msg:(0x400) error:0
    kworker/u49:0-1165 [000] ...1 756.162051: otx2_msg_send: [0002:01:00.0] sent 1 msg(s) of size:32
    kworker/u49:0-1165 [000] d.h. 756.162056: otx2_msg_interrupt: [0002:02:00.0] mbox interrupt AF to PF (0x1)

    Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
    Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: allwinner: remove redundant irqsave and irqrestore in hardIRQ (Barry Song, 2020-09-23, 1 file, -4/+2)
    The comment "holders of db->lock must always block IRQs" and the related code doing irqsave and irqrestore don't make sense, since we are in an IRQ-disabled hardIRQ context.

    Cc: Maxime Ripard <mripard@kernel.org>
    Cc: Chen-Yu Tsai <wens@csie.org>
    Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
    Acked-by: Maxime Ripard <mripard@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: hns3: Constify static structs (Rikard Falkeborn, 2020-09-23, 2 files, -18/+18)
    A number of static variables were not modified. Make them const to allow the compiler to put them in read-only memory. In order to do so, constify a couple of input pointers as well as some local pointers. This moves about 35 kB to read-only memory as seen by the output of the size command.

    Before:
       text    data     bss     dec     hex filename
     404938  111534     640  517112   7e3f8 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge.ko

    After:
       text    data     bss     dec     hex filename
     439499   76974     640  517113   7e3f9 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge.ko

    Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
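The shift between sections can be checked directly from the two size listings. This is a small Python sketch that uses only the numbers quoted in the commit message; the helper name `section_deltas` is illustrative.

```python
# Section sizes in bytes, copied from the "size" output in the commit message.
before = {"text": 404938, "data": 111534, "bss": 640}
after = {"text": 439499, "data": 76974, "bss": 640}


def section_deltas(old, new):
    """Per-section size change in bytes (new minus old)."""
    return {name: new[name] - old[name] for name in old}


deltas = section_deltas(before, after)

# Bytes that left the writable .data section.
moved_bytes = -deltas["data"]
moved_kib = moved_bytes / 1024

# The "dec" column is the sum of the three sections.
total_before = sum(before.values())
total_after = sum(after.values())
```

The data section shrinks by 34560 bytes (33.75 KiB) while text grows by 34561 bytes, so the module total changes by a single byte and nearly all of the difference is data that became read-only.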
* | Merge branch 'net-bridge-mcast-IGMPv3-MLDv2-fast-path-part-2' (David S. Miller, 2020-09-23, 7 files, -238/+916)
|\ \
    Nikolay Aleksandrov says:

    ====================
    net: bridge: mcast: IGMPv3/MLDv2 fast-path (part 2)

    This is the second part of the IGMPv3/MLDv2 support which adds support for the fast-path. In order to be able to handle source entries we add mdb support for S,G entries (i.e. we add source address support to br_ip), which requires extending the current mdb netlink API. Fortunately we just add another attribute which will contain nested future mdb attributes, then we use it to add support for S,G user add, del and dump. The lookup sequence is simple: when IGMPv3/MLDv2 are enabled do the S,G lookup first and if it fails fall back to *,G. The more complex part is when we begin handling source lists and auto-installing S,G entries and *,G filter mode transitions.

    We have the following cases:
    1) *,G INCLUDE -> EXCLUDE transition: we need to install the port in all of *,G's installed S,G entries for proper replication (except the ones explicitly blocked), this is also necessary when adding a new *,G EXCLUDE port group
    2) *,G EXCLUDE -> INCLUDE transition: we need to remove the port from all of *,G's installed S,G entries, this is also necessary when removing a *,G port group
    3) New S,G port entry: we need to install all current *,G EXCLUDE ports
    4) Remove S,G port entry: if all other port groups were auto-installed we can safely remove them and delete the whole S,G entry

    Currently we compute these operations from the available ports, their source lists and their filter mode. In the future we can extend the port group structure and reduce the running time of these ops. Also one current limitation is that host-joined S,G entries are not supported. I.e. one cannot add "dev bridge port bridge" mdb S,G entries. The host join is currently considered an EXCLUDE {} join, so it's reflected in all of *,G's installed S,G entries. If an S,G,port entry is added as temporary then the kernel can take it over if a source shows up from a report, permanent entries are skipped. In order to properly handle blocked sources we add a new port group blocked flag to avoid forwarding to that port group in the S,G. Finally when forwarding we use the port group filter mode (if it's INCLUDE and the port group is from a *,G then don't replicate to it, respectively if it's EXCLUDE then forward) and the blocked flag (obviously if it's set - skip that port unless it's a router port) to decide if the port should be skipped. Another limitation is that we can't do some of the above transitions without a small traffic drop while installing/removing entries. That will be taken care of when we add atomic swap of port replication lists later.

    Patch break down:
    patches 1-3: prepare the mdb code for better extack support which is used in future patches to return a more meaningful error
    patches 4-6: add the source address field to struct br_ip, and do minor cleanups around it
    patches 7-8: extend the mdb netlink API so we can send new mdb attributes and use the new API for S,G entry add/del/dump support
    patch 9: takes care of S,G entries when doing a lookup (first S,G then *,G lookup)
    patch 10: adds a new port group field and attribute for origin protocol; we use the already available RTPROT_ definitions. Currently user-space entries are added as RTPROT_STATIC and kernel entries are added as RTPROT_KERNEL, we may allow user-space to set custom values later (e.g. for FRR, clag)
    patch 11: adds an internal S,G,port rhashtable to speed up filter mode transitions
    patch 12: initial automatic install of S,G entries based on port groups' source lists
    patch 13: handles port group modes on transitions or when new port group entries are added
    patch 14: self-explanatory - adds support for blocked port group entries needed to stop forwarding to particular S,G,port entries
    patch 15: handles host-join/leave state changes, treats host-joins as EXCLUDE {} groups (reflected in all *,G's S,G entries)
    patch 16: finally adds the fast-path filter mode and block flag support

    Here're the sets that will come next (in order):
    - iproute2 support for IGMPv3/MLDv2
    - selftests for all mode transitions and group flags
    - explicit host tracking for proper fast-leave support
    - atomic port replication lists (these are also needed for broadcast forwarding optimizations)
    - mode transition optimization and removal of open-coded sorted lists

    Not implemented yet:
    - Host IGMPv3/MLDv2 filter support (currently we handle only join/leave as before)
    - Proper other querier source timer and value updates
    - IGMPv3/v2 MLDv2/v1 compat (I have a few rough patches for this one)

    v2: fix build with CONFIG_BATMAN_ADV_MCAST in patch 6
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: bridge: mcast: when forwarding handle filter mode and blocked flag (Nikolay Aleksandrov, 2020-09-23, 1 file, -1/+14)
    We need to avoid forwarding to ports in MCAST_INCLUDE filter mode when the mdst entry is a *,G or when the port has the blocked flag.

    Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
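The forwarding rule described here can be modelled in a few lines. The sketch below is a Python model, not the kernel code: the `PortGroup` class and `should_forward` function are hypothetical names, and the router-port exception is an assumption taken from the cover letter's remark that a blocked port is skipped "unless it's a router port".

```python
from dataclasses import dataclass

MCAST_INCLUDE = "INCLUDE"
MCAST_EXCLUDE = "EXCLUDE"


@dataclass
class PortGroup:
    """Hypothetical stand-in for a bridge port group entry."""
    port: str
    filter_mode: str
    blocked: bool = False


def should_forward(pg, entry_is_star_g, is_router_port=False):
    """Decide whether a packet matching an mdb entry goes out of pg.port."""
    # Assumption: router ports always receive multicast traffic.
    if is_router_port:
        return True
    # A blocked S,G,port entry must not be forwarded to.
    if pg.blocked:
        return False
    # A *,G port in INCLUDE mode only wants its listed sources,
    # which are served by their own S,G entries.
    if entry_is_star_g and pg.filter_mode == MCAST_INCLUDE:
        return False
    return True
```

For example, `should_forward(PortGroup("swp1", MCAST_INCLUDE), entry_is_star_g=True)` is false, while the same port group reached through an S,G entry is forwarded to.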
| * | net: bridge: mcast: handle host state (Nikolay Aleksandrov, 2020-09-23, 1 file, -0/+58)
    Since host joins are considered as EXCLUDE {} joins we need to reflect that in all of *,G ports' S,G entries. Since host_joined == true can only be set automatically on S,G entries, we can safely set it to false when removing all automatically added entries upon S,G delete.

    Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: bridge: mcast: add support for blocked port groups (Nikolay Aleksandrov, 2020-09-23, 4 files, -6/+47)
    When excluding S,G entries we need a way to block a particular S,G,port. The new port group flag is managed based on the source's timer as per RFCs 3376 and 3810. When a source expires and its port group is in EXCLUDE mode, it will be blocked.

    Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: bridge: mcast: handle port group filter modes (Nikolay Aleksandrov, 2020-09-23, 4 files, -2/+216)
    We need to handle group filter mode transitions and initial state. To change a port group's INCLUDE -> EXCLUDE mode (or when we have added a new port group in EXCLUDE mode) we need to add that port to all of *,G ports' S,G entries for proper replication. When the EXCLUDE state is changed from an IGMPv3 report, br_multicast_fwd_filter_exclude() must be called after the source list processing because the assumption is that all of the group's S,G entries will be created before transitioning to EXCLUDE mode, i.e. most importantly its blocked entries will already be added so it will not get automatically added to them. The transition EXCLUDE -> INCLUDE happens only when a port group timer expires; it requires us to remove that port from all of *,G ports' S,G entries where it was automatically added previously. Finally when we are adding a new S,G entry we must add all of *,G's EXCLUDE ports to it. In order to distinguish automatically added *,G EXCLUDE ports we have a new port group flag - MDB_PG_FLAGS_STAR_EXCL.

    Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
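The three transitions in this message can be sketched as operations on a small in-memory model. The Python below is a simplified illustration under stated assumptions: the `Group` class, its dictionaries and the string flags are hypothetical, with `STAR_EXCL` standing in for MDB_PG_FLAGS_STAR_EXCL. It ignores timers, locking and host joins.

```python
STAR_EXCL = "STAR_EXCL"  # stands in for MDB_PG_FLAGS_STAR_EXCL
BLOCKED = "BLOCKED"      # stands in for the blocked port group flag


class Group:
    """Toy model of one multicast group: its *,G ports and S,G entries."""

    def __init__(self):
        self.star_g = {}  # port -> "INCLUDE" or "EXCLUDE"
        self.sg = {}      # source -> {port -> set of flags}

    def add_sg_port(self, source, port, flags=()):
        """Add an explicit S,G,port entry, then pull in *,G EXCLUDE ports."""
        entry = self.sg.setdefault(source, {})
        entry[port] = set(flags)
        for other, mode in self.star_g.items():
            if mode == "EXCLUDE" and other not in entry:
                entry[other] = {STAR_EXCL}

    def set_exclude(self, port):
        """INCLUDE -> EXCLUDE: add the port to every S,G entry lacking it.

        Entries that already hold the port (for example blocked ones)
        are left untouched, which is why the source list must be
        processed before this transition.
        """
        self.star_g[port] = "EXCLUDE"
        for entry in self.sg.values():
            if port not in entry:
                entry[port] = {STAR_EXCL}

    def set_include(self, port):
        """EXCLUDE -> INCLUDE: drop only the automatically added entries."""
        self.star_g[port] = "INCLUDE"
        for entry in self.sg.values():
            if STAR_EXCL in entry.get(port, ()):
                del entry[port]
```

Marking auto-installed ports with a flag is what lets the EXCLUDE -> INCLUDE step remove them again without touching entries that a report or the user created explicitly.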
| * | net: bridge: mcast: install S,G entries automatically based on reports (Nikolay Aleksandrov, 2020-09-23, 2 files, -39/+138)
    This patch adds support for automatic install of S,G mdb entries based on the port group's source list and the source entry's timer. Once installed the S,G will be used when forwarding packets if the appropriate multicast/mld versions are set. A new source flag called BR_SGRP_F_INSTALLED denotes if the source has a forwarding mdb entry installed.

    Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: bridge: mcast: add sg_port rhashtable (Nikolay Aleksandrov, 2020-09-23, 4 files, -65/+111)
    To speed up S,G forward handling we need to be able to quickly find out if a port is a member of an S,G group. To do that add a global S,G port rhashtable with key: source addr, group addr, protocol, vid (all br_ip fields) and port pointer.

    Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
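The idea of a single table keyed by all br_ip fields plus the port can be pictured with an ordinary dictionary. This Python sketch is only an analogy for the rhashtable: `SGPortKey`, `sg_port_add` and `sg_port_find` are hypothetical names, and the port is represented by a string rather than a pointer.

```python
from typing import NamedTuple

ETH_P_IP = 0x0800  # EtherType for IPv4


class SGPortKey(NamedTuple):
    """Composite key: every br_ip field plus the port."""
    src: str
    dst: str
    proto: int
    vid: int
    port: str


sg_port_table = {}


def sg_port_add(key, port_group):
    """Register a port group under its S,G,port key."""
    sg_port_table[key] = port_group


def sg_port_find(src, dst, proto, vid, port):
    """Constant-time membership check for one port in one S,G."""
    return sg_port_table.get(SGPortKey(src, dst, proto, vid, port))


pg = {"flags": 0}
sg_port_add(SGPortKey("192.0.2.1", "239.1.1.1", ETH_P_IP, 10, "swp1"), pg)
```

Because the port is part of the key, checking whether a given port belongs to an S,G is one hash lookup instead of a walk over that entry's port list.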
| * | net: bridge: mcast: add rt_protocol field to the port group struct (Nikolay Aleksandrov, 2020-09-23, 4 files, -19/+34)
    We need to be able to differentiate between pg entries created by user-space and the kernel when we start generating S,G entries for IGMPv3/MLDv2's fast path. User-space entries are created by default as RTPROT_STATIC and the kernel entries are RTPROT_KERNEL. Later we can allow user-space to provide the entry rt_protocol so we can differentiate between who added the entries specifically (e.g. clag, admin, frr etc).

    Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: bridge: mcast: when igmpv3/mldv2 are enabled lookup (S,G) first, then (*,G) (Nikolay Aleksandrov, 2020-09-23, 1 file, -0/+18)
    If (S,G) entries are enabled (igmpv3/mldv2) then look them up first. If there isn't a present (S,G) entry then try to find (*,G).

    Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
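The lookup order is simple enough to state as code. The following Python sketch models the mdb as a dictionary keyed by (source, group), with `None` as the source for *,G entries. The function name `mdb_lookup` and this representation are illustrative assumptions, not the kernel's data structures.

```python
def mdb_lookup(mdb, group, source, sg_enabled):
    """Look up (S,G) first when enabled, then fall back to (*,G)."""
    if sg_enabled and source is not None:
        entry = mdb.get((source, group))
        if entry is not None:
            return entry
    # Either S,G handling is off or no S,G entry matched.
    return mdb.get((None, group))


mdb = {
    ("192.0.2.1", "239.1.1.1"): "sg-entry",
    (None, "239.1.1.1"): "star-g-entry",
}
```

With IGMPv3/MLDv2 disabled the source is never consulted, so behaviour for existing *,G setups stays the same.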
| * | net: bridge: mdb: add support for add/del/dump of entries with source (Nikolay Aleksandrov, 2020-09-23, 3 files, -28/+130)
    Add new mdb attributes (MDBE_ATTR_SOURCE for setting, MDBA_MDB_EATTR_SOURCE for dumping) to allow add/del and dump of mdb entries with a source address (S,G). New S,G entries are created with filter mode of MCAST_INCLUDE. The same attributes are used for IPv4 and IPv6, they're validated and parsed based on their protocol. S,G host joined entries which are added by user are not allowed yet.

    Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: bridge: mdb: add support to extend add/del commands (Nikolay Aleksandrov, 2020-09-23, 2 files, -3/+31)
    Since the MDB add/del code expects an exact struct br_mdb_entry we can't really add any extensions, thus add a new nested attribute at the level of MDBA_SET_ENTRY called MDBA_SET_ENTRY_ATTRS which will be used to pass all new options via netlink attributes. This patch doesn't change anything functionally since the new attribute is not used yet, only parsed.

    Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: bridge: mcast: rename br_ip's u member to dst (Nikolay Aleksandrov, 2020-09-23, 4 files, -29/+29)
    Since now we have src in br_ip, u no longer makes sense so rename it to dst. No functional changes.

    v2: fix build with CONFIG_BATMAN_ADV_MCAST

    CC: Marek Lindner <mareklindner@neomailbox.ch>
    CC: Simon Wunderlich <sw@simonwunderlich.de>
    CC: Antonio Quartulli <a@unstable.cc>
    CC: Sven Eckelmann <sven@narfation.org>
    CC: b.a.t.m.a.n@lists.open-mesh.org
    Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: bridge: mcast: use br_ip's src for src groups and querier address (Nikolay Aleksandrov, 2020-09-23, 2 files, -30/+30)
    Now that we have src and dst in br_ip it is logical to use the src field for the cases where we need to work with a source address such as querier source address and group source address.

    Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: bridge: add src field to br_ip (Nikolay Aleksandrov, 2020-09-23, 1 file, -0/+6)
    Add a new src field to struct br_ip which will be used to look up S,G entries. When SSM option is added we will enable full br_ip lookups.

    Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: bridge: mdb: use extack in br_mdb_add() and br_mdb_add_group() (Nikolay Aleksandrov, 2020-09-23, 1 file, -12/+42)
    Pass and use extack all the way down to br_mdb_add_group().

    Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: bridge: mdb: move all port and bridge checks to br_mdb_add (Nikolay Aleksandrov, 2020-09-23, 1 file, -17/+7)
    To avoid doing duplicate device checks and searches (the same were done in br_mdb_add and __br_mdb_add) pass the already found port to __br_mdb_add and pull the bridge's netif_running and enabled multicast checks to br_mdb_add. This would also simplify the future extack errors.

    Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: bridge: mdb: use extack in br_mdb_parse() (Nikolay Aleksandrov, 2020-09-23, 1 file, -21/+39)
|/ /
    We can drop the pr_info() calls and just use extack to return a meaningful error to user-space when br_mdb_parse() fails.

    Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: realtek: Remove set but not used variable (Zheng Yongjun, 2020-09-23, 1 file, -2/+2)
    Fixes gcc '-Wunused-but-set-variable' warning:

    drivers/net/ethernet/realtek/8139cp.c: In function cp_tx_timeout:
    drivers/net/ethernet/realtek/8139cp.c:1242:6: warning: variable ‘rc’ set but not used [-Wunused-but-set-variable]

    `rc` is never used, so remove it.

    Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | hinic: improve the comments of function header (Luo bin, 2020-09-23, 6 files, -4/+11)
    Fix the warnings about function header comments when building the hinic driver with the "W=1" option.

    Signed-off-by: Luo bin <luobin9@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next (David S. Miller, 2020-09-23, 124 files, -2040/+4211)
|\ \
    Alexei Starovoitov says:

    ====================
    pull-request: bpf-next 2020-09-23

    The following pull-request contains BPF updates for your *net-next* tree.

    We've added 95 non-merge commits during the last 22 day(s) which contain a total of 124 files changed, 4211 insertions(+), 2040 deletions(-).

    The main changes are:
    1) Full multi function support in libbpf, from Andrii.
    2) Refactoring of function argument checks, from Lorenz.
    3) Make bpf_tail_call compatible with functions (subprograms), from Maciej.
    4) Program metadata support, from YiFei.
    5) bpf iterator optimizations, from Yonghong.
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>
| * | tools resolve_btfids: Always force HOSTARCH (Jiri Olsa, 2020-09-23, 1 file, -0/+2)
    Seth reported a problem with cross builds, which fail on the resolve_btfids build because we are trying to build it for the cross-build arch. Fix this by always forcing the host arch.

    Reported-by: Seth Forshee <seth.forshee@canonical.com>
    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Andrii Nakryiko <andriin@fb.com>
    Link: https://lore.kernel.org/bpf/20200923185735.3048198-2-jolsa@kernel.org
| * | bpf: Check CONFIG_BPF option for resolve_btfids (Jiri Olsa, 2020-09-23, 2 files, -4/+6)
    Currently all the resolve_btfids 'users' are under CONFIG_BPF code, so if CONFIG_BPF is disabled, resolve_btfids will fail because there's no data to resolve. Disable resolve_btfids when CONFIG_BPF is disabled, so that such builds don't fail.

    Suggested-by: Andrii Nakryiko <andriin@fb.com>
    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Andrii Nakryiko <andriin@fb.com>
    Link: https://lore.kernel.org/bpf/20200923185735.3048198-1-jolsa@kernel.org
| * | bpf: Explicitly size compatible_reg_types (Lorenz Bauer, 2020-09-23, 1 file, -2/+1)
    Arrays with designated initializers have an implicit length of the highest initialized index plus one. I used this to ensure that newly added entries in enum bpf_reg_type get a NULL entry in compatible_reg_types. This is difficult to understand since it requires knowledge of the peculiarities of designated initializers. Use __BPF_ARG_TYPE_MAX to size the array instead.

    Suggested-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Andrii Nakryiko <andriin@fb.com>
    Link: https://lore.kernel.org/bpf/20200923160156.80814-1-lmb@cloudflare.com
| * | selftests/bpf: Fix stat probe in d_path test (Jiri Olsa, 2020-09-21, 3 files, -1/+26)
    Some kernel builds might inline the vfs_getattr call within the fstat syscall code path, so the fentry/vfs_getattr trampoline is not called. Add security_inode_getattr to the allowlist and switch the d_path test stat trampoline to security_inode_getattr.

    Keeping dentry_open and filp_close, because they are in their own files, so unlikely to be inlined, but in case they are, adding security_file_open.

    Adding flags that indicate trampolines were called and failing the test if any of them got missed, so it's easier to identify the issue next time.

    Fixes: e4d1af4b16f8 ("selftests/bpf: Add test for d_path helper")
    Suggested-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Jiri Olsa <jolsa@redhat.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/20200918112338.2618444-1-jolsa@kernel.org
| * | bpf: Using rcu_read_lock for bpf_sk_storage_map iterator (Yonghong Song, 2020-09-21, 1 file, -18/+13)
    If a bucket contains a lot of sockets, during bpf_iter traversing a bucket, concurrent userspace bpf_map_update_elem() and bpf program bpf_sk_storage_{get,delete}() may experience some undesirable delays as they will compete with bpf_iter for the bucket lock.

    Note that the number of buckets for bpf_sk_storage_map is roughly the same as the number of cpus. So if there are lots of sockets in the system, each bucket could contain lots of sockets.

    Different actual use cases may experience different delays. Here, using selftest bpf_iter subtest bpf_sk_storage_map, I hacked the kernel with ktime_get_mono_fast_ns() to collect the time when a bucket was locked during bpf_iter prog traversing that bucket. This way, the maximum incurred delay was measured w.r.t. the number of elements in a bucket.

        # elems in each bucket    delay(ns)
        64                        17000
        256                       72512
        2048                      875246

    The potential delays will be further increased if we have even more elements in a bucket. Using rcu_read_lock() is a reasonable compromise here. It may lose some precision, e.g., access stale sockets, but it will not hurt performance of bpf program or user space application which also tries to get/delete or update map elements.

    Signed-off-by: Yonghong Song <yhs@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Song Liu <songliubraving@fb.com>
    Cc: Martin KaFai Lau <kafai@fb.com>
    Link: https://lore.kernel.org/bpf/20200916224645.720172-1-yhs@fb.com
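Dividing each measured delay by the bucket size shows how the lock hold time scales. This short Python calculation uses only the three data points from the commit message; the variable names are illustrative.

```python
# Maximum bucket-lock hold time in nanoseconds, keyed by elements per bucket.
delay_ns = {64: 17000, 256: 72512, 2048: 875246}

# Average cost per element while the bucket lock is held.
per_elem_ns = {n: ns / n for n, ns in delay_ns.items()}

# Growth of the hold time between the smallest and largest bucket.
size_ratio = 2048 / 64
delay_ratio = delay_ns[2048] / delay_ns[64]
```

The per-element cost rises from about 266 ns at 64 elements to about 427 ns at 2048, so for a 32 times larger bucket the hold time grows roughly 51 times. In these measurements the delay therefore grows somewhat faster than linearly with bucket size.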
| * | Merge branch 'refactor-check_func_arg' (Alexei Starovoitov, 2020-09-21, 11 files, -237/+239)
| |\ \
    Lorenz Bauer says:

    ====================
    Changes in v4:
    - Output the desired type on BTF ID mismatch (Martin)

    Changes in v3:
    - Fix BTF_ID_LIST_SINGLE if BTF is disabled (Martin)
    - Drop incorrect arg_btf_id in bpf_sk_storage.c (Martin)
    - Check for arg_btf_id in check_func_proto (Martin)
    - Drop incorrect PTR_TO_BTF_ID from fullsock_types (Martin)
    - Introduce btf_seq_file_ids in bpf_trace.c to reduce duplication

    Changes in v2:
    - Make the series stand alone (Martin)
    - Drop incorrect BTF_SET_START fix (Andrii)
    - Only support a single BTF ID per argument (Martin)
    - Introduce BTF_ID_LIST_SINGLE macro (Andrii)
    - Skip check_ctx_reg iff register is NULL
    - Change output of check_reg_type slightly, to avoid touching tests

    Original cover letter:

    Currently, check_func_arg has this pretty gnarly if statement that compares the valid arg_type with the actual reg_type. Sprinkled in-between are checks for register_is_null, to short circuit these tests if we're dealing with a nullable arg_type. There is also some code for later bounds / access checking hidden away in there.

    This series of patches refactors the function into something like this:

        if (reg_is_null && arg_type_is_nullable)
            skip type checking

        do type checking, including BTF validation

        do bounds / access checking

    The type checking is now table driven, which makes it easy to extend the acceptable types. Maybe more importantly, using a table makes it easy to provide more helpful verifier output (see the last patch).
    ====================

    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | * | bpf: Use a table to drive helper arg type checks (Lorenz Bauer, 2020-09-21, 2 files, -74/+110)
    The mapping between bpf_arg_type and bpf_reg_type is encoded in a big hairy if statement that is hard to follow. The debug output also leaves something to be desired: if a reg_type doesn't match we only print one of the options, instead of printing all the valid ones.

    Convert the if statement into a table which is then used to drive type checking. If none of the reg_types match we print all options, e.g.:

        R2 type=rdonly_buf expected=fp, pkt, pkt_meta, map_value

    Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Link: https://lore.kernel.org/bpf/20200921121227.255763-12-lmb@cloudflare.com
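The table-driven approach can be illustrated with a small model. This Python sketch is not the verifier code: the table contents, the key `ARG_EXAMPLE_MEM` and the function `check_reg_type` as written here are hypothetical, chosen so that the error string matches the sample shown in the commit message.

```python
# Hypothetical table: argument type -> register types it accepts.
# The real verifier keys a C array by enum bpf_arg_type.
COMPATIBLE_REG_TYPES = {
    "ARG_EXAMPLE_MEM": ["fp", "pkt", "pkt_meta", "map_value"],
}


def check_reg_type(regno, reg_type, arg_type):
    """Return None if the register is acceptable, else an error string.

    On mismatch every acceptable type is listed, not just one of them.
    """
    expected = COMPATIBLE_REG_TYPES[arg_type]
    if reg_type in expected:
        return None
    return "R{} type={} expected={}".format(regno, reg_type, ", ".join(expected))


error = check_reg_type(2, "rdonly_buf", "ARG_EXAMPLE_MEM")
```

Extending the accepted types for an argument then means adding one entry to its list, and the error message picks up the change without further code.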
| | * | bpf: Hoist type checking for nullable arg types (Lorenz Bauer, 2020-09-21, 1 file, -34/+30)
    check_func_arg has a plethora of weird if statements with empty branches. They work around the fact that *_OR_NULL argument types should accept a SCALAR_VALUE register, as long as its value is 0. These statements make it difficult to reason about the type checking logic.

    Instead, skip more detailed type checking logic iff the register is 0, and the function expects a nullable type. This allows simplifying the type checking itself.

    Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Andrii Nakryiko <andriin@fb.com>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Link: https://lore.kernel.org/bpf/20200921121227.255763-11-lmb@cloudflare.com
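The hoisted null check can be sketched as an early return placed in front of the regular type check. The Python below is a simplified model with hypothetical names and table contents. Stripping the `_OR_NULL` suffix to find the base type is a modelling shortcut, not a claim about how the verifier stores its table.

```python
# Hypothetical mapping from a base argument type to accepted register types.
COMPATIBLE = {"ARG_PTR_TO_MAP_VALUE": {"map_value"}}

NULL_SUFFIX = "_OR_NULL"


def arg_type_is_nullable(arg_type):
    return arg_type.endswith(NULL_SUFFIX)


def register_is_null(reg_type, value):
    # A scalar register whose known value is 0 counts as NULL.
    return reg_type == "scalar" and value == 0


def check_func_arg(reg_type, value, arg_type):
    """Return True if the register is acceptable for the argument."""
    # Hoisted check: NULL passed to a nullable argument skips type checking.
    if register_is_null(reg_type, value) and arg_type_is_nullable(arg_type):
        return True
    # Regular type checking, with no null special cases left inside it.
    if arg_type_is_nullable(arg_type):
        base = arg_type[: -len(NULL_SUFFIX)]
    else:
        base = arg_type
    return reg_type in COMPATIBLE[base]
```

With the null case handled once up front, the per-type logic no longer needs empty branches for scalar zero.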
| | * | bpf: Check ARG_PTR_TO_SPINLOCK register type in check_func_arg (Lorenz Bauer, 2020-09-21, 1 file, -14/+14)
    Move the check for PTR_TO_MAP_VALUE to check_func_arg, where all other checking is done as well. Move the invocation of process_spin_lock away from the register type checking, to allow a future refactoring.

    Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Link: https://lore.kernel.org/bpf/20200921121227.255763-10-lmb@cloudflare.com
| | * | bpf: Set meta->raw_mode for pointers close to use  Lorenz Bauer  2020-09-21  1  -1/+5
| | | |
| | | | If we encounter a pointer to memory, we set meta->raw_mode depending on the type of memory we point at. What isn't obvious is that this information is only used when the next memory size argument is encountered.
| | | |
| | | | Move the assignment closer to where it's used, and add a comment that explains what is going on.
| | | |
| | | | Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
| | | | Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | | Acked-by: Martin KaFai Lau <kafai@fb.com>
| | | | Link: https://lore.kernel.org/bpf/20200921121227.255763-9-lmb@cloudflare.com
| | * | bpf: Make context access check generic  Lorenz Bauer  2020-09-21  1  -3/+4
| | | |
| | | | Always check context access if the register we're operating on is PTR_TO_CTX, rather than relying on ARG_PTR_TO_CTX. This allows simplifying the arg_type checking section of the function.
| | | |
| | | | Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
| | | | Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | | Acked-by: Martin KaFai Lau <kafai@fb.com>
| | | | Link: https://lore.kernel.org/bpf/20200921121227.255763-8-lmb@cloudflare.com
| | * | bpf: Make reference tracking generic  Lorenz Bauer  2020-09-21  1  -16/+10
| | | |
| | | | Instead of dealing with reg->ref_obj_id individually for every arg type that needs it, rely on the fact that ref_obj_id is zero if the register is not reference tracked.
| | | |
| | | | Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
| | | | Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | | Acked-by: Andrii Nakryiko <andriin@fb.com>
| | | | Acked-by: Martin KaFai Lau <kafai@fb.com>
| | | | Link: https://lore.kernel.org/bpf/20200921121227.255763-7-lmb@cloudflare.com
| | * | bpf: Make BTF pointer type checking generic  Lorenz Bauer  2020-09-21  1  -18/+20
| | | |
| | | | Perform BTF type checks if the register we're working on contains a BTF pointer, rather than if the argument is for a BTF pointer. This is easier to understand, and allows removing the code from the arg_type checking section of the function.
| | | |
| | | | Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
| | | | Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | | Acked-by: Martin KaFai Lau <kafai@fb.com>
| | | | Link: https://lore.kernel.org/bpf/20200921121227.255763-6-lmb@cloudflare.com
| | * | bpf: Allow specifying a BTF ID per argument in function protos  Lorenz Bauer  2020-09-21  9  -103/+58
| | | |
| | | | Function prototypes using ARG_PTR_TO_BTF_ID currently use two ways to signal which BTF IDs are acceptable. First, bpf_func_proto.btf_id is an array of IDs, one for each argument. This array is only accessed up to the highest numbered argument that uses ARG_PTR_TO_BTF_ID and may therefore be less than five arguments long. It usually points at a BTF_ID_LIST. Second, check_btf_id is a function pointer that is called by the verifier if present. It gets the actual BTF ID of the register, and the argument number we're currently checking. It turns out that the only user, check_arg_btf_id, ignores the argument, and is simply used to check whether the BTF ID has a struct sock_common at its start.
| | | |
| | | | Replace both of these mechanisms with an explicit BTF ID for each argument in a function proto. Thanks to btf_struct_ids_match this is very flexible: check_arg_btf_id can be replaced by requiring struct sock_common.
| | | |
| | | | Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
| | | | Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | | Acked-by: Martin KaFai Lau <kafai@fb.com>
| | | | Link: https://lore.kernel.org/bpf/20200921121227.255763-5-lmb@cloudflare.com
| | * | btf: Add BTF_ID_LIST_SINGLE macro  Lorenz Bauer  2020-09-21  2  -0/+16
| | | |
| | | | Add a convenience macro that allows defining a BTF ID list with a single item. This lets us cut down on repetitive macros.
| | | |
| | | | Suggested-by: Andrii Nakryiko <andriin@fb.com>
| | | | Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
| | | | Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | | Acked-by: Martin KaFai Lau <kafai@fb.com>
| | | | Link: https://lore.kernel.org/bpf/20200921121227.255763-4-lmb@cloudflare.com
| | * | bpf: Check scalar or invalid register in check_helper_mem_access  Lorenz Bauer  2020-09-21  1  -13/+11
| | | |
| | | | Move the check for a NULL or zero register to check_helper_mem_access. This makes check_stack_boundary easier to understand.
| | | |
| | | | Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
| | | | Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | | Acked-by: Andrii Nakryiko <andriin@fb.com>
| | | | Link: https://lore.kernel.org/bpf/20200921121227.255763-3-lmb@cloudflare.com
| | * | btf: Make btf_set_contains take a const pointer  Lorenz Bauer  2020-09-21  2  -2/+2
| |/ /
| | | bsearch doesn't modify the contents of the array, so we can take a const pointer.
| | |
| | | Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
| | | Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | Acked-by: Andrii Nakryiko <andriin@fb.com>
| | | Link: https://lore.kernel.org/bpf/20200921121227.255763-2-lmb@cloudflare.com
| * | bpf: Fix potential call bpf_link_free() in atomic context  Muchun Song  2020-09-21  1  -6/+2
| | |
| | | The in_atomic() macro cannot always detect atomic context; in particular, it cannot know about held spinlocks in non-preemptible kernels. Although no caller currently invokes bpf_link_put() while holding a spinlock, be on the safe side so that we avoid this problem in the future.
| | |
| | | Signed-off-by: Muchun Song <songmuchun@bytedance.com>
| | | Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| | | Acked-by: Song Liu <songliubraving@fb.com>
| | | Acked-by: Andrii Nakryiko <andriin@fb.com>
| | | Link: https://lore.kernel.org/bpf/20200917074453.20621-1-songmuchun@bytedance.com
| * | bpf: Use hlist_add_head_rcu when linking to local_storage  Martin KaFai Lau  2020-09-19  1  -1/+1
| | |
| | | The local_storage->list will be traversed by rcu reader in parallel. Thus, hlist_add_head_rcu() is needed in bpf_selem_link_storage_nolock(). This patch fixes it.
| | |
| | | This part of the code has recently been refactored in bpf-next and this patch makes changes to the new file "bpf_local_storage.c". Instead of using the original offending commit in the Fixes tag, the commit that created the file "bpf_local_storage.c" is used.
| | |
| | | A separate fix has been provided to the bpf tree.
| | |
| | | Fixes: 450af8d0f6be ("bpf: Split bpf_local_storage to bpf_sk_storage")
| | | Signed-off-by: Martin KaFai Lau <kafai@fb.com>
| | | Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| | | Acked-by: Song Liu <songliubraving@fb.com>
| | | Link: https://lore.kernel.org/bpf/20200916204453.2003915-1-kafai@fb.com
| * | samples/bpf: Fix test_map_in_map on s390  Ilya Leoshkevich  2020-09-19  1  -4/+3
| | |
| | | s390 uses the socketcall multiplexer instead of individual socket syscalls. Therefore, "kprobe/" SYSCALL(sys_connect) does not trigger and test_map_in_map fails. Fix by using "kprobe/__sys_connect" instead.
| | |
| | | Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
| | | Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| | | Acked-by: Andrii Nakryiko <andriin@fb.com>
| | | Link: https://lore.kernel.org/bpf/20200915115519.3769807-1-iii@linux.ibm.com
| * | selftests/bpf: Fix endianness issue in test_sockopt_sk  Ilya Leoshkevich  2020-09-19  1  -2/+2
| | |
| | | getsetsockopt() calls getsockopt() with optlen == 1, but then checks the resulting int. It is ok on little endian, but not on big endian. Fix by checking char instead.
| | |
| | | Fixes: 8a027dc0d8f5 ("selftests/bpf: add sockopt test that exercises sk helpers")
| | | Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
| | | Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| | | Acked-by: Andrii Nakryiko <andriin@fb.com>
| | | Link: https://lore.kernel.org/bpf/20200915113928.3768496-1-iii@linux.ibm.com
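The endianness problem described here can be reproduced without sockets. In the sketch below, `fake_getsockopt_one_byte()` is a hypothetical stand-in for a getsockopt() call made with optlen == 1: it writes only the first byte of the caller's buffer. The other function names are likewise illustrative and do not come from the selftest.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for getsockopt() with optlen == 1:
 * only the first byte of the buffer is written. */
static void fake_getsockopt_one_byte(void *optval, unsigned char value)
{
	memcpy(optval, &value, 1);
}

/* Buggy pattern: one byte was written, but the whole int is compared. */
bool check_as_int(unsigned char expected)
{
	int buf = 0;

	fake_getsockopt_one_byte(&buf, expected);
	return buf == expected;
}

/* Fixed pattern: compare only the byte that was actually written. */
bool check_as_char(unsigned char expected)
{
	int buf = 0;

	fake_getsockopt_one_byte(&buf, expected);
	return *(unsigned char *)&buf == expected;
}

/* True when the least significant byte is stored first. */
bool host_is_little_endian(void)
{
	unsigned int one = 1;

	return *(unsigned char *)&one == 1;
}
```

On a little-endian host the first byte of the int is its least significant byte, so both checks pass; on a big-endian host the written byte lands in the most significant position and only `check_as_char()` passes.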
| * | selftests/bpf: Fix endianness issue in sk_assign  Ilya Leoshkevich  2020-09-18  1  -1/+1
| | |
| | | server_map's value size is 8, but the test tries to put an int there. This sort of works on x86 (unless followed by non-0), but hard fails on s390. Fix by using __s64 instead of int.
| | |
| | | Fixes: 2d7824ffd25c ("selftests: bpf: Add test for sk_assign")
| | | Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
| | | Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| | | Acked-by: Andrii Nakryiko <andriin@fb.com>
| | | Link: https://lore.kernel.org/bpf/20200915113815.3768217-1-iii@linux.ibm.com
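A small sketch of why passing an int to an 8-byte map value misbehaves. `store_8_bytes()` is a hypothetical stand-in for a map update with a value size of 8, and the two-field struct makes explicit the four bytes that happen to follow the int in memory; none of these names come from the selftest.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a map update with value_size == 8:
 * it always copies 8 bytes from the caller's pointer. */
static int64_t store_8_bytes(const void *value)
{
	int64_t slot;

	memcpy(&slot, value, sizeof(slot));
	return slot;
}

/* Buggy pattern: the caller only owns 4 bytes, so the next 4 bytes
 * (modelled here by "after") end up in the stored value as well. */
int64_t store_from_int(int32_t fd, int32_t after)
{
	struct { int32_t value; int32_t after; } stack = { fd, after };

	return store_8_bytes(&stack);
}

/* Fixed pattern: widen to 64 bits first, so all 8 bytes are defined. */
int64_t store_from_s64(int32_t fd)
{
	int64_t value = fd;

	return store_8_bytes(&value);
}
```

With `after == 0`, a little-endian host stores the expected value, which matches the "sort of works on x86" remark; a big-endian host places the int in the upper half of the slot. Any non-zero `after` corrupts the value on both.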
| * | selftests/bpf: Add tailcall_bpf2bpf tests  Maciej Fijalkowski  2020-09-17  5  -0/+533
| | |
| | | Add four tests to the tailcalls selftest, explicitly named "tailcall_bpf2bpf_X", as their purpose is to validate that the combination of tailcalls with bpf2bpf calls works properly. These tests also validate LD_ABS from subprograms.
| | |
| | | Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
| | | Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * | bpf: Add abnormal return checks.  Alexei Starovoitov  2020-09-17  3  -22/+52
| | |
| | | LD_[ABS|IND] instructions may return from the function early. The bpf_tail_call pseudo instruction either falls through or returns. Allow them in the subprograms only when subprograms are BTF annotated and have scalar return types. Allow ld_abs and tail_call in the main program even if it calls into subprograms. In the past that was not OK for ld_abs, since it was JITed with a special exit sequence. Since bpf_gen_ld_abs() was introduced, ld_abs looks like a normal exit insn from the JIT's point of view, so it's safe to allow them in the main program.
| | |
| | | Signed-off-by: Alexei Starovoitov <ast@kernel.org>