summaryrefslogtreecommitdiffstats
path: root/net/core/xdp.c
Commit message (Collapse)AuthorAgeFilesLines
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2020-12-111-20/+11
|\ | | | | | | | | | | | | | | | | | | | | | | | | xdp_return_frame_bulk() needs to pass a xdp_buff to __xdp_return(). strlcpy got converted to strscpy but here it makes no functional difference, so just keep the right code. Conflicts: net/netfilter/nf_tables_api.c Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * xdp: Remove the xdp_attachment_flags_ok() callbackToke Høiland-Jørgensen2020-12-091-12/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 7f0a838254bd ("bpf, xdp: Maintain info on attached XDP BPF programs in net_device"), the XDP program attachment info is now maintained in the core code. This interacts badly with the xdp_attachment_flags_ok() check that prevents unloading an XDP program with different load flags than it was loaded with. In practice, two kinds of failures are seen: - An XDP program loaded without specifying a mode (and which then ends up in driver mode) cannot be unloaded if the program mode is specified on unload. - The dev_xdp_uninstall() hook always calls the driver callback with the mode set to the type of the program but an empty flags argument, which means the flags_ok() check prevents the program from being removed, leading to bpf prog reference leaks. The original reason this check was added was to avoid ambiguity when multiple programs were loaded. With the way the checks are done in the core now, this is quite simple to enforce in the core code, so let's add a check there and get rid of the xdp_attachment_flags_ok() callback entirely. Fixes: 7f0a838254bd ("bpf, xdp: Maintain info on attached XDP BPF programs in net_device") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/bpf/160752225751.110217.10267659521308669050.stgit@toke.dk
| * xdp: Handle MEM_TYPE_XSK_BUFF_POOL correctly in xdp_return_buff()Björn Töpel2020-11-301-7/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It turns out that it does exist a path where xdp_return_buff() is being passed an XDP buffer of type MEM_TYPE_XSK_BUFF_POOL. This path is when AF_XDP zero-copy mode is enabled, and a buffer is redirected to a DEVMAP with an attached XDP program that drops the buffer. This change simply puts the handling of MEM_TYPE_XSK_BUFF_POOL back into xdp_return_buff(). Fixes: 82c41671ca4f ("xdp: Simplify xdp_return_{frame, frame_rx_napi, buff}") Reported-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Maxim Mikityanskiy <maximmi@nvidia.com> Link: https://lore.kernel.org/bpf/20201127171726.123627-1-bjorn.topel@gmail.com
* | xsk: Propagate napi_id to XDP socket Rx pathBjörn Töpel2020-12-011-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | Add napi_id to the xdp_rxq_info structure, and make sure the XDP socket pick up the napi_id in the Rx path. The napi_id is used to find the corresponding NAPI structure for socket busy polling. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/bpf/20201130185205.196029-7-bjorn.topel@gmail.com
* | net: page_pool: Add bulk support for ptr_ringLorenzo Bianconi2020-11-141-7/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce the capability to batch page_pool ptr_ring refill since it is usually run inside the driver NAPI tx completion loop. Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com> Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Link: https://lore.kernel.org/bpf/08dd249c9522c001313f520796faa777c4089e1c.1605267335.git.lorenzo@kernel.org
* | net: xdp: Introduce bulking for xdp tx return pathLorenzo Bianconi2020-11-141-0/+59
|/ | | | | | | | | | | | | | | | | | | | | | | XDP bulk APIs introduce a defer/flush mechanism to return pages belonging to the same xdp_mem_allocator object (identified via the mem.id field) in bulk to optimize I-cache and D-cache since xdp_return_frame is usually run inside the driver NAPI tx completion loop. The bulk queue size is set to 16 to be aligned to how XDP_REDIRECT bulking works. The bulk is flushed when it is full or when mem.id changes. xdp_frame_bulk is usually stored/allocated on the function call-stack to avoid locking penalties. Current implementation considers only page_pool memory model. Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com> Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Link: https://lore.kernel.org/bpf/e190c03eac71b20c8407ae0fc2c399eda7835f49.1605267335.git.lorenzo@kernel.org
* bpf, xdp: Remove XDP_QUERY_PROG and XDP_QUERY_PROG_HW XDP commandsAndrii Nakryiko2020-07-251-9/+0
| | | | | | | | | | | | | | | Now that BPF program/link management is centralized in generic net_device code, kernel code never queries program id from drivers, so XDP_QUERY_PROG/XDP_QUERY_PROG_HW commands are unnecessary. This patch removes all the implementations of those commands in kernel, along the xdp_attachment_query(). This patch was compile-tested on allyesconfig. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200722064603.3350758-10-andriin@fb.com
* xdp: Handle frame_sz in xdp_convert_zc_to_xdp_frame()Hangbin Liu2020-06-171-0/+1
| | | | | | | | | | | | In commit 34cc0b338a61 we only handled the frame_sz in convert_to_xdp_frame(). This patch will also handle frame_sz in xdp_convert_zc_to_xdp_frame(). Fixes: 34cc0b338a61 ("xdp: Xdp_frame add member frame_sz and handle in convert_to_xdp_frame") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200616103518.2963410-1-liuhangbin@gmail.com
* xdp: Simplify xdp_return_{frame, frame_rx_napi, buff}Björn Töpel2020-05-211-12/+9
| | | | | | | | | | | | | | | | The xdp_return_{frame,frame_rx_napi,buff} function are never used, except in xdp_convert_zc_to_xdp_frame(), by the MEM_TYPE_XSK_BUFF_POOL memory type. To simplify and reduce code, change so that xdp_convert_zc_to_xdp_frame() calls xsk_buff_free() directly since the type is know, and remove MEM_TYPE_XSK_BUFF_POOL from the switch statement in __xdp_return() function. Suggested-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200520192103.355233-14-bjorn.topel@gmail.com
* xsk: Remove MEM_TYPE_ZERO_COPY and corresponding codeBjörn Töpel2020-05-211-37/+5
| | | | | | | | | | | There are no users of MEM_TYPE_ZERO_COPY. Remove all corresponding code, including the "handle" member of struct xdp_buff. rfc->v1: Fixed spelling in commit message. (Björn) Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200520192103.355233-13-bjorn.topel@gmail.com
* xsk: Introduce AF_XDP buffer allocation APIBjörn Töpel2020-05-211-4/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to simplify AF_XDP zero-copy enablement for NIC driver developers, a new AF_XDP buffer allocation API is added. The implementation is based on a single core (single producer/consumer) buffer pool for the AF_XDP UMEM. A buffer is allocated using the xsk_buff_alloc() function, and returned using xsk_buff_free(). If a buffer is disassociated with the pool, e.g. when a buffer is passed to an AF_XDP socket, a buffer is said to be released. Currently, the release function is only used by the AF_XDP internals and not visible to the driver. Drivers using this API should register the XDP memory model with the new MEM_TYPE_XSK_BUFF_POOL type. The API is defined in net/xdp_sock_drv.h. The buffer type is struct xdp_buff, and follows the lifetime of regular xdp_buffs, i.e. the lifetime of an xdp_buff is restricted to a NAPI context. In other words, the API is not replacing xdp_frames. In addition to introducing the API and implementations, the AF_XDP core is migrated to use the new APIs. rfc->v1: Fixed build errors/warnings for m68k and riscv. (kbuild test robot) Added headroom/chunk size getter. (Maxim/Björn) v1->v2: Swapped SoBs. (Maxim) v2->v3: Initialize struct xdp_buff member frame_sz. (Björn) Add API to query the DMA address of a frame. (Maxim) Do DMA sync for CPU till the end of the frame to handle possible growth (frame_sz). (Maxim) Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200520192103.355233-6-bjorn.topel@gmail.com
* xdp: Xdp_frame add member frame_sz and handle in convert_to_xdp_frameJesper Dangaard Brouer2020-05-141-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use hole in struct xdp_frame, when adding member frame_sz, which keeps same sizeof struct (32 bytes) Drivers ixgbe and sfc had bug cases where the necessary/expected tailroom was not reserved. This can lead to some hard to catch memory corruption issues. Having the drivers frame_sz this can be detected when packet length/end via xdp->data_end exceed the xdp_data_hard_end pointer, which accounts for the reserved the tailroom. When detecting this driver issue, simply fail the conversion with NULL, which results in feedback to driver (failing xdp_do_redirect()) causing driver to drop packet. Given the lack of consistent XDP stats, this can be hard to troubleshoot. And given this is a driver bug, we want to generate some more noise in form of a WARN stack dump (to ID the driver code that inlined convert_to_xdp_frame). Inlining the WARN macro is problematic, because it adds an asm instruction (on Intel CPUs ud2) what influence instruction cache prefetching. Thus, introduce xdp_warn and macro XDP_WARN, to avoid this and at the same time make identifying the function and line of this inlined function easier. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/158945337313.97035.10015729316710496600.stgit@firesoul
* net: page_pool: API cleanup and commentsIlias Apalodimas2020-02-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Functions starting with __ usually indicate those which are exported, but should not be called directly. Update some of those declared in the API and make it more readable. page_pool_unmap_page() and page_pool_release_page() were doing exactly the same thing calling __page_pool_clean_page(). Let's rename __page_pool_clean_page() to page_pool_release_page() and export it in order to show up on perf logs and get rid of page_pool_unmap_page(). Finally rename __page_pool_put_page() to page_pool_put_page() since we can now directly call it from drivers and rename the existing page_pool_put_page() to page_pool_put_full_page() since they do the same thing but the latter is trying to sync the full DMA area. This patch also updates netsec, mvneta and stmmac drivers which use those functions. Suggested-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* treewide: Use sizeof_field() macroPankaj Bharadiya2019-12-091-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except at places where these are defined. Later patches will remove the unused definition of FIELD_SIZEOF(). This patch is generated using following script: EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h" git grep -l -e "\bFIELD_SIZEOF\b" | while read file; do if [[ "$file" =~ $EXCLUDE_FILES ]]; then continue fi sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file; done Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com> Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com Co-developed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: David Miller <davem@davemloft.net> # for net
* xdp: obtain the mem_id mutex before trying to remove an entry.Jonathan Lemon2019-12-041-4/+4
| | | | | | | | | | | | | | | | A lockdep splat was observed when trying to remove an xdp memory model from the table since the mutex was obtained when trying to remove the entry, but not before the table walk started: Fix the splat by obtaining the lock before starting the table walk. Fixes: c3f812cea0d7 ("page_pool: do not release pool until inflight == 0.") Reported-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Tested-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* xdp: remove memory poison on free for struct xdp_mem_allocatorJesper Dangaard Brouer2019-11-181-5/+0
| | | | | | | | | | | | | | | | | | When looking at the details I realised that the memory poison in __xdp_mem_allocator_rcu_free doesn't make sense. This is because the SLUB allocator uses the first 16 bytes (on 64 bit), for its freelist, which overlap with members in struct xdp_mem_allocator, that were updated. Thus, SLUB already does the "poisoning" for us. I still believe that poisoning memory make sense in other cases. Kernel have gained different use-after-free detection mechanism, but enabling those is associated with a huge overhead. Experience is that debugging facilities can change the timing so much, that that a race condition will not be provoked when enabled. Thus, I'm still in favour of poisoning memory where it makes sense. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* page_pool: do not release pool until inflight == 0.Jonathan Lemon2019-11-161-77/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | The page pool keeps track of the number of pages in flight, and it isn't safe to remove the pool until all pages are returned. Disallow removing the pool until all pages are back, so the pool is always available for page producers. Make the page pool responsible for its own delayed destruction instead of relying on XDP, so the page pool can be used without the xdp memory model. When all pages are returned, free the pool and notify xdp if the pool is registered with the xdp memory system. Have the callback perform a table walk since some drivers (cpsw) may share the pool among multiple xdp_rxq_info. Note that the increment of pages_state_release_cnt may result in inflight == 0, resulting in the pool being released. Fixes: d956a048cd3f ("xdp: force mem allocator removal and periodic warning") Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* xdp: Trivial, fix spelling in function descriptionAnton Ivanov2019-10-111-1/+1
| | | | | | | | Fix typo 'boolian' into 'boolean'. Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191011084303.28418-1-anton.ivanov@cambridgegreys.com
* net: core: page_pool: add user refcnt and reintroduce page_pool_destroyIvan Khoronzhuk2019-07-081-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jesper recently removed page_pool_destroy() (from driver invocation) and moved shutdown and free of page_pool into xdp_rxq_info_unreg(), in-order to handle in-flight packets/pages. This created an asymmetry in drivers create/destroy pairs. This patch reintroduce page_pool_destroy and add page_pool user refcnt. This serves the purpose to simplify drivers error handling as driver now drivers always calls page_pool_destroy() and don't need to track if xdp_rxq_info_reg_mem_model() was unsuccessful. This could be used for a special cases where a single RX-queue (with a single page_pool) provides packets for two net_device'es, and thus needs to register the same page_pool twice with two xdp_rxq_info structures. This patch is primarily to ease API usage for drivers. The recently merged netsec driver, actually have a bug in this area, which is solved by this API change. This patch is a modified version of Ivan Khoronzhuk's original patch. Link: https://lore.kernel.org/netdev/20190625175948.24771-2-ivan.khoronzhuk@linaro.org/ Fixes: 5c67bf0ec4d0 ("net: netsec: Use page_pool API") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* xdp: Make __mem_id_disconnect staticYueHaibing2019-06-261-1/+1
| | | | | | | | | | | | | Fix sparse warning: net/core/xdp.c:88:6: warning: symbol '__mem_id_disconnect' was not declared. Should it be static? Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* xdp: add tracepoints for XDP memJesper Dangaard Brouer2019-06-191-16/+5
| | | | | | | | | | | | | | | These tracepoints make it easier to troubleshoot XDP mem id disconnect. The xdp:mem_disconnect tracepoint cannot be replaced via kprobe. It is placed at the last stable place for the pointer to struct xdp_mem_allocator, just before it's scheduled for RCU removal. It also extract info on 'safe_to_remove' and 'force'. Detailed info about in-flight pages is not available at this layer. The next patch will added tracepoints needed at the page_pool layer for this. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* xdp: force mem allocator removal and periodic warningJesper Dangaard Brouer2019-06-191-6/+31
| | | | | | | | | | | | | | | | If bugs exists or are introduced later e.g. by drivers misusing the API, then we want to warn about the issue, such that developer notice. This patch will generate a bit of noise in form of periodic pr_warn every 30 seconds. It is not nice to have this stall warning running forever. Thus, this patch will (after 120 attempts) force disconnect the mem id (from the rhashtable) and free the page_pool object. This will cause fallback to the put_page() as before, which only potentially leak DMA-mappings, if objects are really stuck for this long. In that unlikely case, a WARN_ONCE should show us the call stack. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* xdp: tracking page_pool resources and safe removalJesper Dangaard Brouer2019-06-191-6/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is needed before we can allow drivers to use page_pool for DMA-mappings. Today with page_pool and XDP return API, it is possible to remove the page_pool object (from rhashtable), while there are still in-flight packet-pages. This is safely handled via RCU and failed lookups in __xdp_return() fallback to call put_page(), when page_pool object is gone. In-case page is still DMA mapped, this will result in page note getting correctly DMA unmapped. To solve this, the page_pool is extended with tracking in-flight pages. And XDP disconnect system queries page_pool and waits, via workqueue, for all in-flight pages to be returned. To avoid killing performance when tracking in-flight pages, the implement use two (unsigned) counters, that in placed on different cache-lines, and can be used to deduct in-flight packets. This is done by mapping the unsigned "sequence" counters onto signed Two's complement arithmetic operations. This is e.g. used by kernel's time_after macros, described in kernel commit 1ba3aab3033b and 5a581b367b5, and also explained in RFC1982. The trick is these two incrementing counters only need to be read and compared, when checking if it's safe to free the page_pool structure. Which will only happen when driver have disconnected RX/alloc side. Thus, on a non-fast-path. It is chosen that page_pool tracking is also enabled for the non-DMA use-case, as this can be used for statistics later. After this patch, using page_pool requires more strict resource "release", e.g. via page_pool_release_page() that was introduced in this patchset, and previous patches implement/fix this more strict requirement. Drivers no-longer call page_pool_destroy(). Drivers already call xdp_rxq_info_unreg() which call xdp_rxq_info_unreg_mem_model(), which will attempt to disconnect the mem id, and if attempt fails schedule the disconnect for later via delayed workqueue. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* xdp: page_pool related fix to cpumapJesper Dangaard Brouer2019-06-191-0/+15
| | | | | | | | | | | | | | | | | When converting an xdp_frame into an SKB, and sending this into the network stack, then the underlying XDP memory model need to release associated resources, because the network stack don't have callbacks for XDP memory models. The only memory model that needs this is page_pool, when a driver use the DMA-mapping feature. Introduce page_pool_release_page(), which basically does the same as page_pool_unmap_page(). Add xdp_release_frame() as the XDP memory model interface for calling it, if the memory model match MEM_TYPE_PAGE_POOL, to save the function call overhead for others. Have cpumap call xdp_release_frame() before xdp_scrub_frame(). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* xdp: fix leak of IDA cyclic id if rhashtable_insert_slow failsJesper Dangaard Brouer2019-06-191-0/+2
| | | | | | | | | | | | | | Fix error handling case, where inserting ID with rhashtable_insert_slow fails in xdp_rxq_info_reg_mem_model, which leads to never releasing the IDA ID, as the lookup in xdp_rxq_info_unreg_mem_model fails and thus ida_simple_remove() is never called. Fix by releasing ID via ida_simple_remove(), and mark xdp_rxq->mem.id with zero, which is already checked in xdp_rxq_info_unreg_mem_model(). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 363Thomas Gleixner2019-06-051-1/+1
| | | | | | | | | | | | | | | | | | | Based on 1 normalized pattern(s): released under terms in gpl version 2 see copying extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 5 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Armijn Hemel <armijn@tjaldur.nl> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190531081035.689962394@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* xdp: remove redundant variable 'headroom'Colin Ian King2018-09-011-2/+1
| | | | | | | | | | | | | Variable 'headroom' is being assigned but is never used hence it is redundant and can be removed. Cleans up clang warning: variable ‘headroom’ set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* xdp: export xdp_rxq_info_unreg_mem_modelBjörn Töpel2018-08-291-2/+13
| | | | | | | | | Export __xdp_rxq_info_unreg_mem_model as xdp_rxq_info_unreg_mem_model, so it can be used from netdev drivers. Also, add additional checks for the memory type. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
* xdp: implement convert_to_xdp_frame for MEM_TYPE_ZERO_COPYBjörn Töpel2018-08-291-0/+39
| | | | | | | | | | | | | This commit adds proper MEM_TYPE_ZERO_COPY support for convert_to_xdp_frame. Converting a MEM_TYPE_ZERO_COPY xdp_buff to an xdp_frame is done by transforming the MEM_TYPE_ZERO_COPY buffer into a MEM_TYPE_PAGE_ORDER0 frame. This is costly, and in the future it might make sense to implement a more sophisticated thread-safe alloc/free scheme for MEM_TYPE_ZERO_COPY, so that no allocation and copy is required in the fast-path. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
* net/xdp: Fix suspicious RCU usage warningTariq Toukan2018-08-161-11/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the warning below by calling rhashtable_lookup_fast. Also, make some code movements for better quality and human readability. [ 342.450870] WARNING: suspicious RCU usage [ 342.455856] 4.18.0-rc2+ #17 Tainted: G O [ 342.462210] ----------------------------- [ 342.467202] ./include/linux/rhashtable.h:481 suspicious rcu_dereference_check() usage! [ 342.476568] [ 342.476568] other info that might help us debug this: [ 342.476568] [ 342.486978] [ 342.486978] rcu_scheduler_active = 2, debug_locks = 1 [ 342.495211] 4 locks held by modprobe/3934: [ 342.500265] #0: 00000000e23116b2 (mlx5_intf_mutex){+.+.}, at: mlx5_unregister_interface+0x18/0x90 [mlx5_core] [ 342.511953] #1: 00000000ca16db96 (rtnl_mutex){+.+.}, at: unregister_netdev+0xe/0x20 [ 342.521109] #2: 00000000a46e2c4b (&priv->state_lock){+.+.}, at: mlx5e_close+0x29/0x60 [mlx5_core] [ 342.531642] #3: 0000000060c5bde3 (mem_id_lock){+.+.}, at: xdp_rxq_info_unreg+0x93/0x6b0 [ 342.541206] [ 342.541206] stack backtrace: [ 342.547075] CPU: 12 PID: 3934 Comm: modprobe Tainted: G O 4.18.0-rc2+ #17 [ 342.556621] Hardware name: Dell Inc. PowerEdge R730/0H21J3, BIOS 1.5.4 10/002/2015 [ 342.565606] Call Trace: [ 342.568861] dump_stack+0x78/0xb3 [ 342.573086] xdp_rxq_info_unreg+0x3f5/0x6b0 [ 342.578285] ? __call_rcu+0x220/0x300 [ 342.582911] mlx5e_free_rq+0x38/0xc0 [mlx5_core] [ 342.588602] mlx5e_close_channel+0x20/0x120 [mlx5_core] [ 342.594976] mlx5e_close_channels+0x26/0x40 [mlx5_core] [ 342.601345] mlx5e_close_locked+0x44/0x50 [mlx5_core] [ 342.607519] mlx5e_close+0x42/0x60 [mlx5_core] [ 342.613005] __dev_close_many+0xb1/0x120 [ 342.617911] dev_close_many+0xa2/0x170 [ 342.622622] rollback_registered_many+0x148/0x460 [ 342.628401] ? __lock_acquire+0x48d/0x11b0 [ 342.633498] ? unregister_netdev+0xe/0x20 [ 342.638495] rollback_registered+0x56/0x90 [ 342.643588] unregister_netdevice_queue+0x7e/0x100 [ 342.649461] unregister_netdev+0x18/0x20 [ 342.654362] mlx5e_remove+0x2a/0x50 [mlx5_core] [ 342.659944] mlx5_remove_device+0xe5/0x110 [mlx5_core] [ 342.666208] mlx5_unregister_interface+0x39/0x90 [mlx5_core] [ 342.673038] cleanup+0x5/0xbfc [mlx5_core] [ 342.678094] __x64_sys_delete_module+0x16b/0x240 [ 342.683725] ? do_syscall_64+0x1c/0x210 [ 342.688476] do_syscall_64+0x5a/0x210 [ 342.693025] entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: 8d5d88527587 ("xdp: rhashtable with allocator ID to pointer mapping") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* xdp: Helpers for disabling napi_direct of xdp_return_frameToshiaki Makita2018-08-101-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | We need some mechanism to disable napi_direct on calling xdp_return_frame_rx_napi() from some context. When veth gets support of XDP_REDIRECT, it will redirects packets which are redirected from other devices. On redirection veth will reuse xdp_mem_info of the redirection source device to make return_frame work. But in this case .ndo_xdp_xmit() called from veth redirection uses xdp_mem_info which is not guarded by NAPI, because the .ndo_xdp_xmit() is not called directly from the rxq which owns the xdp_mem_info. This approach introduces a flag in bpf_redirect_info to indicate that napi_direct should be disabled even when _rx_napi variant is used as well as helper functions to use it. A NAPI handler who wants to use this flag needs to call xdp_set_return_frame_no_direct() before processing packets, and call xdp_clear_return_frame_no_direct() after xdp_do_flush_map() before exiting NAPI. v4: - Use bpf_redirect_info for storing the flag instead of xdp_mem_info to avoid per-frame copy cost. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* Revert "xdp: add NULL pointer check in __xdp_return()"Björn Töpel2018-08-101-2/+1
| | | | | | | | | | | | | | | | | | | | | | This reverts commit 36e0f12bbfd3016f495904b35e41c5711707509f. The reverted commit adds a WARN to check against NULL entries in the mem_id_ht rhashtable. Any kernel path implementing the XDP (generic or driver) fast path is required to make a paired xdp_rxq_info_reg/xdp_rxq_info_unreg call for proper function. In addition, a driver using a different allocation scheme than the default MEM_TYPE_PAGE_SHARED is required to additionally call xdp_rxq_info_reg_mem_model. For MEM_TYPE_ZERO_COPY, an xdp_rxq_info_reg_mem_model call ensures that the mem_id_ht rhashtable has a properly inserted allocator id. If not, this would be a driver bug. A NULL pointer kernel OOPS is preferred to the WARN. Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller2018-08-021-1/+2
|\ | | | | | | | | | | | | | | | | | | The BTF conflicts were simple overlapping changes. The virtio_net conflict was an overlap of a fix of statistics counter, happening alongisde a move over to a bonafide statistics structure rather than counting value on the stack. Signed-off-by: David S. Miller <davem@davemloft.net>
| * xdp: add NULL pointer check in __xdp_return()Taehee Yoo2018-07-271-1/+2
| | | | | | | | | | | | | | | | | | | | | | rhashtable_lookup() can return NULL. so that NULL pointer check routine should be added. Fixes: 02b55e5657c3 ("xdp: add MEM_TYPE_ZERO_COPY") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* | xdp: factor out common program/flags handling from driversJakub Kicinski2018-07-131-0/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Basic operations drivers perform during xdp setup and query can be moved to helpers in the core. Encapsulate program and flags into a structure and add helpers. Note that the structure is intended as the "main" program information source in the driver. Most drivers will additionally place the program pointer in their fast path or ring structures. The helpers don't have a huge impact now, but they will decrease the code duplication when programs can be installed in HW and driver at the same time. Encapsulating the basic operations in helpers will hopefully also reduce the number of changes to drivers which adopt them. Helpers could really be static inline, but they depend on definition of struct netdev_bpf which means they'd have to be placed in netdevice.h, an already 4500 line header. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* | rhashtable: remove nulls_base and related code.NeilBrown2018-06-221-2/+2
|/ | | | | | | | | | | | | | | | | | This "feature" is unused, undocumented, and untested and so doesn't really belong. A patch is under development to properly implement support for detecting when a search gets diverted down a different chain, which the common purpose of nulls markers. This patch actually fixes a bug too. The table resizing allows a table to grow to 2^31 buckets, but the hash is truncated to 27 bits - any growth beyond 2^27 is wasteful an ineffective. This patch results in NULLS_MARKER(0) being used for all chains, and leaves the use of rht_is_a_null() to test for it. Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* xdp: add MEM_TYPE_ZERO_COPYBjörn Töpel2018-06-051-5/+14
| | | | | | | | | | | | | Here, a new type of allocator support is added to the XDP return API. A zero-copy allocated xdp_buff cannot be converted to an xdp_frame. Instead is the buff has to be copied. This is not supported at all in this commit. Also, an opaque "handle" is added to xdp_buff. This can be used as a context for the zero-copy allocator implementation. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* xdp: introduce xdp_return_frame_rx_napiJesper Dangaard Brouer2018-05-241-4/+16
| | | | | | | | | | | | | | | | | | When sending an xdp_frame through xdp_do_redirect call, then error cases can happen where the xdp_frame needs to be dropped, and returning an -errno code isn't sufficient/possible any-longer (e.g. for cpumap case). This is already fully supported, by simply calling xdp_return_frame. This patch is an optimization, which provides xdp_return_frame_rx_napi, which is a faster variant for these error cases. It take advantage of the protection provided by XDP RX running under NAPI protection. This change is mostly relevant for drivers using the page_pool allocator as it can take advantage of this. (Tested with mlx5). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
* xsk: add Rx receive functions and poll supportBjörn Töpel2018-05-031-3/+12
| | | | | | | | | | | | | | | | | | | | | | | Here the actual receive functions of AF_XDP are implemented, that in a later commit, will be called from the XDP layers. There's one set of functions for the XDP_DRV side and another for XDP_SKB (generic). A new XDP API, xdp_return_buff, is also introduced. Adding xdp_return_buff, which is analogous to xdp_return_frame, but acts upon an struct xdp_buff. The API will be used by AF_XDP in future commits. Support for the poll syscall is also implemented. v2: xskq_validate_id did not update cons_tail. The entries variable was calculated twice in xskq_nb_avail. Squashed xdp_return_buff commit. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
* xdp: transition into using xdp_frame for return APIJesper Dangaard Brouer2018-04-171-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changing API xdp_return_frame() to take struct xdp_frame as argument, seems like a natural choice. But there are some subtle performance details here that needs extra care, which is a deliberate choice. When de-referencing xdp_frame on a remote CPU during DMA-TX completion, result in the cache-line is change to "Shared" state. Later when the page is reused for RX, then this xdp_frame cache-line is written, which change the state to "Modified". This situation already happens (naturally) for, virtio_net, tun and cpumap as the xdp_frame pointer is the queued object. In tun and cpumap, the ptr_ring is used for efficiently transferring cache-lines (with pointers) between CPUs. Thus, the only option is to de-referencing xdp_frame. It is only the ixgbe driver that had an optimization, in which it can avoid doing the de-reference of xdp_frame. The driver already have TX-ring queue, which (in case of remote DMA-TX completion) have to be transferred between CPUs anyhow. In this data area, we stored a struct xdp_mem_info and a data pointer, which allowed us to avoid de-referencing xdp_frame. To compensate for this, a prefetchw is used for telling the cache coherency protocol about our access pattern. My benchmarks show that this prefetchw is enough to compensate the ixgbe driver. V7: Adjust for commit d9314c474d4f ("i40e: add support for XDP_REDIRECT") V8: Adjust for commit bd658dda4237 ("net/mlx5e: Separate dma base address and offset in dma_sync call") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* xdp: allow page_pool as an allocator type in xdp_return_frameJesper Dangaard Brouer2018-04-171-12/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | New allocator type MEM_TYPE_PAGE_POOL for page_pool usage. The registered allocator page_pool pointer is not available directly from xdp_rxq_info, but it could be (if needed). For now, the driver should keep separate track of the page_pool pointer, which it should use for RX-ring page allocation. As suggested by Saeed, to maintain a symmetric API it is the drivers responsibility to allocate/create and free/destroy the page_pool. Thus, after the driver have called xdp_rxq_info_unreg(), it is drivers responsibility to free the page_pool, but with a RCU free call. This is done easily via the page_pool helper page_pool_destroy() (which avoids touching any driver code during the RCU callback, which could happen after the driver have been unloaded). V8: address issues found by kbuild test robot - Address sparse should be static warnings - Allow xdp.o to be compiled without page_pool.o V9: Remove inline from .c file, compiler knows best Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* xdp: rhashtable with allocator ID to pointer mappingJesper Dangaard Brouer2018-04-171-5/+218
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the IDA infrastructure for getting a cyclic increasing ID number, that is used for keeping track of each registered allocator per RX-queue xdp_rxq_info. Instead of using the IDR infrastructure, which uses a radix tree, use a dynamic rhashtable, for creating ID to pointer lookup table, because this is faster. The problem that is being solved here is that, the xdp_rxq_info pointer (stored in xdp_buff) cannot be used directly, as the guaranteed lifetime is too short. The info is needed on a (potentially) remote CPU during DMA-TX completion time . In an xdp_frame the xdp_mem_info is stored, when it got converted from an xdp_buff, which is sufficient for the simple page refcnt based recycle schemes. For more advanced allocators there is a need to store a pointer to the registered allocator. Thus, there is a need to guard the lifetime or validity of the allocator pointer, which is done through this rhashtable ID map to pointer. The removal and validity of of the allocator and helper struct xdp_mem_allocator is guarded by RCU. The allocator will be created by the driver, and registered with xdp_rxq_info_reg_mem_model(). It is up-to debate who is responsible for freeing the allocator pointer or invoking the allocator destructor function. In any case, this must happen via RCU freeing. Use the IDA infrastructure for getting a cyclic increasing ID number, that is used for keeping track of each registered allocator per RX-queue xdp_rxq_info. V4: Per req of Jason Wang - Use xdp_rxq_info_reg_mem_model() in all drivers implementing XDP_REDIRECT, even-though it's not strictly necessary when allocator==NULL for type MEM_TYPE_PAGE_SHARED (given it's zero). V6: Per req of Alex Duyck - Introduce rhashtable_lookup() call in later patch V8: Address sparse should be static warnings (from kbuild test robot) Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* xdp: introduce xdp_return_frame API and use in cpumapJesper Dangaard Brouer2018-04-171-0/+18
| | | | | | | | | | | | Introduce an xdp_return_frame API, and convert over cpumap as the first user, given it have queued XDP frame structure to leverage. V3: Cleanup and remove C99 style comments, pointed out by Alex Duyck. V6: Remove comment that id will be added later (Req by Alex Duyck) V8: Rename enum mem_type to xdp_mem_type (found by kbuild test robot) Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* xdp/qede: setup xdp_rxq_info and intro xdp_rxq_info_is_regJesper Dangaard Brouer2018-01-051-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | The driver code qede_free_fp_array() depend on kfree() can be called with a NULL pointer. This stems from the qede_alloc_fp_array() function which either (kz)alloc memory for fp->txq or fp->rxq. This also simplifies error handling code in case of memory allocation failures, but xdp_rxq_info_unreg need to know the difference. Introduce xdp_rxq_info_is_reg() to handle if a memory allocation fails and detect this is the failure path by seeing that xdp_rxq_info was not registred yet, which first happens after successful alloaction in qede_init_fp(). Driver hook points for xdp_rxq_info: * reg : qede_init_fp * unreg: qede_free_fp_array Tested on actual hardware with samples/bpf program. V2: Driver have no proper error path for failed XDP RX-queue info reg, as qede_init_fp() is a void function. Cc: everest-linux-l2@cavium.com Cc: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
* xdp: base API for new XDP rx-queue info conceptJesper Dangaard Brouer2018-01-051-0/+67
This patch only introduce the core data structures and API functions. All XDP enabled drivers must use the API before this info can used. There is a need for XDP to know more about the RX-queue a given XDP frames have arrived on. For both the XDP bpf-prog and kernel side. Instead of extending xdp_buff each time new info is needed, the patch creates a separate read-mostly struct xdp_rxq_info, that contains this info. We stress this data/cache-line is for read-only info. This is NOT for dynamic per packet info, use the data_meta for such use-cases. The performance advantage is this info can be setup at RX-ring init time, instead of updating N-members in xdp_buff. A possible (driver level) micro optimization is that xdp_buff->rxq assignment could be done once per XDP/NAPI loop. The extra pointer deref only happens for program needing access to this info (thus, no slowdown to existing use-cases). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>