linux.git - Linux kernel mainline tree

	Commit message (Collapse)	Author	Age	Files	Lines
*	xsk: remove unnecessary assignment	Prashant Bhole	2018-09-01	1	-2/+0
\| \| \| \| \| \| \| \| \| \|	Since xdp_umem_query() was added one assignment of bpf.command was missed from cleanup. Removing the assignment statement. Fixes: 84c6b86875e01a0 ("xsk: don't allow umem replace at stack level") Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
*	bpf: add TCP_SAVE_SYN/TCP_SAVED_SYN sample program	Nikita V. Shirokov	2018-09-01	2	-0/+88
\| \| \| \| \| \| \| \| \| \|	Sample program which shows TCP_SAVE_SYN/TCP_SAVED_SYN usage example: bpf program which is doing TOS/TCLASS reflection (server would reply with a same TOS/TCLASS as client). Signed-off-by: Nikita V. Shirokov <tehnerd@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
*	bpf: add TCP_SAVE_SYN/TCP_SAVED_SYN options for bpf_(set\|get)sockopt	Nikita V. Shirokov	2018-09-01	1	-4/+21
\| \| \| \| \| \| \| \| \| \| \| \|	Adding support for two new bpf get/set sockopts: TCP_SAVE_SYN (set) and TCP_SAVED_SYN (get). This would allow for bpf program to build logic based on data from ingress SYN packet (e.g. doing tcp's tos/ tclass reflection (see sample prog)) and do it transparently from userspace program point of view. Signed-off-by: Nikita V. Shirokov <tehnerd@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
*	xdp: remove redundant variable 'headroom'	Colin Ian King	2018-09-01	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	Variable 'headroom' is being assigned but is never used hence it is redundant and can be removed. Cleans up clang warning: variable ‘headroom’ set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
*	xsk: include XDP meta data in AF_XDP frames	Björn Töpel	2018-08-30	1	-6/+18
\| \| \| \| \| \| \| \| \| \| \| \|	Previously, the AF_XDP (XDP_DRV/XDP_SKB copy-mode) ingress logic did not include XDP meta data in the data buffers copied out to the user application. In this commit, we check if meta data is available, and if so, it is prepended to the frame. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
*	Merge branch 'bpf-bpffs-bpftool-dump-with-btf'	Daniel Borkmann	2018-08-30	4	-37/+230
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Yonghong Song says: ==================== Commit a26ca7c982cb ("bpf: btf: Add pretty print support to the basic arraymap") and Commit 699c86d6ec21 ("bpf: btf: add pretty print for hash/lru_hash maps") added bpffs pretty print for array, hash and lru hash maps. The pretty print gives users a structurally formatted dump for keys/values which much easy to understand than raw bytes. This patch set implemented bpffs pretty print support for percpu arraymap, percpu hashmap and percpu lru hashmap. For complex key/value types, the pretty print here is even more useful due to: . large volumne of data making it even harder to correlate bytes to a particular field in a particular cpu. . kernel rounds the value size for each cpu to multiple of 8. User has to be aware of this otherwise wrong value may be derived from cpu 1/2/... For example, we may have a bpffs pretty print like below: 43602: { cpu0: {43602,0,-43602,0x3,0xaa52,0x3,{43602\|[82,170,0,0,0,0,0,0]},ENUM_TWO} cpu1: {43602,0,-43602,0x3,0xaa52,0x3,{43602\|[82,170,0,0,0,0,0,0]},ENUM_TWO} cpu2: {43602,0,-43602,0x3,0xaa52,0x3,{43602\|[82,170,0,0,0,0,0,0]},ENUM_TWO} cpu3: {43602,0,-43602,0x3,0xaa52,0x3,{43602\|[82,170,0,0,0,0,0,0]},ENUM_TWO} } for a percpu map. This patch also added percpu formatted print on bpftool. For example, bpftool may print like below: { "key": 0, "values": [{ "cpu": 0, "value": { "ui32": 0, "ui16": 0, } },{ "cpu": 1, "value": { "ui32": 1, "ui16": 0, } },{ "cpu": 2, "value": { "ui32": 2, "ui16": 0, } },{ "cpu": 3, "value": { "ui32": 3, "ui16": 0, } } ] } Patch #1 implemented bpffs pretty print for percpu arraymap/hash/lru_hash in kernel. Patch #2 added the test case in tools bpf selftest test_btf. Patch #3 added percpu map btf based dump. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
\| *	tools/bpf: bpftool: add btf percpu map formated dump	Yonghong Song	2018-08-30	1	-2/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The btf pretty print is added to percpu arraymap, percpu hashmap and percpu lru hashmap. For each <key, value> pair, the following will be added to plain/json output: { "key": <pretty_print_key>, "values": [{ "cpu": 0, "value": <pretty_print_value_on_cpu0> },{ "cpu": 1, "value": <pretty_print_value_on_cpu1> },{ .... },{ "cpu": n, "value": <pretty_print_value_on_cpun> } ] } For example, the following could be part of plain or json formatted output: { "key": 0, "values": [{ "cpu": 0, "value": { "ui32": 0, "ui16": 0, } },{ "cpu": 1, "value": { "ui32": 1, "ui16": 0, } },{ "cpu": 2, "value": { "ui32": 2, "ui16": 0, } },{ "cpu": 3, "value": { "ui32": 3, "ui16": 0, } } ] } Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
\| *	tools/bpf: add bpffs percpu map pretty print tests in test_btf	Yonghong Song	2018-08-30	1	-35/+144
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The bpf selftest test_btf is extended to test bpffs percpu map pretty print for percpu array, percpu hash and percpu lru hash. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
\| *	bpf: add bpffs pretty print for percpu arraymap/hash/lru_hash	Yonghong Song	2018-08-30	2	-0/+55
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Added bpffs pretty print for percpu arraymap, percpu hashmap and percpu lru hashmap. For each map <key, value> pair, the format is: <key_value>: { cpu0: <value_on_cpu0> cpu1: <value_on_cpu1> ... cpun: <value_on_cpun> } For example, on my VM, there are 4 cpus, and for test_btf test in the next patch: cat /sys/fs/bpf/pprint_test_percpu_hash You may get: ... 43602: { cpu0: {43602,0,-43602,0x3,0xaa52,0x3,{43602\|[82,170,0,0,0,0,0,0]},ENUM_TWO} cpu1: {43602,0,-43602,0x3,0xaa52,0x3,{43602\|[82,170,0,0,0,0,0,0]},ENUM_TWO} cpu2: {43602,0,-43602,0x3,0xaa52,0x3,{43602\|[82,170,0,0,0,0,0,0]},ENUM_TWO} cpu3: {43602,0,-43602,0x3,0xaa52,0x3,{43602\|[82,170,0,0,0,0,0,0]},ENUM_TWO} } 72847: { cpu0: {72847,0,-72847,0x3,0x11c8f,0x3,{72847\|[143,28,1,0,0,0,0,0]},ENUM_THREE} cpu1: {72847,0,-72847,0x3,0x11c8f,0x3,{72847\|[143,28,1,0,0,0,0,0]},ENUM_THREE} cpu2: {72847,0,-72847,0x3,0x11c8f,0x3,{72847\|[143,28,1,0,0,0,0,0]},ENUM_THREE} cpu3: {72847,0,-72847,0x3,0x11c8f,0x3,{72847\|[143,28,1,0,0,0,0,0]},ENUM_THREE} } ... Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
*	Merge branch 'verifier-liveness-simplification'	Alexei Starovoitov	2018-08-29	2	-152/+72
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Edward Cree says: ==================== The first patch is a simplification of register liveness tracking by using a separate parentage chain for each register and stack slot, thus avoiding the need for logic to handle callee-saved registers when applying read marks. In the future this idea may be extended to form use-def chains. The second patch adds information about misc/zero data on the stack to the state dumps emitted to the log at various points; this information was found essential in debugging the first patch, and may be useful elsewhere. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
\| *	bpf/verifier: display non-spill stack slot types in print_verifier_state	Edward Cree	2018-08-29	1	-7/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a stack slot does not hold a spilled register (STACK_SPILL), then each of its eight bytes could potentially have a different slot_type. This information can be important for debugging, and previously we either did not print anything for the stack slot, or just printed fp-X=0 in the case where its first byte was STACK_ZERO. Instead, print eight characters with either 0 (STACK_ZERO), m (STACK_MISC) or ? (STACK_INVALID) for any stack slot which is neither STACK_SPILL nor entirely STACK_INVALID. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
\| *	bpf/verifier: per-register parent pointers	Edward Cree	2018-08-29	2	-145/+47
\|/ \| \| \| \| \| \| \| \| \| \| \|	By giving each register its own liveness chain, we elide the skip_callee() logic. Instead, each register's parent is the state it inherits from; both check_func_call() and prepare_func_exit() automatically connect reg states to the correct chain since when they copy the reg state across (r1-r5 into the callee as args, and r0 out as the return value) they also copy the parent pointer. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
*	Merge branch 'AF_XDP-zerocopy-for-i40e'	Alexei Starovoitov	2018-08-29	14	-108/+1523
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Björn Töpel says: ==================== This patch set introduces zero-copy AF_XDP support for Intel's i40e driver. In the first preparatory patch we also add support for XDP_REDIRECT for zero-copy allocated frames so that XDP programs can redirect them. This was a ToDo from the first AF_XDP zero-copy patch set from early June. Special thanks to Alex Duyck and Jesper Dangaard Brouer for reviewing earlier versions of this patch set. The i40e zero-copy code is located in its own file i40e_xsk.[ch]. Note that in the interest of time, to get an AF_XDP zero-copy implementation out there for people to try, some code paths have been copied from the XDP path to the zero-copy path. It is out goal to merge the two paths in later patch sets. In contrast to the implementation from beginning of June, this patch set does not require any extra HW queues for AF_XDP zero-copy TX. Instead, the XDP TX HW queue is used for both XDP_REDIRECT and AF_XDP zero-copy TX. Jeff, given that most of changes are in i40e, it is up to you how you would like to route these patches. The set is tagged bpf-next, but if taking it via the Intel driver tree is easier, let us know. We have run some benchmarks on a dual socket system with two Broadwell E5 2660 @ 2.0 GHz with hyperthreading turned off. Each socket has 14 cores which gives a total of 28, but only two cores are used in these experiments. One for TR/RX and one for the user space application. The memory is DDR4 @ 2133 MT/s (1067 MHz) and the size of each DIMM is 8192MB and with 8 of those DIMMs in the system we have 64 GB of total memory. The compiler used is gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0. The NIC is Intel I40E 40Gbit/s using the i40e driver. Below are the results in Mpps of the I40E NIC benchmark runs for 64 and 1500 byte packets, generated by a commercial packet generator HW outputing packets at full 40 Gbit/s line rate. The results are with retpoline and all other spectre and meltdown fixes, so these results are not comparable to the ones from the zero-copy patch set in June. AF_XDP performance 64 byte packets. Benchmark XDP_SKB XDP_DRV XDP_DRV with zerocopy rxdrop 2.6 8.2 15.0 txpush 2.2 - 21.9 l2fwd 1.7 2.3 11.3 AF_XDP performance 1500 byte packets: Benchmark XDP_SKB XDP_DRV XDP_DRV with zerocopy rxdrop 2.0 3.3 3.3 l2fwd 1.3 1.7 3.1 XDP performance on our system as a base line: 64 byte packets: XDP stats CPU pps issue-pps XDP-RX CPU 16 18.4M 0 1500 byte packets: XDP stats CPU pps issue-pps XDP-RX CPU 16 3.3M 0 The structure of the patch set is as follows: Patch 1: Add support for XDP_REDIRECT of zero-copy allocated frames Patches 2-4: Preparatory patches to common xsk and net code Patches 5-7: Preparatory patches to i40e driver code for RX Patch 8: i40e zero-copy support for RX Patch 9: Preparatory patch to i40e driver code for TX Patch 10: i40e zero-copy support for TX Patch 11: Add flags to sample application to force zero-copy/copy mode ==================== Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
\| *	samples/bpf: add -c/--copy -z/--zero-copy flags to xdpsock	Björn Töpel	2018-08-29	1	-1/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The -c/--copy -z/--zero-copy flags enforces either copy or zero-copy mode. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
\| *	i40e: add AF_XDP zero-copy Tx support	Magnus Karlsson	2018-08-29	4	-1/+186
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds zero-copy Tx support for AF_XDP sockets. It implements the ndo_xsk_async_xmit netdev ndo and performs all the Tx logic from a NAPI context. This means pulling egress packets from the Tx ring, placing the frames on the NIC HW descriptor ring and completing sent frames back to the application via the completion ring. The regular XDP Tx ring is used for AF_XDP as well. This rationale for this is as follows: XDP_REDIRECT guarantees mutual exclusion between different NAPI contexts based on CPU id. In other words, a netdev can XDP_REDIRECT to another netdev with a different NAPI context, since the operation is bound to a specific core and each core has its own hardware ring. As the AF_XDP Tx action is running in the same NAPI context and using the same ring, it will also be protected from XDP_REDIRECT actions with the exact same mechanism. As with AF_XDP Rx, all AF_XDP Tx specific functions are added to i40e_xsk.c. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
\| *	i40e: move common Tx functions to i40e_txrx_common.h	Magnus Karlsson	2018-08-29	2	-33/+61
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch prepares for the upcoming zero-copy Tx functionality, by moving common functions and refactor chunks of code into re-usable functions, used both by the regular path and zero-copy path. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
\| *	i40e: add AF_XDP zero-copy Rx support	Björn Töpel	2018-08-29	7	-11/+775
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds zero-copy Rx support for AF_XDP sockets. Instead of allocating buffers of type MEM_TYPE_PAGE_SHARED, the Rx frames are allocated as MEM_TYPE_ZERO_COPY when AF_XDP is enabled for a certain queue. All AF_XDP specific functions are added to a new file, i40e_xsk.c. Note that when AF_XDP zero-copy is enabled, the XDP action XDP_PASS will allocate a new buffer and copy the zero-copy frame prior passing it to the kernel stack. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
\| *	i40e: move common Rx functions to i40e_txrx_common.h	Björn Töpel	2018-08-29	2	-20/+44
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch prepares for the upcoming zero-copy Rx functionality, by moving/changing linkage of common functions, used both by the regular path and zero-copy path. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
\| *	i40e: refactor Rx path for re-use	Björn Töpel	2018-08-29	1	-34/+77
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In this commit, the Rx path is refactored some, as a step torwards the introduction AF_XDP Rx zero-copy. The page re-use counter is moved into the i40e_reuse_rx_page, instead of bumping the counter in many places. The Rx buffer page clearing is moved for better readability. Lastely, functions to update statistics and bump the XDP Tx ring are introduced. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
\| *	i40e: added queue pair disable/enable functions	Björn Töpel	2018-08-29	1	-0/+250
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add functions for queue pair enable/disable. Instead of resetting the whole device, only the affected queue pair is disabled or enabled. This plumbing is used in a later commit, when zero-copy AF_XDP support is introduced. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
\| *	net: add napi_if_scheduled_mark_missed	Magnus Karlsson	2018-08-29	1	-0/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The function napi_if_scheduled_mark_missed is used to check if the NAPI context is scheduled, if so set NAPIF_STATE_MISSED and return true. Used by the AF_XDP zero-copy i40e Tx code implementation in order to make sure that irq affinity is honored by the napi context. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
\| *	xsk: expose xdp_umem_get_{data,dma} to drivers	Björn Töpel	2018-08-29	2	-10/+43
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Move the xdp_umem_get_{data,dma} functions to include/net/xdp_sock.h, so that the upcoming zero-copy implementation in the Ethernet drivers can utilize them. Also, supply some dummy function implementations for CONFIG_XDP_SOCKETS=n configs. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
\| *	xdp: export xdp_rxq_info_unreg_mem_model	Björn Töpel	2018-08-29	2	-2/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Export __xdp_rxq_info_unreg_mem_model as xdp_rxq_info_unreg_mem_model, so it can be used from netdev drivers. Also, add additional checks for the memory type. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
\| *	xdp: implement convert_to_xdp_frame for MEM_TYPE_ZERO_COPY	Björn Töpel	2018-08-29	2	-2/+42
\|/ \| \| \| \| \| \| \| \| \| \| \| \|	This commit adds proper MEM_TYPE_ZERO_COPY support for convert_to_xdp_frame. Converting a MEM_TYPE_ZERO_COPY xdp_buff to an xdp_frame is done by transforming the MEM_TYPE_ZERO_COPY buffer into a MEM_TYPE_PAGE_ORDER0 frame. This is costly, and in the future it might make sense to implement a more sophisticated thread-safe alloc/free scheme for MEM_TYPE_ZERO_COPY, so that no allocation and copy is required in the fast-path. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
*	bpf: use --cgroup in test_suite if supplied	John Fastabend	2018-08-29	1	-22/+31
\| \| \| \| \| \| \| \| \|	If the user supplies a --cgroup value in the arguments when running the test_suite go ahaead and run the self tests there. I use this to test with multiple cgroup users. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
*	bpf: sockmap test remove shutdown() calls	John Fastabend	2018-08-29	1	-3/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, we do a shutdown(sk, SHUT_RDWR) on both peer sockets and a shutdown on the sender as well. However, this is incorrect and can occasionally cause issues if you happen to have bad timing. First peer1 or peer2 may still be in use depending on the test and timing. Second we really should only be closing the read side and/or write side depending on if the test is receiving or sending. But, really none of this is needed just remove the shutdown calls. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
*	bpf: remove duplicated include from syscall.c	YueHaibing	2018-08-29	1	-1/+0
\| \| \| \| \| \| \|	Remove duplicated include. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
*	Merge branch 'nfp-add-NFP5000-support'	David S. Miller	2018-08-28	15	-236/+682
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Jakub Kicinski says: ==================== nfp: add NFP5000 support This series broadly speaking adds support for NFP5000 and related products. First we add support for loading FW from flash. We need to allow for the management processor to provide extended log messages when FW is loaded. This is needed when FW selection policy is to compare the FW on the disk and in the flash, and load the newer. User should be told what FW was selected. We use this opportunity to add extended errors for normal FW loading as well. Next we add support for requesting HW information from the management processor. Up until now the driver read the HWinfo as it appears in card memory, but there can be cases when management processor has additional information or generates the entries dynamically so occasionally we will have to consult it. We use this to look up MAC addresses for PCIe netdevs. Next the actual patch with NFP5000 support and a small dose of refactoring of PCIe init. The remaining patches add support for reading RTsymbol types we didn't need before. Ones explicitly placed in external memory unit's cache and absolute ones. This part begins with a patch moving the logic which figures out the correct bit offsets to device probe, to avoid redoing the calculation for each access. Second patch adds error messages for easier troubleshooting. Next patch adds helpers which will take care of address conversions to reach into EMU cache. Subsequently users are migrated from the raw CPP API to the new RTsym helpers. Finally we add support for reading absolute symbols. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	nfp: make RTsym users handle absolute symbols correctly	Jakub Kicinski	2018-08-28	3	-32/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make the RTsym users access the size via the helper, which takes care of special handling of absolute symbols. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Francois H. Theron <francois.theron@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	nfp: support access to absolute RTsyms	Jakub Kicinski	2018-08-28	2	-9/+46
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add support in nfpcore for reading the absolute RTsyms. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Francois H. Theron <francois.theron@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	nfp: convert all RTsym users to use new read/write helpers	Jakub Kicinski	2018-08-28	3	-46/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Convert all users of RTsym to the new set of helpers which handle all targets correctly. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Francois H. Theron <francois.theron@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	nfp: convert existing RTsym helpers to full target decoding	Jakub Kicinski	2018-08-28	1	-14/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make nfp_rtsym_{read,write}_le() and nfp_rtsym_map() use the new target resolution helpers to allow accessing in-cache symbols. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Francois H. Theron <francois.theron@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	nfp: pass cpp_id to nfp_cpp_map_area()	Jakub Kicinski	2018-08-28	4	-16/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Align nfp_cpp_map_area() with other CPP-level APIs and pass encoded cpp_id/dest rather than target, action, domain tuple. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Francois H. Theron <francois.theron@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	nfp: add RTsym access helpers	Jakub Kicinski	2018-08-28	2	-0/+171
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	RTsyms may have special encodings for more complex symbol types. For example symbols which are placed in external memory unit's cache directly, constants or local memory. Add set of helpers which will check for those special encodings and handle them correctly. For now only add direct cache accesses, we don't have a need to access the other ones in foreseeable future. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Francois H. Theron <francois.theron@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	nfp: add basic errors messages to target logic	Jakub Kicinski	2018-08-28	1	-2/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add error prints to CPP target encoding/decoding logic, otherwise it's quite hard to pin point the reasons why read or write operations fail. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Francois H. Theron <francois.theron@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	nfp: save the MU locality field offset	Jakub Kicinski	2018-08-28	3	-31/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We will soon need the MU locality field offset much more often than just for decoding MIP address. Save it in nfp_cpp for quick access. Note that we can already reuse the target config from nfp_cpp, no need to do the XPB read. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Francois H. Theron <francois.theron@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	nfp: refactor the per-chip PCIe config	Jakub Kicinski	2018-08-28	2	-13/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use a switch statement instead of ifs for code dependent on chip version. While at it make sure we fail for unknown chip revisions. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	nfp: add support for NFP5000	Jakub Kicinski	2018-08-28	2	-1/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add NFP5000 to supported chips, the chip is backward compatible with NFP4000 and NFP6000, so core PCIe code needs to handle it the same way as 4k and 6k. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	nfp: abm: look up MAC addresses via management FW	Jakub Kicinski	2018-08-28	1	-9/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In multi-host scenarios Management FW may allocate MAC addresses at runtime, we have to use the indirect lookup to find them. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	nfp: add support for indirect HWinfo lookup	Jakub Kicinski	2018-08-28	2	-0/+44
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Management FW can adjust some of the information in the HWinfo table at runtime. In some cases reading the table directly will not yield correct results. Add a NSP command for looking up information. Up until now we weren't making use of any of the values which may get adjusted. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	nfp: interpret extended FW load result codes	Jakub Kicinski	2018-08-28	2	-3/+85
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	To enable easier FW distribution NFP can now automatically select between FW stored on the flash and loaded from the kernel. If FW loading policy is set to auto it will compare the versions of FW from the host and from the flash and load the newer one. If FW type doesn't match (e.g. one advanced application vs another) the FW from the host takes precedence, unless one of them is the basic NIC firmware, in which case the non-basic-NIC FW is selected. This automatic selection mechanism requires we inform user what the verdict was. Print a message to the logs explaining the decision and the reason. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	nfp: attempt FW load from flash	Jakub Kicinski	2018-08-28	3	-2/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Flash may contain a default NFP application FW. This application can either be put there by the user (with ethtool -f) or shipped with the card. If file system FW is not found, attempt to load this flash stored app FW. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	nfp: encapsulate NSP command arguments into structs	Jakub Kicinski	2018-08-28	1	-69/+136
\|/ \| \| \| \| \| \| \| \| \|	There is already a fair number of arguments to nfp_nsp_command() family of functions. Encapsulate them into structures to make adding new ones easier. No functional changes. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch '100GbE' of ↵	David S. Miller	2018-08-28	17	-1199/+3326
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 100GbE Intel Wired LAN Driver Updates 2018-08-28 This series contains new features and implementation updates for the ice driver. Anirudh reworks the current flex programming logic to add support for a second flex descriptor profile. Updated the transmit scheduler code to handle changes to the spec, specifically the firmware expects a 4KB buffer at all times so fix the default scheduler topology buffer size. Also the maximum children per node per layer is replaced by maximum sibling group size. Adds a check to ensure a reset is not in progress before exercising a control queue operation. Refactored the switch rule management functions and structures to simply the logic and to add a common function to search for a rule entry and add a new rule entry. Refactored the VSI allocation, deletion and rebuild flow so that on reset we can restore all the filters that were previously added. Did some spring cleaning of define names and macros. Dan updates the admin queue command for requesting resource ownership to the latest specification by adding new enum's and change the locks. Zhenning optimizes the driver by using the existing buffer in a structure directly versus a local array. Chinh implements handlers for ethtool for get and set link settings. Sudheer implements transmit hang/timeout detection and malicious driver detection in the driver. Md Fahad Iqbal implements the get and set bridge mode operations. Hieu adds the ability for firmware logging during initialization. Brett updates the driver to only enable VSI transmit and receive pruning when VLAN 0 is active, and when VLAN 0 is removed/not active, pruning is disabled. Akeem adds a flag to use for stopping the service task. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ice: Fix and update driver version string	Anirudh Venkataramanan	2018-08-28	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove the "ice" prefix for the driver version string and bump version to 0.7.1-k. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	ice: Introduce SERVICE_DIS flag and service routine functions	Akeem G Abodunrin	2018-08-28	2	-7/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch introduces SERVICE_DIS flag to use for stopping service task. This flag will be checked before scheduling new tasks. Also add new functions ice_service_task_stop to stop service task. Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	ice: Enable VSI Rx/Tx pruning only when VLAN 0 is active	Brett Creeley	2018-08-28	1	-9/+85
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	VLAN pruning is not valid when VLAN 0 is not active. If VLAN pruning is enabled and VLAN 0 is not active (8021q driver not loaded) then normal, non-VLAN, traffic will not pass. TX/RX VLAN pruning is enabled when the VLAN 0 is added to the active_vlan bitmap and it is disabled when VLAN 0 is removed from the active_vlan bitmap. So, only enable VLAN pruning when VLAN 0 is active. Setting RX VLAN pruning causes the switch to drop received VLAN packets when there are no matching VLAN ids in the associated VSI's switch filters. Setting TX pruning makes it so the switch will not send out any packets with VLAN tags that don't match the associated VSI's switch filters. With this patch, if the VF or PF tries to send a VLAN tagged packet with a VLAN tag that it does not have a pruning rule for it will trigger an MDD event. For example, if PF0 has VLAN10 and VLAN11 interfaces and scapy is used to send a packet with VLAN8 then the MDD is triggered. Also make ice_vsi_kill_vlan return a value which the caller can check before updating VLAN related data structures (counts, pruning bits, etc.). Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	ice: Enable firmware logging during device initialization.	Hieu Tran	2018-08-28	5	-2/+286
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	To enable FW logging, the "cq_en" and "uart_en" enable bits of the "fw_log" element in struct ice_hw need to set accordingly based on some user-provided parameters during driver loading. To select which FW log events to be emitted, the "cfg" elements of corresponding FW modules in the "evnts" array member of "fw_log" need to be configured. Signed-off-by: Hieu Tran <hieu.t.tran@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	ice: Implement ice_bridge_getlink and ice_bridge_setlink	Md Fahad Iqbal Polash	2018-08-28	3	-1/+181
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ice_bridge_getlink returns the current bridge mode using ndo_dflt_bridge_getlink and the mode parameter available in first_switch->bridge_mode. ice_bridge_setlink is invoked when the bridge mode needs to changed. The value to be changed to is available as a netlink message which is parsed in this function. If the mode has to be changed, switch_flags is set appropriately (set ALLOW_LB for VEB mode and clear it for VEPA mode) and ice_aq_update_vsi is called. Also change the unicast switch filter rules. Signed-off-by: Md Fahad Iqbal Polash <md.fahad.iqbal.polash@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	ice: Add support for Tx hang, Tx timeout and malicious driver detection	Sudheer Mogilappagari	2018-08-28	5	-0/+331
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a malicious operation is detected, the firmware triggers an interrupt, which is then picked up by the service task (specifically by ice_handle_mdd_event). A reset is scheduled if required. Tx hang detection works in a similar way, except the logic here monitors the VSI's Tx queues and tries to revive them if stalled. If the hang is not resolved, the kernel eventually calls ndo_tx_timeout, which is handled by ice_tx_timeout. Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>