summaryrefslogtreecommitdiffstats
path: root/tools
Commit message (Collapse)AuthorAgeFilesLines
* selftests: forwarding: Fix ping failure due to short timeoutIdo Schimmel2024-03-262-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit e4137851d4863a9bdc6aabc613bcb46c06d91e64 ] The tests send 100 pings in 0.1 second intervals and force a timeout of 11 seconds, which is borderline (especially on debug kernels), resulting in random failures in netdev CI [1]. Fix by increasing the timeout to 20 seconds. It should not prolong the test unless something is wrong, in which case the test will rightfully fail. [1] # selftests: net/forwarding: vxlan_bridge_1d_port_8472_ipv6.sh # INFO: Running tests with UDP port 8472 # TEST: ping: local->local [ OK ] # TEST: ping: local->remote 1 [FAIL] # Ping failed [...] Fixes: b07e9957f220 ("selftests: forwarding: Add VxLAN tests with a VLAN-unaware bridge for IPv6") Fixes: 728b35259e28 ("selftests: forwarding: Add VxLAN tests with a VLAN-aware bridge for IPv6") Reported-by: Paolo Abeni <pabeni@redhat.com> Closes: https://lore.kernel.org/netdev/24a7051fdcd1f156c3704bca39e4b3c41dfc7c4b.camel@redhat.com/ Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240320065717.4145325-1-idosch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* perf pmu: Fix a potential memory leak in perf_pmu__lookup()Christophe JAILLET2024-03-261-4/+3
| | | | | | | | | | | | | | | | | [ Upstream commit ef5de1613d7d92bdc975e6beb34bb0fa94f34078 ] The commit in Fixes has reordered some code, but missed an error handling path. 'goto err' now, in order to avoid a memory leak in case of error. Fixes: f63a536f03a2 ("perf pmu: Merge JSON events with sysfs at load time") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Ian Rogers <irogers@google.com> Cc: kernel-janitors@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/9538b2b634894c33168dfe9d848d4df31fd4d801.1693085544.git.christophe.jaillet@wanadoo.fr Signed-off-by: Sasha Levin <sashal@kernel.org>
* perf print-events: make is_event_supported() more robustMark Rutland2024-03-261-8/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 25412c0364f7110faa6053c73e3fd47ca956b8c3 ] Currently the perf tool doesn't detect support for extended event types on Apple M1/M2 systems, and will not auto-expand plain PERF_EVENT_TYPE hardware events into per-PMU events. This is due to the detection of extended event types not handling mandatory filters required by the M1/M2 PMU driver. PMU drivers and the core perf_events code can require that perf_event_attr::exclude_* filters are configured in a specific way and may reject certain configurations of filters, for example: (a) Many PMUs lack support for any event filtering, and require all perf_event_attr::exclude_* bits to be clear. This includes Alpha's CPU PMU, and ARM CPU PMUs prior to the introduction of PMUv2 in ARMv7, (b) When /proc/sys/kernel/perf_event_paranoid >= 2, the perf core requires that perf_event_attr::exclude_kernel is set. (c) The Apple M1/M2 PMU requires that perf_event_attr::exclude_guest is set as the hardware PMU does not count while a guest is running (but might be extended in future to do so). In is_event_supported(), we try to account for cases (a) and (b), first attempting to open an event without any filters, and if this fails, retrying with perf_event_attr::exclude_kernel set. We do not account for case (c), or any other filters that drivers could theoretically require to be set. Thus is_event_supported() will fail to detect support for any events targeting an Apple M1/M2 PMU, even where events would be supported with perf_event_attr:::exclude_guest set. Since commit: 82fe2e45cdb00de4 ("perf pmus: Check if we can encode the PMU number in perf_event_attr.type") ... we use is_event_supported() to detect support for extended types, with the PMU ID encoded into the perf_event_attr::type. As above, on an Apple M1/M2 system this will always fail to detect that the event is supported, and consequently we fail to detect support for extended types even when these are supported, as they have been since commit: 5c816728651ae425 ("arm_pmu: Add PERF_PMU_CAP_EXTENDED_HW_TYPE capability") Due to this, the perf tool will not automatically expand plain PERF_TYPE_HARDWARE events into per-PMU events, even when all the necessary kernel support is present. This patch updates is_event_supported() to additionally try opening events with perf_event_attr::exclude_guest set, allowing support for events to be detected on Apple M1/M2 systems. I believe that this is sufficient for all contemporary CPU PMU drivers, though in future it may be necessary to check for other combinations of filter bits. I've deliberately changed the check to not expect a specific error code for missing filters, as today ;the kernel may return a number of different error codes for missing filters (e.g. -EACCESS, -EINVAL, or -EOPNOTSUPP) depending on why and where the filter configuration is rejected, and retrying for any error is more robust. Note that this does not remove the need for commit: a24d9d9dc096fc0d ("perf parse-events: Make legacy events lower priority than sysfs/JSON") ... which is still necessary so that named-pmu/event/ events work on kernels without extended type support, even if the event name happens to be the same as a PERF_EVENT_TYPE_HARDWARE event (e.g. as is the case for the M1/M2 PMU's 'cycles' and 'instructions' events). Fixes: 82fe2e45cdb00de4 ("perf pmus: Check if we can encode the PMU number in perf_event_attr.type") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@arm.com> Tested-by: Marc Zyngier <maz@kernel.org> Cc: Hector Martin <marcan@marcan.st> Cc: James Clark <james.clark@arm.com> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mike Leach <mike.leach@linaro.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240126145605.1005472-1-mark.rutland@arm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* perf metric: Don't remove scale from countsIan Rogers2024-03-261-6/+1
| | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 6d6be5eb45b423a37d746d3ee0fd0c78f76ead9f ] Counts were switched from the scaled saved value form to the aggregated count to avoid double accounting. When this happened the removing of scaling for a count should have been removed, however, it wasn't and this wasn't observed as it normally doesn't matter because a counter's scale is 1. A problem was observed with RAPL events that are scaled. Fixes: 37cc8ad77cf8 ("perf metric: Directly use counts rather than saved_value") Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Kaige Ye <ye@kaige.org> Cc: John Garry <john.g.garry@oracle.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240209204947.3873294-5-irogers@google.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* perf stat: Avoid metric-only segvIan Rogers2024-03-261-1/+1
| | | | | | | | | | | | | | | | | | | [ Upstream commit 2543947c77e0e224bda86b4e7220c2f6714da463 ] Cycles is recognized as part of a hard coded metric in stat-shadow.c, it may call print_metric_only with a NULL fmt string leading to a segfault. Handle the NULL fmt explicitly. Fixes: 088519f318be ("perf stat: Move the display functions to stat-display.c") Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Kaige Ye <ye@kaige.org> Cc: John Garry <john.g.garry@oracle.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240209204947.3873294-4-irogers@google.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* perf expr: Fix "has_event" function for metric style eventsIan Rogers2024-03-261-1/+19
| | | | | | | | | | | | | | | | | | | | [ Upstream commit 6dd76680b925228312756c13b9b983661b552a64 ] Events in metrics cannot use '/' as a separator, it would be recognized as a divide, so they use '@'. The '@' is recognized in the metricgroups code and changed to '/', do the same in the has_event function so that the parsing is only tried without the @s. Fixes: 4a4a9bf9075f ("perf expr: Add has_event function") Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Kaige Ye <ye@kaige.org> Cc: John Garry <john.g.garry@oracle.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240209204947.3873294-3-irogers@google.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* perf srcline: Add missed addr2line closesIan Rogers2024-03-261-0/+2
| | | | | | | | | | | | | | | | | | | | | | [ Upstream commit c7ba9d18ae47924a6ea6a47ca139779f58eb83c0 ] The child_process for addr2line sets in and out to -1 so that pipes get created. It is the caller's responsibility to close the pipes, finish_command doesn't do it. Add the missed closes. Fixes: b3801e791231 ("perf srcline: Simplify addr2line subprocess") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Cc: Tom Rix <trix@redhat.com> Cc: llvm@lists.linux.dev Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240201001504.1348511-8-irogers@google.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* perf thread_map: Free strlist on normal path in thread_map__new_by_tid_str()Yang Jihong2024-03-261-1/+1
| | | | | | | | | | | | | | [ Upstream commit 1eb3d924e3c0b8c27388b0583a989d757866efb6 ] slist needs to be freed in both error path and normal path in thread_map__new_by_tid_str(). Fixes: b52956c961be3a04 ("perf tools: Allow multiple threads or processes in record, stat, top") Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240206083228.172607-6-yangjihong1@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* perf bpf: Clean up the generated/copied vmlinux.hArnaldo Carvalho de Melo2024-03-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | [ Upstream commit ffd856537b95dd65facb4e0c78ca1cb92c2048ff ] When building perf with BPF skels we either copy the minimalistic tools/perf/util/bpf_skel/vmlinux/vmlinux.h or use bpftool to generate a vmlinux from BTF, storing the result in $(SKEL_OUT)/vmlinux.h. We need to remove that when doing a 'make -C tools/perf clean', fix it. Fixes: b7a2d774c9c5a9a3 ("perf build: Add ability to build with a generated vmlinux.h") Reviewed-by: Ian Rogers <irogers@google.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: bpf@vger.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/Zbz89KK5wHfZ82jv@x1 Signed-off-by: Sasha Levin <sashal@kernel.org>
* perf evsel: Fix duplicate initialization of data->id in evsel__parse_sample()Yang Jihong2024-03-261-1/+0
| | | | | | | | | | | | | | [ Upstream commit 4962aec0d684c8edb14574ccd0da53e4926ff834 ] data->id has been initialized at line 2362, remove duplicate initialization. Fixes: 3ad31d8a0df2 ("perf evsel: Centralize perf_sample initialization") Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240127025756.4041808-1-yangjihong1@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* perf pmu: Treat the msr pmu as softwareIan Rogers2024-03-261-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 24852ef2e2d5c555c2da05baff112ea414b6e0f5 ] The msr PMU is a software one, meaning msr events may be grouped with events in a hardware context. As the msr PMU isn't marked as a software PMU by perf_pmu__is_software, groups with the msr PMU in are broken and the msr events placed in a different group. This may lead to multiplexing errors where a hardware event isn't counted while the msr event, such as tsc, is. Fix all of this by marking the msr PMU as software, which agrees with the driver. Before: ``` $ perf stat -e '{slots,tsc}' -a true WARNING: events were regrouped to match PMUs Performance counter stats for 'system wide': 1,750,335 slots 4,243,557 tsc 0.001456717 seconds time elapsed ``` After: ``` $ perf stat -e '{slots,tsc}' -a true Performance counter stats for 'system wide': 12,526,380 slots 3,415,163 tsc 0.001488360 seconds time elapsed ``` Fixes: 251aa040244a ("perf parse-events: Wildcard most "numeric" events") Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240124234200.1510417-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* perf record: Check conflict between '--timestamp-filename' option and pipe ↵Yang Jihong2024-03-262-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mode before recording [ Upstream commit 02f9b50e04812782fd006ed21c6da1c5e3e373da ] In pipe mode, no need to switch perf data output, therefore, '--timestamp-filename' option should not take effect. Check the conflict before recording and output WARNING. In this case, the check pipe mode in perf_data__switch() can be removed. Before: # perf record --timestamp-filename -o- perf test -w noploop | perf report -i- --percent-limit=1 # To display the perf.data header info, please use --header/--header-only options. # [ perf record: Woken up 1 times to write data ] [ perf record: Dump -.2024011812110182 ] # # Total Lost Samples: 0 # # Samples: 4K of event 'cycles:P' # Event count (approx.): 2176784359 # # Overhead Command Shared Object Symbol # ........ ....... .................... ...................................... # 97.83% perf perf [.] noploop # # (Tip: Print event counts in CSV format with: perf stat -x,) # After: # perf record --timestamp-filename -o- perf test -w noploop | perf report -i- --percent-limit=1 WARNING: --timestamp-filename option is not available in pipe mode. # To display the perf.data header info, please use --header/--header-only options. # [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.000 MB - ] # # Total Lost Samples: 0 # # Samples: 4K of event 'cycles:P' # Event count (approx.): 2185575421 # # Overhead Command Shared Object Symbol # ........ ....... ..................... ............................................. # 97.75% perf perf [.] noploop # # (Tip: Profiling branch (mis)predictions with: perf record -b / perf report) # Fixes: ecfd7a9c044e ("perf record: Add '--timestamp-filename' option to append timestamp to output file name") Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240119040304.3708522-3-yangjihong1@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* perf top: Uniform the event name for the hybrid machineKan Liang2024-03-264-27/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit a61f89bf76ef6f87ec48dd90dbc73a6cf9952edc ] It's hard to distinguish the default cycles events among hybrid PMUs. For example, $ perf top Available samples 385 cycles:P 903 cycles:P The other tool, e.g., perf record, uniforms the event name and adds the hybrid PMU name before opening the event. So the events can be easily distinguished. Apply the same methodology for the perf top as well. The evlist__uniquify_name() will be invoked by both record and top. Move it to util/evlist.c With the patch: $ perf top Available samples 148 cpu_atom/cycles:P/ 1K cpu_core/cycles:P/ Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Hector Martin <marcan@marcan.st> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20231214144612.1092028-2-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Stable-dep-of: 02f9b50e0481 ("perf record: Check conflict between '--timestamp-filename' option and pipe mode before recording") Signed-off-by: Sasha Levin <sashal@kernel.org>
* perf record: Fix possible incorrect free in record__switch_output()Yang Jihong2024-03-261-1/+1
| | | | | | | | | | | | | | | [ Upstream commit aff10a165201f6f60cff225083ce301ad3f5d8f1 ] perf_data__switch() may not assign a legal value to 'new_filename'. In this case, 'new_filename' uses the on-stack value, which may cause a incorrect free and unexpected result. Fixes: 03724b2e9c45 ("perf record: Allow to limit number of reported perf.data files") Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240119040304.3708522-2-yangjihong1@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* objtool: Fix UNWIND_HINT_{SAVE,RESTORE} across basic blocksJosh Poimboeuf2024-03-261-0/+12
| | | | | | | | | | | | | | | | | | | | [ Upstream commit 10b4c4bce3f5541f54bcc2039720b11d2bc96d79 ] If SAVE and RESTORE unwind hints are in different basic blocks, and objtool sees the RESTORE before the SAVE, it errors out with: vmlinux.o: warning: objtool: vmw_port_hb_in+0x242: objtool isn't smart enough to handle this CFI save/restore combo In such a case, defer following the RESTORE block until the straight-line path gets followed later. Fixes: 8faea26e6111 ("objtool: Re-add UNWIND_HINT_{SAVE_RESTORE}") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202402240702.zJFNmahW-lkp@intel.com/ Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/r/20240227073527.avcm5naavbv3cj5s@treble Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* selftests: forwarding: Add missing multicast routing config entriesIdo Schimmel2024-03-261-0/+7
| | | | | | | | | | | | | | | | | [ Upstream commit f0ddf15f0a74c27eb4b2271a90e69948acc3fa2c ] The two tests that make use of multicast routig (router.sh and router_multicast.sh) are currently failing in the netdev CI because the kernel is missing multicast routing support. Fix by adding the required config entries. Fixes: 6d4efada3b82 ("selftests: forwarding: Add multicast routing test") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240208165538.1303021-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* selftests: forwarding: Add missing config entriesPetr Machata2024-03-261-0/+28
| | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 4acf4e62cd572b0c806035046b3698f5585ab821 ] The config file contains a partial kernel configuration to be used by `virtme-configkernel --custom'. The presumption is that the config file contains all Kconfig options needed by the selftests from the directory. In net/forwarding/config, many are missing, which manifests as spurious failures when running the selftests, with messages about unknown device types, qdisc kinds or classifier actions. Add the missing configurations. Tested the resulting configuration using virtme-ng as follows: # vng -b -f tools/testing/selftests/net/forwarding/config # vng --user root (within the VM:) # make -C tools/testing/selftests TARGETS=net/forwarding run_tests Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/025abded7ff9cea5874a7fe35dcd3fd41bf5e6ac.1706286755.git.petrm@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Stable-dep-of: f0ddf15f0a74 ("selftests: forwarding: Add missing multicast routing config entries") Signed-off-by: Sasha Levin <sashal@kernel.org>
* tools/resolve_btfids: Fix cross-compilation to non-host endiannessViktor Malik2024-03-261-0/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 903fad4394666bc23975c93fb58f137ce64b5192 ] The .BTF_ids section is pre-filled with zeroed BTF ID entries during the build and afterwards patched by resolve_btfids with correct values. Since resolve_btfids always writes in host-native endianness, it relies on libelf to do the translation when the target ELF is cross-compiled to a different endianness (this was introduced in commit 61e8aeda9398 ("bpf: Fix libelf endian handling in resolv_btfids")). Unfortunately, the translation will corrupt the flags fields of SET8 entries because these were written during vmlinux compilation and are in the correct endianness already. This will lead to numerous selftests failures such as: $ sudo ./test_verifier 502 502 #502/p sleepable fentry accept FAIL Failed to load prog 'Invalid argument'! bpf_fentry_test1 is not sleepable verification time 34 usec stack depth 0 processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0 Summary: 0 PASSED, 0 SKIPPED, 1 FAILED Since it's not possible to instruct libelf to translate just certain values, let's manually bswap the flags (both global and entry flags) in resolve_btfids when needed, so that libelf then translates everything correctly. Fixes: ef2c6f370a63 ("tools/resolve_btfids: Add support for 8-byte BTF sets") Signed-off-by: Viktor Malik <vmalik@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/7b6bff690919555574ce0f13d2a5996cacf7bf69.1707223196.git.vmalik@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* tools/resolve_btfids: Refactor set sorting with types from btf_ids.hViktor Malik2024-03-262-14/+30
| | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 9707ac4fe2f5bac6406d2403f8b8a64d7b3d8e43 ] Instead of using magic offsets to access BTF ID set data, leverage types from btf_ids.h (btf_id_set and btf_id_set8) which define the actual layout of the data. Thanks to this change, set sorting should also continue working if the layout changes. This requires to sync the definition of 'struct btf_id_set8' from include/linux/btf_ids.h to tools/include/linux/btf_ids.h. We don't sync the rest of the file at the moment, b/c that would require to also sync multiple dependent headers and we don't need any other defs from btf_ids.h. Signed-off-by: Viktor Malik <vmalik@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/bpf/ff7f062ddf6a00815fda3087957c4ce667f50532.1707223196.git.vmalik@redhat.com Stable-dep-of: 903fad439466 ("tools/resolve_btfids: Fix cross-compilation to non-host endianness") Signed-off-by: Sasha Levin <sashal@kernel.org>
* libbpf: Use OPTS_SET() macro in bpf_xdp_query()Toke Høiland-Jørgensen2024-03-261-2/+2
| | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 92a871ab9fa59a74d013bc04f321026a057618e7 ] When the feature_flags and xdp_zc_max_segs fields were added to the libbpf bpf_xdp_query_opts, the code writing them did not use the OPTS_SET() macro. This causes libbpf to write to those fields unconditionally, which means that programs compiled against an older version of libbpf (with a smaller size of the bpf_xdp_query_opts struct) will have its stack corrupted by libbpf writing out of bounds. The patch adding the feature_flags field has an early bail out if the feature_flags field is not part of the opts struct (via the OPTS_HAS) macro, but the patch adding xdp_zc_max_segs does not. For consistency, this fix just changes the assignments to both fields to use the OPTS_SET() macro. Fixes: 13ce2daa259a ("xsk: add new netlink attribute dedicated for ZC max frags") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240206125922.1992815-1-toke@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* selftests/bpf: trace_helpers.c: do not use poisoned typeShung-Hsi Yu2024-03-261-1/+1
| | | | | | | | | | | | | | | | | | | [ Upstream commit a68b50f47bec8bd6a33b07b7e1562db2553981a7 ] After commit c698eaebdf47 ("selftests/bpf: trace_helpers.c: Optimize kallsyms cache") trace_helpers.c now includes libbpf_internal.h, and thus can no longer use the u32 type (among others) since they are poison in libbpf_internal.h. Replace u32 with __u32 to fix the following error when building trace_helpers.c on powerpc: error: attempt to use poisoned "u32" Fixes: c698eaebdf47 ("selftests/bpf: trace_helpers.c: Optimize kallsyms cache") Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20240202095559.12900-1-shung-hsi.yu@suse.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* libbpf: Add missing LIBBPF_API annotation to libbpf_set_memlock_rlim APIAndrii Nakryiko2024-03-261-1/+1
| | | | | | | | | | | | | | | [ Upstream commit 93ee1eb85e28d1e35bb059c1f5965d65d5fc83c2 ] LIBBPF_API annotation seems missing on libbpf_set_memlock_rlim API, so add it to make this API callable from libbpf's shared library version. Fixes: e542f2c4cd16 ("libbpf: Auto-bump RLIMIT_MEMLOCK if kernel needs it for BPF") Fixes: ab9a5a05dc48 ("libbpf: fix up few libbpf.map problems") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20240201172027.604869-3-andrii@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
* selftests/bpf: Disable IPv6 for lwt_redirect testManu Bretelle2024-03-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 2ef61296d2844c6a4211e07ab70ef2fb412b2c30 ] After a recent change in the vmtest runner, this test started failing sporadically. Investigation showed that this test was subject to race condition which got exacerbated after the vm runner change. The symptoms being that the logic that waited for an ICMPv4 packet is naive and will break if 5 or more non-ICMPv4 packets make it to tap0. When ICMPv6 is enabled, the kernel will generate traffic such as ICMPv6 router solicitation... On a system with good performance, the expected ICMPv4 packet would very likely make it to the network interface promptly, but on a system with poor performance, those "guarantees" do not hold true anymore. Given that the test is IPv4 only, this change disable IPv6 in the test netns by setting `net.ipv6.conf.all.disable_ipv6` to 1. This essentially leaves "ping" as the sole generator of traffic in the network namespace. If this test was to be made IPv6 compatible, the logic in `wait_for_packet` would need to be modified. In more details... At a high level, the test does: - create a new namespace - in `setup_redirect_target` set up lo, tap0, and link_err interfaces as well as add 2 routes that attaches ingress/egress sections of `test_lwt_redirect.bpf.o` to the xmit path. - in `send_and_capture_test_packets` send an ICMP packet and read off the tap interface (using `wait_for_packet`) to check that a ICMP packet with the right size is read. `wait_for_packet` will try to read `max_retry` (5) times from the tap0 fd looking for an ICMPv4 packet matching some criteria. The problem is that when we set up the `tap0` interface, because IPv6 is enabled by default, traffic such as Router solicitation is sent through tap0, as in: # tcpdump -r /tmp/lwt_redirect.pc reading from file /tmp/lwt_redirect.pcap, link-type EN10MB (Ethernet) 04:46:23.578352 IP6 :: > ff02::1:ffc0:4427: ICMP6, neighbor solicitation, who has fe80::fcba:dff:fec0:4427, length 32 04:46:23.659522 IP6 :: > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28 04:46:24.389169 IP 10.0.0.1 > 20.0.0.9: ICMP echo request, id 122, seq 1, length 108 04:46:24.618599 IP6 fe80::fcba:dff:fec0:4427 > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28 04:46:24.619985 IP6 fe80::fcba:dff:fec0:4427 > ff02::2: ICMP6, router solicitation, length 16 04:46:24.767326 IP6 fe80::fcba:dff:fec0:4427 > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28 04:46:28.936402 IP6 fe80::fcba:dff:fec0:4427 > ff02::2: ICMP6, router solicitation, length 16 If `wait_for_packet` sees 5 non-ICMPv4 packets, it will return 0, which is what we see in: 2024-01-31T03:51:25.0336992Z test_lwt_redirect_run:PASS:netns_create 0 nsec 2024-01-31T03:51:25.0341309Z open_netns:PASS:malloc token 0 nsec 2024-01-31T03:51:25.0344844Z open_netns:PASS:open /proc/self/ns/net 0 nsec 2024-01-31T03:51:25.0350071Z open_netns:PASS:open netns fd 0 nsec 2024-01-31T03:51:25.0353516Z open_netns:PASS:setns 0 nsec 2024-01-31T03:51:25.0356560Z test_lwt_redirect_run:PASS:setns 0 nsec 2024-01-31T03:51:25.0360140Z open_tuntap:PASS:open(/dev/net/tun) 0 nsec 2024-01-31T03:51:25.0363822Z open_tuntap:PASS:ioctl(TUNSETIFF) 0 nsec 2024-01-31T03:51:25.0367402Z open_tuntap:PASS:fcntl(O_NONBLOCK) 0 nsec 2024-01-31T03:51:25.0371167Z setup_redirect_target:PASS:open_tuntap 0 nsec 2024-01-31T03:51:25.0375180Z setup_redirect_target:PASS:if_nametoindex 0 nsec 2024-01-31T03:51:25.0379929Z setup_redirect_target:PASS:ip link add link_err type dummy 0 nsec 2024-01-31T03:51:25.0384874Z setup_redirect_target:PASS:ip link set lo up 0 nsec 2024-01-31T03:51:25.0389678Z setup_redirect_target:PASS:ip addr add dev lo 10.0.0.1/32 0 nsec 2024-01-31T03:51:25.0394814Z setup_redirect_target:PASS:ip link set link_err up 0 nsec 2024-01-31T03:51:25.0399874Z setup_redirect_target:PASS:ip link set tap0 up 0 nsec 2024-01-31T03:51:25.0407731Z setup_redirect_target:PASS:ip route add 10.0.0.0/24 dev link_err encap bpf xmit obj test_lwt_redirect.bpf.o sec redir_ingress 0 nsec 2024-01-31T03:51:25.0419105Z setup_redirect_target:PASS:ip route add 20.0.0.0/24 dev link_err encap bpf xmit obj test_lwt_redirect.bpf.o sec redir_egress 0 nsec 2024-01-31T03:51:25.0427209Z test_lwt_redirect_normal:PASS:setup_redirect_target 0 nsec 2024-01-31T03:51:25.0431424Z ping_dev:PASS:if_nametoindex 0 nsec 2024-01-31T03:51:25.0437222Z send_and_capture_test_packets:FAIL:wait_for_epacket unexpected wait_for_epacket: actual 0 != expected 1 2024-01-31T03:51:25.0448298Z (/tmp/work/bpf/bpf/tools/testing/selftests/bpf/prog_tests/lwt_redirect.c:175: errno: Success) test_lwt_redirect_normal egress test fails 2024-01-31T03:51:25.0457124Z close_netns:PASS:setns 0 nsec When running in a VM which potential resource contrains, the odds that calling `ping` is not scheduled very soon after bringing `tap0` up increases, and with this the chances to get our ICMP packet pushed to position 6+ in the network trace. To confirm this indeed solves the issue, I ran the test 100 times in a row with: errors=0 successes=0 for i in `seq 1 100` do ./test_progs -t lwt_redirect/lwt_redirect_normal if [ $? -eq 0 ]; then successes=$((successes+1)) else errors=$((errors+1)) fi done echo "successes: $successes/errors: $errors" While this test would at least fail a couple of time every 10 runs, here it ran 100 times with no error. Fixes: 43a7c3ef8a15 ("selftests/bpf: Add lwt_xmit tests for BPF_REDIRECT") Signed-off-by: Manu Bretelle <chantr4@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240131053212.2247527-1-chantr4@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* libbpf: Fix faccessat() usage on AndroidAndrii Nakryiko2024-03-261-0/+14
| | | | | | | | | | | | | | | | | | | | [ Upstream commit ad57654053805bf9a62602aaec74cc78edb6f235 ] Android implementation of libc errors out with -EINVAL in faccessat() if passed AT_EACCESS ([0]), this leads to ridiculous issue with libbpf refusing to load /sys/kernel/btf/vmlinux on Androids ([1]). Fix by detecting Android and redefining AT_EACCESS to 0, it's equivalent on Android. [0] https://android.googlesource.com/platform/bionic/+/refs/heads/android13-release/libc/bionic/faccessat.cpp#50 [1] https://github.com/libbpf/libbpf-bootstrap/issues/250#issuecomment-1911324250 Fixes: 6a4ab8869d0b ("libbpf: Fix the case of running as non-root with capabilities") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20240126220944.2497665-1-andrii@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
* selftests/bpf: Wait for the netstamp_needed_key static key to be turned onMartin KaFai Lau2024-03-261-4/+75
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit ce6f6cffaeaa0a3bcdafcae7fe03c68c3afae631 ] After the previous patch that speeded up the test (by avoiding neigh discovery in IPv6), the BPF CI occasionally hits this error: rcv tstamp unexpected pkt rcv tstamp: actual 0 == expected 0 The test complains about the cmsg returned from the recvmsg() does not have the rcv timestamp. Setting skb->tstamp or not is controlled by a kernel static key "netstamp_needed_key". The static key is enabled whenever this is at least one sk with the SOCK_TIMESTAMP set. The test_redirect_dtime does use setsockopt() to turn on the SOCK_TIMESTAMP for the reading sk. In the kernel net_enable_timestamp() has a delay to enable the "netstamp_needed_key" when CONFIG_JUMP_LABEL is set. This potential delay is the likely reason for packet missing rcv timestamp occasionally. This patch is to create udp sockets with SOCK_TIMESTAMP set. It sends and receives some packets until the received packet has a rcv timestamp. It currently retries at most 5 times with 1s in between. This should be enough to wait for the "netstamp_needed_key". It then holds on to the socket and only closes it at the end of the test. This guarantees that the test has the "netstamp_needed_key" key turned on from the beginning. To simplify the udp sockets setup, they are sending/receiving packets in the same netns (ns_dst is used) and communicate over the "lo" dev. Hence, the patch enables the "lo" dev in the ns_dst. Fixes: c803475fd8dd ("bpf: selftests: test skb->tstamp in redirect_neigh") Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240120060518.3604920-2-martin.lau@linux.dev Signed-off-by: Sasha Levin <sashal@kernel.org>
* selftests/bpf: Fix the flaky tc_redirect_dtime testMartin KaFai Lau2024-03-261-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 177f1d083a19af58f4b1206d299ed73689249fd8 ] BPF CI has been reporting the tc_redirect_dtime test failing from time to time: test_inet_dtime:PASS:setns src 0 nsec (network_helpers.c:253: errno: No route to host) Failed to connect to server close_netns:PASS:setns 0 nsec test_inet_dtime:FAIL:connect_to_fd unexpected connect_to_fd: actual -1 < expected 0 test_tcp_clear_dtime:PASS:tcp ip6 clear dtime ingress_fwdns_p100 0 nsec The connect_to_fd failure (EHOSTUNREACH) is from the test_tcp_clear_dtime() test and it is the very first IPv6 traffic after setting up all the links, addresses, and routes. The symptom is this first connect() is always slow. In my setup, it could take ~3s. After some tracing and tcpdump, the slowness is mostly spent in the neighbor solicitation in the "ns_fwd" namespace while the "ns_src" and "ns_dst" are fine. I forced the kernel to drop the neighbor solicitation messages. I can then reproduce EHOSTUNREACH. What actually happen could be: - the neighbor advertisement came back a little slow. - the "ns_fwd" namespace concluded a neighbor discovery failure and triggered the ndisc_error_report() => ip6_link_failure() => icmpv6_send(skb, ICMPV6_DEST_UNREACH, ICMPV6_ADDR_UNREACH, 0) - the client's connect() reports EHOSTUNREACH after receiving the ICMPV6_DEST_UNREACH message. The neigh table of both "ns_src" and "ns_dst" namespace has already been manually populated but not the "ns_fwd" namespace. This patch fixes it by manually populating the neigh table also in the "ns_fwd" namespace. Although the namespace configuration part had been existed before the tc_redirect_dtime test, still Fixes-tagging the patch when the tc_redirect_dtime test was added since it is the only test hitting it so far. Fixes: c803475fd8dd ("bpf: selftests: test skb->tstamp in redirect_neigh") Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240120060518.3604920-1-martin.lau@linux.dev Signed-off-by: Sasha Levin <sashal@kernel.org>
* selftest/bpf: Add map_in_maps with BPF_MAP_TYPE_PERF_EVENT_ARRAY valuesAndrey Grafin2024-03-262-1/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 40628f9fff73adecac77a9aa390f8016724cad99 ] Check that bpf_object__load() successfully creates map_in_maps with BPF_MAP_TYPE_PERF_EVENT_ARRAY values. These changes cover fix in the previous patch "libbpf: Apply map_set_def_max_entries() for inner_maps on creation". A command line output is: - w/o fix $ sudo ./test_maps libbpf: map 'mim_array_pe': failed to create inner map: -22 libbpf: map 'mim_array_pe': failed to create: Invalid argument(-22) libbpf: failed to load object './test_map_in_map.bpf.o' Failed to load test prog - with fix $ sudo ./test_maps ... test_maps: OK, 0 SKIPPED Fixes: 646f02ffdd49 ("libbpf: Add BTF-defined map-in-map support") Signed-off-by: Andrey Grafin <conquistador@yandex-team.ru> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/bpf/20240117130619.9403-2-conquistador@yandex-team.ru Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* libbpf: Apply map_set_def_max_entries() for inner_maps on creationAndrey Grafin2024-03-261-0/+4
| | | | | | | | | | | | | | | | | | | | | [ Upstream commit f04deb90e516e8e48bf8693397529bc942a9e80b ] This patch allows to auto create BPF_MAP_TYPE_ARRAY_OF_MAPS and BPF_MAP_TYPE_HASH_OF_MAPS with values of BPF_MAP_TYPE_PERF_EVENT_ARRAY by bpf_object__load(). Previous behaviour created a zero filled btf_map_def for inner maps and tried to use it for a map creation but the linux kernel forbids to create a BPF_MAP_TYPE_PERF_EVENT_ARRAY map with max_entries=0. Fixes: 646f02ffdd49 ("libbpf: Add BTF-defined map-in-map support") Signed-off-by: Andrey Grafin <conquistador@yandex-team.ru> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/bpf/20240117130619.9403-1-conquistador@yandex-team.ru Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* selftests/bpf: Fix potential premature unload in bpf_testmodArtem Savkov2024-03-261-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit d177c1be06ce28aa8c8710ac55be1b5ad3f314c6 ] It is possible for bpf_kfunc_call_test_release() to be called from bpf_map_free_deferred() when bpf_testmod is already unloaded and perf_test_stuct.cnt which it tries to decrease is no longer in memory. This patch tries to fix the issue by waiting for all references to be dropped in bpf_testmod_exit(). The issue can be triggered by running 'test_progs -t map_kptr' in 6.5, but is obscured in 6.6 by d119357d07435 ("rcu-tasks: Treat only synchronous grace periods urgently"). Fixes: 65eb006d85a2 ("bpf: Move kernel test kfuncs to bpf_testmod") Signed-off-by: Artem Savkov <asavkov@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yonghong.song@linux.dev> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/82f55c0e-0ec8-4fe1-8d8c-b1de07558ad9@linux.dev Link: https://lore.kernel.org/bpf/20240110085737.8895-1-asavkov@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* bpftool: Silence build warning about calloc()Tiezhu Yang2024-03-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit f5f30386c78105cba520e443a6a9ee945ec1d066 ] There exists the following warning when building bpftool: CC prog.o prog.c: In function ‘profile_open_perf_events’: prog.c:2301:24: warning: ‘calloc’ sizes specified with ‘sizeof’ in the earlier argument and not in the later argument [-Wcalloc-transposed-args] 2301 | sizeof(int), obj->rodata->num_cpu * obj->rodata->num_metric); | ^~~ prog.c:2301:24: note: earlier argument should specify number of elements, later size of each element Tested with the latest upstream GCC which contains a new warning option -Wcalloc-transposed-args. The first argument to calloc is documented to be number of elements in array, while the second argument is size of each element, just switch the first and second arguments of calloc() to silence the build warning, compile tested only. Fixes: 47c09d6a9f67 ("bpftool: Introduce "prog profile" command") Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20240116061920.31172-1-yangtiezhu@loongson.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* selftests: tls: use exact comparison in recv_partialJakub Kicinski2024-03-261-4/+4
| | | | | | | | | | | | [ Upstream commit 49d821064c44cb5ffdf272905236012ea9ce50e3 ] This exact case was fail for async crypto and we weren't catching it. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* selftests: openvswitch: Add validation for the recursion testAaron Conole2024-03-262-15/+69
| | | | | | | | | | | | | | | | [ Upstream commit bd128f62c365504e1268dc09fcccdfb1f091e93a ] Add a test case into the netlink checks that will show the number of nested action recursions won't exceed 16. Going to 17 on a small clone call isn't enough to exhaust the stack on (most) systems, so it should be safe to run even on systems that don't have the fix applied. Signed-off-by: Aaron Conole <aconole@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240207132416.1488485-3-aconole@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* selftests: mptcp: decrease BW in simult flowsMatthieu Baerts (NGI0)2024-03-151-4/+4
| | | | | | | | | | | | | | | | | | | | | [ Upstream commit 5e2f3c65af47e527ccac54060cf909e3306652ff ] When running the simult_flow selftest in slow environments -- e.g. QEmu without KVM support --, the results can be unstable. This selftest checks if the aggregated bandwidth is (almost) fully used as expected. To help improving the stability while still keeping the same validation in place, the BW and the delay are reduced to lower the pressure on the CPU. Fixes: 1a418cb8e888 ("mptcp: simult flow self-tests") Fixes: 219d04992b68 ("mptcp: push pending frames when subflow has free space") Cc: stable@vger.kernel.org Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240131-upstream-net-20240131-mptcp-ci-issues-v1-6-4c1c11e571ff@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* selftests/bpf: Fix up xdp bonding test wrt feature flagsDaniel Borkmann2024-03-151-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 0bfc0336e1348883fdab4689f0c8c56458f36dd8 ] Adjust the XDP feature flags for the bond device when no bond slave devices are attached. After 9b0ed890ac2a ("bonding: do not report NETDEV_XDP_ACT_XSK_ZEROCOPY"), the empty bond device must report 0 as flags instead of NETDEV_XDP_ACT_MASK. # ./vmtest.sh -- ./test_progs -t xdp_bond [...] [ 3.983311] bond1 (unregistering): (slave veth1_1): Releasing backup interface [ 3.995434] bond1 (unregistering): Released all slaves [ 4.022311] bond2: (slave veth2_1): Releasing backup interface #507/1 xdp_bonding/xdp_bonding_attach:OK #507/2 xdp_bonding/xdp_bonding_nested:OK #507/3 xdp_bonding/xdp_bonding_features:OK #507/4 xdp_bonding/xdp_bonding_roundrobin:OK #507/5 xdp_bonding/xdp_bonding_activebackup:OK #507/6 xdp_bonding/xdp_bonding_xor_layer2:OK #507/7 xdp_bonding/xdp_bonding_xor_layer23:OK #507/8 xdp_bonding/xdp_bonding_xor_layer34:OK #507/9 xdp_bonding/xdp_bonding_redirect_multi:OK #507 xdp_bonding:OK Summary: 1/9 PASSED, 0 SKIPPED, 0 FAILED [ 4.185255] bond2 (unregistering): Released all slaves [...] Fixes: 9b0ed890ac2a ("bonding: do not report NETDEV_XDP_ACT_XSK_ZEROCOPY") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Message-ID: <20240305090829.17131-2-daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* selftests: mptcp: rm subflow with v4/v4mapped addrGeliang Tang2024-03-062-14/+18
| | | | | | | | | | | | | | | | | | | | | commit 7092dbee23282b6fcf1313fc64e2b92649ee16e8 upstream. Now both a v4 address and a v4-mapped address are supported when destroying a userspace pm subflow, this patch adds a second subflow to "userspace pm add & remove address" test, and two subflows could be removed two different ways, one with the v4mapped and one with v4. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/387 Fixes: 48d73f609dcc ("selftests: mptcp: update userspace pm addr tests") Cc: stable@vger.kernel.org Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Mat Martineau <martineau@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-2-162e87e48497@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* selftests: mptcp: add mptcp_lib_is_v6Geliang Tang2024-03-064-28/+15
| | | | | | | | | | | | | | | | | | | commit b850f2c7dd85ecd14a333685c4ffd23f12665e94 upstream. To avoid duplicated code in different MPTCP selftests, we can add and use helpers defined in mptcp_lib.sh. is_v6() helper is defined in mptcp_connect.sh, mptcp_join.sh and mptcp_sockopt.sh, so export it into mptcp_lib.sh and rename it as mptcp_lib_is_v6(). Use this new helper in all scripts. Reviewed-by: Matthieu Baerts <matttbe@kernel.org> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20231128-send-net-next-2023107-v4-10-8d6b94150f6b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* selftests: mptcp: update userspace pm test helpersGeliang Tang2024-03-061-52/+50
| | | | | | | | | | | | | | | | | | | | commit 757c828ce94905a2975873d5e90a376c701b2b90 upstream. This patch adds a new argument namespace to userspace_pm_add_addr() and userspace_pm_add_sf() to make these two helper more versatile. Add two more versatile helpers for userspace pm remove subflow or address: userspace_pm_rm_addr() and userspace_pm_rm_sf(). The original test helpers userspace_pm_rm_sf_addr_ns1() and userspace_pm_rm_sf_addr_ns2() can be replaced by these new helpers. Reviewed-by: Matthieu Baerts <matttbe@kernel.org> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20231128-send-net-next-2023107-v4-4-8d6b94150f6b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* selftests: mptcp: add chk_subflows_total helperGeliang Tang2024-03-061-1/+41
| | | | | | | | | | | | | | | | | | | | | | | commit 80775412882e273b8ef62124fae861cde8e6fb3d upstream. This patch adds a new helper chk_subflows_total(), in it use the newly added counter mptcpi_subflows_total to get the "correct" amount of subflows, including the initial one. To be compatible with old 'ss' or kernel versions not supporting this counter, get the total subflows by listing TCP connections that are MPTCP subflows: ss -ti state state established state syn-sent state syn-recv | grep -c tcp-ulp-mptcp. Reviewed-by: Matthieu Baerts <matttbe@kernel.org> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20231128-send-net-next-2023107-v4-3-8d6b94150f6b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* selftests: mptcp: add evts_get_info helperGeliang Tang2024-03-063-58/+57
| | | | | | | | | | | | | | | | | | | | commit 06848c0f341ee3f9226ed01e519c72e4d2b6f001 upstream. This patch adds a new helper get_info_value(), using 'sed' command to parse the value of the given item name in the line with the given keyword, to make chk_mptcp_info() and pedit_action_pkts() more readable. Also add another helper evts_get_info() to use get_info_value() to parse the output of 'pm_nl_ctl events' command, to make all the userspace pm selftests more readable, both in mptcp_join.sh and userspace_pm.sh. Reviewed-by: Matthieu Baerts <matttbe@kernel.org> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20231128-send-net-next-2023107-v4-2-8d6b94150f6b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* selftests: mptcp: join: add ss mptcp support checkGeliang Tang2024-03-061-0/+5
| | | | | | | | | | | | | | | | commit 9480f388a2ef54fba911d9325372abd69a328601 upstream. Commands 'ss -M' are used in script mptcp_join.sh to display only MPTCP sockets. So it must be checked if ss tool supports MPTCP in this script. Fixes: e274f7154008 ("selftests: mptcp: add subflow limits test-cases") Cc: stable@vger.kernel.org Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-7-162e87e48497@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* tools: ynl: fix handling of multiple mcast groupsJakub Kicinski2024-03-061-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit b6c65eb20ffa8e3bd89f551427dbeee2876d72ca ] We never increment the group number iterator, so all groups get recorded into index 0 of the mcast_groups[] array. As a result YNL can only handle using the last group. For example using the "netdev" sample on kernel with page pool commands results in: $ ./samples/netdev YNL: Multicast group 'mgmt' not found Most families have only one multicast group, so this hasn't been noticed. Plus perhaps developers usually test the last group which would have worked. Fixes: 86878f14d71a ("tools: ynl: user space helpers") Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://lore.kernel.org/r/20240226214019.1255242-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* selftests: mptcp: add mptcp_lib_get_counterGeliang Tang2024-03-014-86/+73
| | | | | | | | | | | | | | | | | | | | | | | commit 61c131f5d4d2b79904af2fdcb2839a9db8e7c55c upstream. To avoid duplicated code in different MPTCP selftests, we can add and use helpers defined in mptcp_lib.sh. The helper get_counter() in mptcp_join.sh and get_mib_counter() in mptcp_connect.sh have the same functionality, export get_counter() into mptcp_lib.sh and rename it as mptcp_lib_get_counter(). Use this new helper instead of get_counter() and get_mib_counter(). Use this helper in test_prio() in userspace_pm.sh too instead of open-coding. Reviewed-by: Matthieu Baerts <matttbe@kernel.org> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20231128-send-net-next-2023107-v4-11-8d6b94150f6b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* selftests: mptcp: join: stop transfer when check is done (part 2)Matthieu Baerts (NGI0)2024-03-011-11/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | commit 04b57c9e096a9479fe0ad31e3956e336fa589cb2 upstream. Since the "Fixes" commits mentioned below, the newly added "userspace pm" subtests of mptcp_join selftests are launching the whole transfer in the background, do the required checks, then wait for the end of transfer. There is no need to wait longer, especially because the checks at the end of the transfer are ignored (which is fine). This saves quite a few seconds on slow environments. While at it, use 'mptcp_lib_kill_wait()' helper everywhere, instead of on a specific one with 'kill_tests_wait()'. Fixes: b2e2248f365a ("selftests: mptcp: userspace pm create id 0 subflow") Fixes: e3b47e460b4b ("selftests: mptcp: userspace pm remove initial subflow") Fixes: b9fb176081fb ("selftests: mptcp: userspace pm send RM_ADDR for ID 0") Cc: stable@vger.kernel.org Reviewed-and-tested-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240131-upstream-net-20240131-mptcp-ci-issues-v1-9-4c1c11e571ff@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* selftests: mptcp: join: stop transfer when check is done (part 1)Matthieu Baerts (NGI0)2024-03-011-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | commit 31ee4ad86afd6ed6f4bb1b38c43011216080c42a upstream. Since the "Fixes" commit mentioned below, "userspace pm" subtests of mptcp_join selftests introduced in v6.5 are launching the whole transfer in the background, do the required checks, then wait for the end of transfer. There is no need to wait longer, especially because the checks at the end of the transfer are ignored (which is fine). This saves quite a few seconds in slow environments. Note that old versions will need commit bdbef0a6ff10 ("selftests: mptcp: add mptcp_lib_kill_wait") as well to get 'mptcp_lib_kill_wait()' helper. Fixes: 4369c198e599 ("selftests: mptcp: test userspace pm out of transfer") Cc: stable@vger.kernel.org # 6.5.x: bdbef0a6ff10: selftests: mptcp: add mptcp_lib_kill_wait Cc: stable@vger.kernel.org # 6.5.x Reviewed-and-tested-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240131-upstream-net-20240131-mptcp-ci-issues-v1-8-4c1c11e571ff@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* selftests/iommu: fix the config fragmentMuhammad Usama Anjum2024-03-011-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 510325e5ac5f45c1180189d3bfc108c54bf64544 ] The config fragment doesn't follow the correct format to enable those config options which make the config options getting missed while merging with other configs. ➜ merge_config.sh -m .config tools/testing/selftests/iommu/config Using .config as base Merging tools/testing/selftests/iommu/config ➜ make olddefconfig .config:5295:warning: unexpected data: CONFIG_IOMMUFD .config:5296:warning: unexpected data: CONFIG_IOMMUFD_TEST While at it, add CONFIG_FAULT_INJECTION as well which is needed for CONFIG_IOMMUFD_TEST. If CONFIG_FAULT_INJECTION isn't present in base config (such as x86 defconfig), CONFIG_IOMMUFD_TEST doesn't get enabled. Fixes: 57f0988706fe ("iommufd: Add a selftest") Link: https://lore.kernel.org/r/20240222074934.71380-1-usama.anjum@collabora.com Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* tools: ynl: don't leak mcast_groups on init errorJakub Kicinski2024-03-011-1/+7
| | | | | | | | | | | | | | | [ Upstream commit 5d78b73e851455d525a064f3b042b29fdc0c1a4a ] Make sure to free the already-parsed mcast_groups if we don't get an ack from the kernel when reading family info. This is part of the ynl_sock_create() error path, so we won't get a call to ynl_sock_destroy() to free them later. Fixes: 86878f14d71a ("tools: ynl: user space helpers") Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://lore.kernel.org/r/20240220161112.2735195-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* tools: ynl: make sure we always pass yarg to mnl_cb_runJakub Kicinski2024-03-011-3/+8
| | | | | | | | | | | | | | | | | | | | | [ Upstream commit e4fe082c38cd74a8fa384bc7542cf3edf1cb7318 ] There is one common error handler in ynl - ynl_cb_error(). It expects priv to be a pointer to struct ynl_parse_arg AKA yarg. To avoid potential crashes if we encounter a stray NLMSG_ERROR always pass yarg as priv (or a struct which has it as the first member). ynl_cb_null() has a similar problem directly - it expects yarg but priv passed by the caller is ys. Found by code inspection. Fixes: 86878f14d71a ("tools: ynl: user space helpers") Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://lore.kernel.org/r/20240220161112.2735195-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* selftests: bonding: set active slave to primary eth1 specificallyHangbin Liu2024-03-011-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit cd65c48d66920457129584553f217005d09b1edb ] In bond priority testing, we set the primary interface to eth1 and add eth0,1,2 to bond in serial. This is OK in normal times. But when in debug kernel, the bridge port that eth0,1,2 connected would start slowly (enter blocking, forwarding state), which caused the primary interface down for a while after enslaving and active slave changed. Here is a test log from Jakub's debug test[1]. [ 400.399070][ T50] br0: port 1(s0) entered disabled state [ 400.400168][ T50] br0: port 4(s2) entered disabled state [ 400.941504][ T2791] bond0: (slave eth0): making interface the new active one [ 400.942603][ T2791] bond0: (slave eth0): Enslaving as an active interface with an up link [ 400.943633][ T2766] br0: port 1(s0) entered blocking state [ 400.944119][ T2766] br0: port 1(s0) entered forwarding state [ 401.128792][ T2792] bond0: (slave eth1): making interface the new active one [ 401.130771][ T2792] bond0: (slave eth1): Enslaving as an active interface with an up link [ 401.131643][ T69] br0: port 2(s1) entered blocking state [ 401.132067][ T69] br0: port 2(s1) entered forwarding state [ 401.346201][ T2793] bond0: (slave eth2): Enslaving as a backup interface with an up link [ 401.348414][ T50] br0: port 4(s2) entered blocking state [ 401.348857][ T50] br0: port 4(s2) entered forwarding state [ 401.519669][ T250] bond0: (slave eth0): link status definitely down, disabling slave [ 401.526522][ T250] bond0: (slave eth1): link status definitely down, disabling slave [ 401.526986][ T250] bond0: (slave eth2): making interface the new active one [ 401.629470][ T250] bond0: (slave eth0): link status definitely up [ 401.630089][ T250] bond0: (slave eth1): link status definitely up [...] # TEST: prio (active-backup ns_ip6_target primary_reselect 1) [FAIL] # Current active slave is eth2 but not eth1 Fix it by setting active slave to primary slave specifically before testing. [1] https://netdev-3.bots.linux.dev/vmksft-bonding-dbg/results/464301/1-bond-options-sh/stdout Fixes: 481b56e0391e ("selftests: bonding: re-format bond option tests") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net/sched: act_mirred: use the backlog for mirred ingressJakub Kicinski2024-03-011-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 52f671db18823089a02f07efc04efdb2272ddc17 ] The test Davide added in commit ca22da2fbd69 ("act_mirred: use the backlog for nested calls to mirred ingress") hangs our testing VMs every 10 or so runs, with the familiar tcp_v4_rcv -> tcp_v4_rcv deadlock reported by lockdep. The problem as previously described by Davide (see Link) is that if we reverse flow of traffic with the redirect (egress -> ingress) we may reach the same socket which generated the packet. And we may still be holding its socket lock. The common solution to such deadlocks is to put the packet in the Rx backlog, rather than run the Rx path inline. Do that for all egress -> ingress reversals, not just once we started to nest mirred calls. In the past there was a concern that the backlog indirection will lead to loss of error reporting / less accurate stats. But the current workaround does not seem to address the issue. Fixes: 53592b364001 ("net/sched: act_mirred: Implement ingress actions") Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Suggested-by: Davide Caratti <dcaratti@redhat.com> Link: https://lore.kernel.org/netdev/33dc43f587ec1388ba456b4915c75f02a8aae226.1663945716.git.dcaratti@redhat.com/ Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* selftests: mptcp: diag: unique 'cestab' subtest namesMatthieu Baerts (NGI0)2024-03-011-6/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 4103d8480866fe5abb71ef0ed8af3a3b7b9625bf upstream. It is important to have a unique (sub)test name in TAP, because some CI environments drop tests with duplicated name. Some 'cestab' subtests from the diag selftest had the same names, e.g.: ....chk 0 cestab Now the previous value is taken, to have different names, e.g.: ....chk 2->0 cestab after flush While at it, the 'after flush' info is added, similar to what is done with the 'in use' subtests. Also inspired by these 'in use' subtests, 'many' is displayed instead of a large number: many msk socket present [ ok ] ....chk many msk in use [ ok ] ....chk many cestab [ ok ] ....chk many->0 msk in use after flush [ ok ] ....chk many->0 cestab after flush [ ok ] Fixes: 81ab772819da ("selftests: mptcp: diag: check CURRESTAB counters") Cc: stable@vger.kernel.org Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>