summaryrefslogtreecommitdiffstats
path: root/tools
Commit message (Collapse)AuthorAgeFilesLines
* mptcp: don't account accept() of non-MPC client as fallback to TCPDavide Caratti2024-04-101-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 7a1b3490f47e88ec4cbde65f1a77a0f4bc972282 upstream. Current MPTCP servers increment MPTcpExtMPCapableFallbackACK when they accept non-MPC connections. As reported by Christoph, this is "surprising" because the counter might become greater than MPTcpExtMPCapableSYNRX. MPTcpExtMPCapableFallbackACK counter's name suggests it should only be incremented when a connection was seen using MPTCP options, then a fallback to TCP has been done. Let's do that by incrementing it when the subflow context of an inbound MPC connection attempt is dropped. Also, update mptcp_connect.sh kselftest, to ensure that the above MIB does not increment in case a pure TCP client connects to a MPTCP server. Fixes: fc518953bc9c ("mptcp: add and use MIB counter infrastructure") Cc: stable@vger.kernel.org Reported-by: Christoph Paasch <cpaasch@apple.com> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/449 Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240329-upstream-net-20240329-fallback-mib-v1-1-324a8981da48@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* selftests: mptcp: connect: fix shellcheck warningsMatthieu Baerts (NGI0)2024-04-101-29/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit e3aae1098f109f0bd33c971deff1926f4e4441d0 upstream. shellcheck recently helped to prevent issues. It is then good to fix the other harmless issues in order to spot "real" ones later. Here, two categories of warnings are now ignored: - SC2317: Command appears to be unreachable. The cleanup() function is invoked indirectly via the EXIT trap. - SC2086: Double quote to prevent globbing and word splitting. This is recommended, but the current usage is correct and there is no need to do all these modifications to be compliant with this rule. For the modifications: - SC2034: ksft_skip appears unused. - SC2181: Check exit code directly with e.g. 'if mycmd;', not indirectly with $?. - SC2004: $/${} is unnecessary on arithmetic variables. - SC2155: Declare and assign separately to avoid masking return values. - SC2166: Prefer [ p ] && [ q ] as [ p -a q ] is not well defined. - SC2059: Don't use variables in the printf format string. Use printf '..%s..' "$foo". Now this script is shellcheck (0.9.0) compliant. We can easily spot new issues. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240306-upstream-net-next-20240304-selftests-mptcp-shared-code-shellcheck-v2-8-bc79e6e5e6a0@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* selftests/mm: include strings.h for ffslEdward Liaw2024-04-101-1/+1
| | | | | | | | | | | | | | | | | | | | | commit 176517c9310281d00dd3210ab4cc4d3cdc26b17e upstream. Got a compilation error on Android for ffsl after 91b80cc5b39f ("selftests: mm: fix map_hugetlb failure on 64K page size systems") included vm_util.h. Link: https://lkml.kernel.org/r/20240329185814.16304-1-edliaw@google.com Fixes: af605d26a8f2 ("selftests/mm: merge util.h into vm_util.h") Signed-off-by: Edward Liaw <edliaw@google.com> Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Mike Rapoport (IBM)" <rppt@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* selftests: reuseaddr_conflict: add missing new line at the end of the outputJakub Kicinski2024-04-101-1/+1
| | | | | | | | | | | | | | | | | | | | | commit 31974122cfdeaf56abc18d8ab740d580d9833e90 upstream. The netdev CI runs in a VM and captures serial, so stdout and stderr get combined. Because there's a missing new line in stderr the test ends up corrupting KTAP: # Successok 1 selftests: net: reuseaddr_conflict which should have been: # Success ok 1 selftests: net: reuseaddr_conflict Fixes: 422d8dc6fd3a ("selftest: add a reuseaddr test") Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Link: https://lore.kernel.org/r/20240329160559.249476-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* selftests: net: gro fwd: update vxlan GRO test expectationsAntoine Tenart2024-04-101-8/+2
| | | | | | | | | | | | | | | | | commit 0fb101be97ca27850c5ecdbd1269423ce4d1f607 upstream. UDP tunnel packets can't be GRO in-between their endpoints as this causes different issues. The UDP GRO fwd vxlan tests were relying on this and their expectations have to be fixed. We keep both vxlan tests and expected no GRO from happening. The vxlan UDP GRO bench test was removed as it's not providing any valuable information now. Fixes: a062260a9d5f ("selftests: net: add UDP GRO forwarding self-tests") Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* selftests: mptcp: join: fix dev in check_endpointGeliang Tang2024-04-101-1/+3
| | | | | | | | | | | | | | | | | | | | | | commit 40061817d95bce6dd5634a61a65cd5922e6ccc92 upstream. There's a bug in pm_nl_check_endpoint(), 'dev' didn't be parsed correctly. If calling it in the 2nd test of endpoint_tests() too, it fails with an error like this: creation [FAIL] expected '10.0.2.2 id 2 subflow dev dev' \ found '10.0.2.2 id 2 subflow dev ns2eth2' The reason is '$2' should be set to 'dev', not '$1'. This patch fixes it. Fixes: 69c6ce7b6eca ("selftests: mptcp: add implicit endpoint test case") Cc: stable@vger.kernel.org Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240329-upstream-net-20240329-fallback-mib-v1-2-324a8981da48@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86/CPU/AMD: Add X86_FEATURE_ZEN1Borislav Petkov (AMD)2024-04-101-1/+1
| | | | | | | | | | | | | | | | [ Upstream commit 232afb557835d6f6859c73bf610bad308c96b131 ] Add a synthetic feature flag specifically for first generation Zen machines. There's need to have a generic flag for all Zen generations so make X86_FEATURE_ZEN be that flag. Fixes: 30fa92832f40 ("x86/CPU/AMD: Add ZenX generations flags") Suggested-by: Brian Gerst <brgerst@gmail.com> Suggested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/dc3835e3-0731-4230-bbb9-336bbe3d042b@amd.com Stable-dep-of: c7b2edd8377b ("perf/x86/amd/core: Update and fix stalled-cycles-* events for Zen 2 and later") Signed-off-by: Sasha Levin <sashal@kernel.org>
* selftests: vxlan_mdb: Fix failures with old libnetIdo Schimmel2024-04-101-77/+128
| | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit f1425529c33def8b46faae4400dd9e2bbaf16a05 ] Locally generated IP multicast packets (such as the ones used in the test) do not perform routing and simply egress the bound device. However, as explained in commit 8bcfb4ae4d97 ("selftests: forwarding: Fix failing tests with old libnet"), old versions of libnet (used by mausezahn) do not use the "SO_BINDTODEVICE" socket option. Specifically, the library started using the option for IPv6 sockets in version 1.1.6 and for IPv4 sockets in version 1.2. This explains why on Ubuntu - which uses version 1.1.6 - the IPv4 overlay tests are failing whereas the IPv6 ones are passing. Fix by specifying the source and destination MAC of the packets which will cause mausezahn to use a packet socket instead of an IP socket. Fixes: 62199e3f1658 ("selftests: net: Add VXLAN MDB test") Reported-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr> Closes: https://lore.kernel.org/netdev/5bb50349-196d-4892-8ed2-f37543aa863f@alu.unizg.hr/ Tested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20240325075030.2379513-1-idosch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* tools: ynl: fix setting presence bits in simple nestsJakub Kicinski2024-04-101-2/+5
| | | | | | | | | | | | | | | [ Upstream commit f6c8f5e8694c7a78c94e408b628afa6255cc428a ] When we set members of simple nested structures in requests we need to set "presence" bits for all the nesting layers below. This has nothing to do with the presence type of the last layer. Fixes: be5bea1cc0bf ("net: add basic C code generators for Netlink") Reviewed-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20240321020214.1250202-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* tools/resolve_btfids: fix build with musl libcNatanael Copa2024-04-031-0/+2
| | | | | | | | | | | | | | | | | commit 62248b22d01e96a4d669cde0d7005bd51ebf9e76 upstream. Include the header that defines u32. This fixes build of 6.6.23 and 6.1.83 kernels for Alpine Linux, which uses musl libc. I assume that GNU libc indirecly pulls in linux/types.h. Fixes: 9707ac4fe2f5 ("tools/resolve_btfids: Refactor set sorting with types from btf_ids.h") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218647 Cc: stable@vger.kernel.org Signed-off-by: Natanael Copa <ncopa@alpinelinux.org> Tested-by: Greg Thelen <gthelen@google.com> Link: https://lore.kernel.org/r/20240328110103.28734-1-ncopa@alpinelinux.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* selftests/mm: fix ARM related issue with fork after pthread_createEdward Liaw2024-04-033-0/+15
| | | | | | | | | | | | | | | | | | | | | | commit 8c864371b2a15a23ce35aa7e2bd241baaad6fbe8 upstream. Following issue was observed while running the uffd-unit-tests selftest on ARM devices. On x86_64 no issues were detected: pthread_create followed by fork caused deadlock in certain cases wherein fork required some work to be completed by the created thread. Used synchronization to ensure that created thread's start function has started before invoking fork. [edliaw@google.com: refactored to use atomic_bool] Link: https://lkml.kernel.org/r/20240325194100.775052-1-edliaw@google.com Fixes: 760aee0b71e3 ("selftests/mm: add tests for RO pinning vs fork()") Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> Signed-off-by: Edward Liaw <edliaw@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* selftests/mm: sigbus-wp test requires UFFD_FEATURE_WP_HUGETLBFS_SHMEMEdward Liaw2024-04-031-1/+2
| | | | | | | | | | | | | | | | | commit 105840ebd76d8dbc1a7d734748ae320076f3201e upstream. The sigbus-wp test requires the UFFD_FEATURE_WP_HUGETLBFS_SHMEM flag for shmem and hugetlb targets. Otherwise it is not backwards compatible with kernels <5.19 and fails with EINVAL. Link: https://lkml.kernel.org/r/20240321232023.2064975-1-edliaw@google.com Fixes: 73c1ea939b65 ("selftests/mm: move uffd sig/events tests into uffd unit tests") Signed-off-by: Edward Liaw <edliaw@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Peter Xu <peterx@redhat.com Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* perf top: Use evsel's cpus to replace user_requested_cpusKan Liang2024-04-031-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 5fa695e7da4975e8d21ce49f3718d6cf00ecb75e upstream. perf top errors out on a hybrid machine $perf top Error: The cycles:P event is not supported. The perf top expects that the "cycles" is collected on all CPUs in the system. But for hybrid there is no single "cycles" event which can cover all CPUs. Perf has to split it into two cycles events, e.g., cpu_core/cycles/ and cpu_atom/cycles/. Each event has its own CPU mask. If a event is opened on the unsupported CPU. The open fails. That's the reason of the above error out. Perf should only open the cycles event on the corresponding CPU. The commit ef91871c960e ("perf evlist: Propagate user CPU maps intersecting core PMU maps") intersect the requested CPU map with the CPU map of the PMU. Use the evsel's cpus to replace user_requested_cpus. The evlist's threads are also propagated to the evsel's threads in __perf_evlist__propagate_maps(). For a system-wide event, perf appends a dummy event and assign it to the evsel's threads. For a per-thread event, the evlist's thread_map is assigned to the evsel's threads. The same as the other tools, e.g., perf record, using the evsel's threads when opening an event. Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Hector Martin <marcan@marcan.st> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Closes: https://lore.kernel.org/linux-perf-users/ZXNnDrGKXbEELMXV@kernel.org/ Link: https://lore.kernel.org/r/20231214144612.1092028-1-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* selftests/mm: Fix build with _FORTIFY_SOURCEVitaly Chikunov2024-04-033-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 8b65ef5ad4862904e476a8f3d4e4418c950ddb90 ] Add missing flags argument to open(2) call with O_CREAT. Some tests fail to compile if _FORTIFY_SOURCE is defined (to any valid value) (together with -O), resulting in similar error messages such as: In file included from /usr/include/fcntl.h:342, from gup_test.c:1: In function 'open', inlined from 'main' at gup_test.c:206:10: /usr/include/bits/fcntl2.h:50:11: error: call to '__open_missing_mode' declared with attribute error: open with O_CREAT or O_TMPFILE in second argument needs 3 arguments 50 | __open_missing_mode (); | ^~~~~~~~~~~~~~~~~~~~~~ _FORTIFY_SOURCE is enabled by default in some distributions, so the tests are not built by default and are skipped. open(2) man-page warns about missing flags argument: "if it is not supplied, some arbitrary bytes from the stack will be applied as the file mode." Link: https://lkml.kernel.org/r/20240318023445.3192922-1-vt@altlinux.org Fixes: aeb85ed4f41a ("tools/testing/selftests/vm/gup_benchmark.c: allow user specified file") Fixes: fbe37501b252 ("mm: huge_memory: debugfs for file-backed THP split") Fixes: c942f5bd17b3 ("selftests: soft-dirty: add test for mprotect") Signed-off-by: Vitaly Chikunov <vt@altlinux.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Keith Busch <kbusch@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* selftests/mm: gup_test: conform test to TAP format outputMuhammad Usama Anjum2024-04-031-32/+33
| | | | | | | | | | | | | | [ Upstream commit cb6e7cae18868422a23d62670110c61fd1b15029 ] Conform the layout, informational and status messages to TAP. No functional change is intended other than the layout of output messages. Link: https://lkml.kernel.org/r/20240102053807.2114200-1-usama.anjum@collabora.com Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: 8b65ef5ad486 ("selftests/mm: Fix build with _FORTIFY_SOURCE") Signed-off-by: Sasha Levin <sashal@kernel.org>
* selftests: mptcp: diag: return KSFT_FAIL not test_cntGeliang Tang2024-04-031-3/+3
| | | | | | | | | | | | | | | | | | | commit 45bcc0346561daa3f59e19a753cc7f3e08e8dff1 upstream. The test counter 'test_cnt' should not be returned in diag.sh, e.g. what if only the 4th test fail? Will do 'exit 4' which is 'exit ${KSFT_SKIP}', the whole test will be marked as skipped instead of 'failed'! So we should do ret=${KSFT_FAIL} instead. Fixes: df62f2ec3df6 ("selftests/mptcp: add diag interface tests") Cc: stable@vger.kernel.org Fixes: 42fb6cddec3b ("selftests: mptcp: more stable diag tests") Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* wireguard: selftests: set RISCV_ISA_FALLBACK on riscv{32,64}Jason A. Donenfeld2024-04-032-0/+2
| | | | | | | | | | | | | | | [ Upstream commit e995f5dd9a9cef818af32ec60fc38d68614afd12 ] This option is needed to continue booting with QEMU. Recent changes that made this optional meant that it gets unset in the test harness, and so WireGuard CI has been broken. Fix this by simply setting this option. Cc: stable@vger.kernel.org Fixes: 496ea826d1e1 ("RISC-V: provide Kconfig & commandline options to control parsing "riscv,isa"") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* selftests/mqueue: Set timeout to 180 secondsSeongJae Park2024-04-031-0/+1
| | | | | | | | | | | | | | | | [ Upstream commit 85506aca2eb4ea41223c91c5fe25125953c19b13 ] While mq_perf_tests runs with the default kselftest timeout limit, which is 45 seconds, the test takes about 60 seconds to complete on i3.metal AWS instances. Hence, the test always times out. Increase the timeout to 180 seconds. Fixes: 852c8cbf34d3 ("selftests/kselftest/runner.sh: Add 45 second timeout per test") Cc: <stable@vger.kernel.org> # 5.4.x Signed-off-by: SeongJae Park <sj@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* selftests: forwarding: Fix ping failure due to short timeoutIdo Schimmel2024-03-262-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit e4137851d4863a9bdc6aabc613bcb46c06d91e64 ] The tests send 100 pings in 0.1 second intervals and force a timeout of 11 seconds, which is borderline (especially on debug kernels), resulting in random failures in netdev CI [1]. Fix by increasing the timeout to 20 seconds. It should not prolong the test unless something is wrong, in which case the test will rightfully fail. [1] # selftests: net/forwarding: vxlan_bridge_1d_port_8472_ipv6.sh # INFO: Running tests with UDP port 8472 # TEST: ping: local->local [ OK ] # TEST: ping: local->remote 1 [FAIL] # Ping failed [...] Fixes: b07e9957f220 ("selftests: forwarding: Add VxLAN tests with a VLAN-unaware bridge for IPv6") Fixes: 728b35259e28 ("selftests: forwarding: Add VxLAN tests with a VLAN-aware bridge for IPv6") Reported-by: Paolo Abeni <pabeni@redhat.com> Closes: https://lore.kernel.org/netdev/24a7051fdcd1f156c3704bca39e4b3c41dfc7c4b.camel@redhat.com/ Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240320065717.4145325-1-idosch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* perf pmu: Fix a potential memory leak in perf_pmu__lookup()Christophe JAILLET2024-03-261-4/+3
| | | | | | | | | | | | | | | | | [ Upstream commit ef5de1613d7d92bdc975e6beb34bb0fa94f34078 ] The commit in Fixes has reordered some code, but missed an error handling path. 'goto err' now, in order to avoid a memory leak in case of error. Fixes: f63a536f03a2 ("perf pmu: Merge JSON events with sysfs at load time") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Ian Rogers <irogers@google.com> Cc: kernel-janitors@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/9538b2b634894c33168dfe9d848d4df31fd4d801.1693085544.git.christophe.jaillet@wanadoo.fr Signed-off-by: Sasha Levin <sashal@kernel.org>
* perf print-events: make is_event_supported() more robustMark Rutland2024-03-261-8/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 25412c0364f7110faa6053c73e3fd47ca956b8c3 ] Currently the perf tool doesn't detect support for extended event types on Apple M1/M2 systems, and will not auto-expand plain PERF_EVENT_TYPE hardware events into per-PMU events. This is due to the detection of extended event types not handling mandatory filters required by the M1/M2 PMU driver. PMU drivers and the core perf_events code can require that perf_event_attr::exclude_* filters are configured in a specific way and may reject certain configurations of filters, for example: (a) Many PMUs lack support for any event filtering, and require all perf_event_attr::exclude_* bits to be clear. This includes Alpha's CPU PMU, and ARM CPU PMUs prior to the introduction of PMUv2 in ARMv7, (b) When /proc/sys/kernel/perf_event_paranoid >= 2, the perf core requires that perf_event_attr::exclude_kernel is set. (c) The Apple M1/M2 PMU requires that perf_event_attr::exclude_guest is set as the hardware PMU does not count while a guest is running (but might be extended in future to do so). In is_event_supported(), we try to account for cases (a) and (b), first attempting to open an event without any filters, and if this fails, retrying with perf_event_attr::exclude_kernel set. We do not account for case (c), or any other filters that drivers could theoretically require to be set. Thus is_event_supported() will fail to detect support for any events targeting an Apple M1/M2 PMU, even where events would be supported with perf_event_attr:::exclude_guest set. Since commit: 82fe2e45cdb00de4 ("perf pmus: Check if we can encode the PMU number in perf_event_attr.type") ... we use is_event_supported() to detect support for extended types, with the PMU ID encoded into the perf_event_attr::type. As above, on an Apple M1/M2 system this will always fail to detect that the event is supported, and consequently we fail to detect support for extended types even when these are supported, as they have been since commit: 5c816728651ae425 ("arm_pmu: Add PERF_PMU_CAP_EXTENDED_HW_TYPE capability") Due to this, the perf tool will not automatically expand plain PERF_TYPE_HARDWARE events into per-PMU events, even when all the necessary kernel support is present. This patch updates is_event_supported() to additionally try opening events with perf_event_attr::exclude_guest set, allowing support for events to be detected on Apple M1/M2 systems. I believe that this is sufficient for all contemporary CPU PMU drivers, though in future it may be necessary to check for other combinations of filter bits. I've deliberately changed the check to not expect a specific error code for missing filters, as today ;the kernel may return a number of different error codes for missing filters (e.g. -EACCESS, -EINVAL, or -EOPNOTSUPP) depending on why and where the filter configuration is rejected, and retrying for any error is more robust. Note that this does not remove the need for commit: a24d9d9dc096fc0d ("perf parse-events: Make legacy events lower priority than sysfs/JSON") ... which is still necessary so that named-pmu/event/ events work on kernels without extended type support, even if the event name happens to be the same as a PERF_EVENT_TYPE_HARDWARE event (e.g. as is the case for the M1/M2 PMU's 'cycles' and 'instructions' events). Fixes: 82fe2e45cdb00de4 ("perf pmus: Check if we can encode the PMU number in perf_event_attr.type") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@arm.com> Tested-by: Marc Zyngier <maz@kernel.org> Cc: Hector Martin <marcan@marcan.st> Cc: James Clark <james.clark@arm.com> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mike Leach <mike.leach@linaro.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240126145605.1005472-1-mark.rutland@arm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* perf metric: Don't remove scale from countsIan Rogers2024-03-261-6/+1
| | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 6d6be5eb45b423a37d746d3ee0fd0c78f76ead9f ] Counts were switched from the scaled saved value form to the aggregated count to avoid double accounting. When this happened the removing of scaling for a count should have been removed, however, it wasn't and this wasn't observed as it normally doesn't matter because a counter's scale is 1. A problem was observed with RAPL events that are scaled. Fixes: 37cc8ad77cf8 ("perf metric: Directly use counts rather than saved_value") Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Kaige Ye <ye@kaige.org> Cc: John Garry <john.g.garry@oracle.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240209204947.3873294-5-irogers@google.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* perf stat: Avoid metric-only segvIan Rogers2024-03-261-1/+1
| | | | | | | | | | | | | | | | | | | [ Upstream commit 2543947c77e0e224bda86b4e7220c2f6714da463 ] Cycles is recognized as part of a hard coded metric in stat-shadow.c, it may call print_metric_only with a NULL fmt string leading to a segfault. Handle the NULL fmt explicitly. Fixes: 088519f318be ("perf stat: Move the display functions to stat-display.c") Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Kaige Ye <ye@kaige.org> Cc: John Garry <john.g.garry@oracle.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240209204947.3873294-4-irogers@google.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* perf expr: Fix "has_event" function for metric style eventsIan Rogers2024-03-261-1/+19
| | | | | | | | | | | | | | | | | | | | [ Upstream commit 6dd76680b925228312756c13b9b983661b552a64 ] Events in metrics cannot use '/' as a separator, it would be recognized as a divide, so they use '@'. The '@' is recognized in the metricgroups code and changed to '/', do the same in the has_event function so that the parsing is only tried without the @s. Fixes: 4a4a9bf9075f ("perf expr: Add has_event function") Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Kaige Ye <ye@kaige.org> Cc: John Garry <john.g.garry@oracle.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240209204947.3873294-3-irogers@google.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* perf srcline: Add missed addr2line closesIan Rogers2024-03-261-0/+2
| | | | | | | | | | | | | | | | | | | | | | [ Upstream commit c7ba9d18ae47924a6ea6a47ca139779f58eb83c0 ] The child_process for addr2line sets in and out to -1 so that pipes get created. It is the caller's responsibility to close the pipes, finish_command doesn't do it. Add the missed closes. Fixes: b3801e791231 ("perf srcline: Simplify addr2line subprocess") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Cc: Tom Rix <trix@redhat.com> Cc: llvm@lists.linux.dev Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240201001504.1348511-8-irogers@google.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* perf thread_map: Free strlist on normal path in thread_map__new_by_tid_str()Yang Jihong2024-03-261-1/+1
| | | | | | | | | | | | | | [ Upstream commit 1eb3d924e3c0b8c27388b0583a989d757866efb6 ] slist needs to be freed in both error path and normal path in thread_map__new_by_tid_str(). Fixes: b52956c961be3a04 ("perf tools: Allow multiple threads or processes in record, stat, top") Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240206083228.172607-6-yangjihong1@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* perf bpf: Clean up the generated/copied vmlinux.hArnaldo Carvalho de Melo2024-03-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | [ Upstream commit ffd856537b95dd65facb4e0c78ca1cb92c2048ff ] When building perf with BPF skels we either copy the minimalistic tools/perf/util/bpf_skel/vmlinux/vmlinux.h or use bpftool to generate a vmlinux from BTF, storing the result in $(SKEL_OUT)/vmlinux.h. We need to remove that when doing a 'make -C tools/perf clean', fix it. Fixes: b7a2d774c9c5a9a3 ("perf build: Add ability to build with a generated vmlinux.h") Reviewed-by: Ian Rogers <irogers@google.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: bpf@vger.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/Zbz89KK5wHfZ82jv@x1 Signed-off-by: Sasha Levin <sashal@kernel.org>
* perf evsel: Fix duplicate initialization of data->id in evsel__parse_sample()Yang Jihong2024-03-261-1/+0
| | | | | | | | | | | | | | [ Upstream commit 4962aec0d684c8edb14574ccd0da53e4926ff834 ] data->id has been initialized at line 2362, remove duplicate initialization. Fixes: 3ad31d8a0df2 ("perf evsel: Centralize perf_sample initialization") Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240127025756.4041808-1-yangjihong1@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* perf pmu: Treat the msr pmu as softwareIan Rogers2024-03-261-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 24852ef2e2d5c555c2da05baff112ea414b6e0f5 ] The msr PMU is a software one, meaning msr events may be grouped with events in a hardware context. As the msr PMU isn't marked as a software PMU by perf_pmu__is_software, groups with the msr PMU in are broken and the msr events placed in a different group. This may lead to multiplexing errors where a hardware event isn't counted while the msr event, such as tsc, is. Fix all of this by marking the msr PMU as software, which agrees with the driver. Before: ``` $ perf stat -e '{slots,tsc}' -a true WARNING: events were regrouped to match PMUs Performance counter stats for 'system wide': 1,750,335 slots 4,243,557 tsc 0.001456717 seconds time elapsed ``` After: ``` $ perf stat -e '{slots,tsc}' -a true Performance counter stats for 'system wide': 12,526,380 slots 3,415,163 tsc 0.001488360 seconds time elapsed ``` Fixes: 251aa040244a ("perf parse-events: Wildcard most "numeric" events") Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240124234200.1510417-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* perf record: Check conflict between '--timestamp-filename' option and pipe ↵Yang Jihong2024-03-262-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mode before recording [ Upstream commit 02f9b50e04812782fd006ed21c6da1c5e3e373da ] In pipe mode, no need to switch perf data output, therefore, '--timestamp-filename' option should not take effect. Check the conflict before recording and output WARNING. In this case, the check pipe mode in perf_data__switch() can be removed. Before: # perf record --timestamp-filename -o- perf test -w noploop | perf report -i- --percent-limit=1 # To display the perf.data header info, please use --header/--header-only options. # [ perf record: Woken up 1 times to write data ] [ perf record: Dump -.2024011812110182 ] # # Total Lost Samples: 0 # # Samples: 4K of event 'cycles:P' # Event count (approx.): 2176784359 # # Overhead Command Shared Object Symbol # ........ ....... .................... ...................................... # 97.83% perf perf [.] noploop # # (Tip: Print event counts in CSV format with: perf stat -x,) # After: # perf record --timestamp-filename -o- perf test -w noploop | perf report -i- --percent-limit=1 WARNING: --timestamp-filename option is not available in pipe mode. # To display the perf.data header info, please use --header/--header-only options. # [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.000 MB - ] # # Total Lost Samples: 0 # # Samples: 4K of event 'cycles:P' # Event count (approx.): 2185575421 # # Overhead Command Shared Object Symbol # ........ ....... ..................... ............................................. # 97.75% perf perf [.] noploop # # (Tip: Profiling branch (mis)predictions with: perf record -b / perf report) # Fixes: ecfd7a9c044e ("perf record: Add '--timestamp-filename' option to append timestamp to output file name") Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240119040304.3708522-3-yangjihong1@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* perf top: Uniform the event name for the hybrid machineKan Liang2024-03-264-27/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit a61f89bf76ef6f87ec48dd90dbc73a6cf9952edc ] It's hard to distinguish the default cycles events among hybrid PMUs. For example, $ perf top Available samples 385 cycles:P 903 cycles:P The other tool, e.g., perf record, uniforms the event name and adds the hybrid PMU name before opening the event. So the events can be easily distinguished. Apply the same methodology for the perf top as well. The evlist__uniquify_name() will be invoked by both record and top. Move it to util/evlist.c With the patch: $ perf top Available samples 148 cpu_atom/cycles:P/ 1K cpu_core/cycles:P/ Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Hector Martin <marcan@marcan.st> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20231214144612.1092028-2-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Stable-dep-of: 02f9b50e0481 ("perf record: Check conflict between '--timestamp-filename' option and pipe mode before recording") Signed-off-by: Sasha Levin <sashal@kernel.org>
* perf record: Fix possible incorrect free in record__switch_output()Yang Jihong2024-03-261-1/+1
| | | | | | | | | | | | | | | [ Upstream commit aff10a165201f6f60cff225083ce301ad3f5d8f1 ] perf_data__switch() may not assign a legal value to 'new_filename'. In this case, 'new_filename' uses the on-stack value, which may cause a incorrect free and unexpected result. Fixes: 03724b2e9c45 ("perf record: Allow to limit number of reported perf.data files") Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240119040304.3708522-2-yangjihong1@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* objtool: Fix UNWIND_HINT_{SAVE,RESTORE} across basic blocksJosh Poimboeuf2024-03-261-0/+12
| | | | | | | | | | | | | | | | | | | | [ Upstream commit 10b4c4bce3f5541f54bcc2039720b11d2bc96d79 ] If SAVE and RESTORE unwind hints are in different basic blocks, and objtool sees the RESTORE before the SAVE, it errors out with: vmlinux.o: warning: objtool: vmw_port_hb_in+0x242: objtool isn't smart enough to handle this CFI save/restore combo In such a case, defer following the RESTORE block until the straight-line path gets followed later. Fixes: 8faea26e6111 ("objtool: Re-add UNWIND_HINT_{SAVE_RESTORE}") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202402240702.zJFNmahW-lkp@intel.com/ Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/r/20240227073527.avcm5naavbv3cj5s@treble Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* selftests: forwarding: Add missing multicast routing config entriesIdo Schimmel2024-03-261-0/+7
| | | | | | | | | | | | | | | | | [ Upstream commit f0ddf15f0a74c27eb4b2271a90e69948acc3fa2c ] The two tests that make use of multicast routig (router.sh and router_multicast.sh) are currently failing in the netdev CI because the kernel is missing multicast routing support. Fix by adding the required config entries. Fixes: 6d4efada3b82 ("selftests: forwarding: Add multicast routing test") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240208165538.1303021-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* selftests: forwarding: Add missing config entriesPetr Machata2024-03-261-0/+28
| | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 4acf4e62cd572b0c806035046b3698f5585ab821 ] The config file contains a partial kernel configuration to be used by `virtme-configkernel --custom'. The presumption is that the config file contains all Kconfig options needed by the selftests from the directory. In net/forwarding/config, many are missing, which manifests as spurious failures when running the selftests, with messages about unknown device types, qdisc kinds or classifier actions. Add the missing configurations. Tested the resulting configuration using virtme-ng as follows: # vng -b -f tools/testing/selftests/net/forwarding/config # vng --user root (within the VM:) # make -C tools/testing/selftests TARGETS=net/forwarding run_tests Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/025abded7ff9cea5874a7fe35dcd3fd41bf5e6ac.1706286755.git.petrm@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Stable-dep-of: f0ddf15f0a74 ("selftests: forwarding: Add missing multicast routing config entries") Signed-off-by: Sasha Levin <sashal@kernel.org>
* tools/resolve_btfids: Fix cross-compilation to non-host endiannessViktor Malik2024-03-261-0/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 903fad4394666bc23975c93fb58f137ce64b5192 ] The .BTF_ids section is pre-filled with zeroed BTF ID entries during the build and afterwards patched by resolve_btfids with correct values. Since resolve_btfids always writes in host-native endianness, it relies on libelf to do the translation when the target ELF is cross-compiled to a different endianness (this was introduced in commit 61e8aeda9398 ("bpf: Fix libelf endian handling in resolv_btfids")). Unfortunately, the translation will corrupt the flags fields of SET8 entries because these were written during vmlinux compilation and are in the correct endianness already. This will lead to numerous selftests failures such as: $ sudo ./test_verifier 502 502 #502/p sleepable fentry accept FAIL Failed to load prog 'Invalid argument'! bpf_fentry_test1 is not sleepable verification time 34 usec stack depth 0 processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0 Summary: 0 PASSED, 0 SKIPPED, 1 FAILED Since it's not possible to instruct libelf to translate just certain values, let's manually bswap the flags (both global and entry flags) in resolve_btfids when needed, so that libelf then translates everything correctly. Fixes: ef2c6f370a63 ("tools/resolve_btfids: Add support for 8-byte BTF sets") Signed-off-by: Viktor Malik <vmalik@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/7b6bff690919555574ce0f13d2a5996cacf7bf69.1707223196.git.vmalik@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* tools/resolve_btfids: Refactor set sorting with types from btf_ids.hViktor Malik2024-03-262-14/+30
| | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 9707ac4fe2f5bac6406d2403f8b8a64d7b3d8e43 ] Instead of using magic offsets to access BTF ID set data, leverage types from btf_ids.h (btf_id_set and btf_id_set8) which define the actual layout of the data. Thanks to this change, set sorting should also continue working if the layout changes. This requires to sync the definition of 'struct btf_id_set8' from include/linux/btf_ids.h to tools/include/linux/btf_ids.h. We don't sync the rest of the file at the moment, b/c that would require to also sync multiple dependent headers and we don't need any other defs from btf_ids.h. Signed-off-by: Viktor Malik <vmalik@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/bpf/ff7f062ddf6a00815fda3087957c4ce667f50532.1707223196.git.vmalik@redhat.com Stable-dep-of: 903fad439466 ("tools/resolve_btfids: Fix cross-compilation to non-host endianness") Signed-off-by: Sasha Levin <sashal@kernel.org>
* libbpf: Use OPTS_SET() macro in bpf_xdp_query()Toke Høiland-Jørgensen2024-03-261-2/+2
| | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 92a871ab9fa59a74d013bc04f321026a057618e7 ] When the feature_flags and xdp_zc_max_segs fields were added to the libbpf bpf_xdp_query_opts, the code writing them did not use the OPTS_SET() macro. This causes libbpf to write to those fields unconditionally, which means that programs compiled against an older version of libbpf (with a smaller size of the bpf_xdp_query_opts struct) will have its stack corrupted by libbpf writing out of bounds. The patch adding the feature_flags field has an early bail out if the feature_flags field is not part of the opts struct (via the OPTS_HAS) macro, but the patch adding xdp_zc_max_segs does not. For consistency, this fix just changes the assignments to both fields to use the OPTS_SET() macro. Fixes: 13ce2daa259a ("xsk: add new netlink attribute dedicated for ZC max frags") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240206125922.1992815-1-toke@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* libbpf: Add missing LIBBPF_API annotation to libbpf_set_memlock_rlim APIAndrii Nakryiko2024-03-261-1/+1
| | | | | | | | | | | | | | | [ Upstream commit 93ee1eb85e28d1e35bb059c1f5965d65d5fc83c2 ] LIBBPF_API annotation seems missing on libbpf_set_memlock_rlim API, so add it to make this API callable from libbpf's shared library version. Fixes: e542f2c4cd16 ("libbpf: Auto-bump RLIMIT_MEMLOCK if kernel needs it for BPF") Fixes: ab9a5a05dc48 ("libbpf: fix up few libbpf.map problems") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20240201172027.604869-3-andrii@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
* selftests/bpf: Disable IPv6 for lwt_redirect testManu Bretelle2024-03-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 2ef61296d2844c6a4211e07ab70ef2fb412b2c30 ] After a recent change in the vmtest runner, this test started failing sporadically. Investigation showed that this test was subject to race condition which got exacerbated after the vm runner change. The symptoms being that the logic that waited for an ICMPv4 packet is naive and will break if 5 or more non-ICMPv4 packets make it to tap0. When ICMPv6 is enabled, the kernel will generate traffic such as ICMPv6 router solicitation... On a system with good performance, the expected ICMPv4 packet would very likely make it to the network interface promptly, but on a system with poor performance, those "guarantees" do not hold true anymore. Given that the test is IPv4 only, this change disable IPv6 in the test netns by setting `net.ipv6.conf.all.disable_ipv6` to 1. This essentially leaves "ping" as the sole generator of traffic in the network namespace. If this test was to be made IPv6 compatible, the logic in `wait_for_packet` would need to be modified. In more details... At a high level, the test does: - create a new namespace - in `setup_redirect_target` set up lo, tap0, and link_err interfaces as well as add 2 routes that attaches ingress/egress sections of `test_lwt_redirect.bpf.o` to the xmit path. - in `send_and_capture_test_packets` send an ICMP packet and read off the tap interface (using `wait_for_packet`) to check that a ICMP packet with the right size is read. `wait_for_packet` will try to read `max_retry` (5) times from the tap0 fd looking for an ICMPv4 packet matching some criteria. The problem is that when we set up the `tap0` interface, because IPv6 is enabled by default, traffic such as Router solicitation is sent through tap0, as in: # tcpdump -r /tmp/lwt_redirect.pc reading from file /tmp/lwt_redirect.pcap, link-type EN10MB (Ethernet) 04:46:23.578352 IP6 :: > ff02::1:ffc0:4427: ICMP6, neighbor solicitation, who has fe80::fcba:dff:fec0:4427, length 32 04:46:23.659522 IP6 :: > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28 04:46:24.389169 IP 10.0.0.1 > 20.0.0.9: ICMP echo request, id 122, seq 1, length 108 04:46:24.618599 IP6 fe80::fcba:dff:fec0:4427 > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28 04:46:24.619985 IP6 fe80::fcba:dff:fec0:4427 > ff02::2: ICMP6, router solicitation, length 16 04:46:24.767326 IP6 fe80::fcba:dff:fec0:4427 > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28 04:46:28.936402 IP6 fe80::fcba:dff:fec0:4427 > ff02::2: ICMP6, router solicitation, length 16 If `wait_for_packet` sees 5 non-ICMPv4 packets, it will return 0, which is what we see in: 2024-01-31T03:51:25.0336992Z test_lwt_redirect_run:PASS:netns_create 0 nsec 2024-01-31T03:51:25.0341309Z open_netns:PASS:malloc token 0 nsec 2024-01-31T03:51:25.0344844Z open_netns:PASS:open /proc/self/ns/net 0 nsec 2024-01-31T03:51:25.0350071Z open_netns:PASS:open netns fd 0 nsec 2024-01-31T03:51:25.0353516Z open_netns:PASS:setns 0 nsec 2024-01-31T03:51:25.0356560Z test_lwt_redirect_run:PASS:setns 0 nsec 2024-01-31T03:51:25.0360140Z open_tuntap:PASS:open(/dev/net/tun) 0 nsec 2024-01-31T03:51:25.0363822Z open_tuntap:PASS:ioctl(TUNSETIFF) 0 nsec 2024-01-31T03:51:25.0367402Z open_tuntap:PASS:fcntl(O_NONBLOCK) 0 nsec 2024-01-31T03:51:25.0371167Z setup_redirect_target:PASS:open_tuntap 0 nsec 2024-01-31T03:51:25.0375180Z setup_redirect_target:PASS:if_nametoindex 0 nsec 2024-01-31T03:51:25.0379929Z setup_redirect_target:PASS:ip link add link_err type dummy 0 nsec 2024-01-31T03:51:25.0384874Z setup_redirect_target:PASS:ip link set lo up 0 nsec 2024-01-31T03:51:25.0389678Z setup_redirect_target:PASS:ip addr add dev lo 10.0.0.1/32 0 nsec 2024-01-31T03:51:25.0394814Z setup_redirect_target:PASS:ip link set link_err up 0 nsec 2024-01-31T03:51:25.0399874Z setup_redirect_target:PASS:ip link set tap0 up 0 nsec 2024-01-31T03:51:25.0407731Z setup_redirect_target:PASS:ip route add 10.0.0.0/24 dev link_err encap bpf xmit obj test_lwt_redirect.bpf.o sec redir_ingress 0 nsec 2024-01-31T03:51:25.0419105Z setup_redirect_target:PASS:ip route add 20.0.0.0/24 dev link_err encap bpf xmit obj test_lwt_redirect.bpf.o sec redir_egress 0 nsec 2024-01-31T03:51:25.0427209Z test_lwt_redirect_normal:PASS:setup_redirect_target 0 nsec 2024-01-31T03:51:25.0431424Z ping_dev:PASS:if_nametoindex 0 nsec 2024-01-31T03:51:25.0437222Z send_and_capture_test_packets:FAIL:wait_for_epacket unexpected wait_for_epacket: actual 0 != expected 1 2024-01-31T03:51:25.0448298Z (/tmp/work/bpf/bpf/tools/testing/selftests/bpf/prog_tests/lwt_redirect.c:175: errno: Success) test_lwt_redirect_normal egress test fails 2024-01-31T03:51:25.0457124Z close_netns:PASS:setns 0 nsec When running in a VM which potential resource contrains, the odds that calling `ping` is not scheduled very soon after bringing `tap0` up increases, and with this the chances to get our ICMP packet pushed to position 6+ in the network trace. To confirm this indeed solves the issue, I ran the test 100 times in a row with: errors=0 successes=0 for i in `seq 1 100` do ./test_progs -t lwt_redirect/lwt_redirect_normal if [ $? -eq 0 ]; then successes=$((successes+1)) else errors=$((errors+1)) fi done echo "successes: $successes/errors: $errors" While this test would at least fail a couple of time every 10 runs, here it ran 100 times with no error. Fixes: 43a7c3ef8a15 ("selftests/bpf: Add lwt_xmit tests for BPF_REDIRECT") Signed-off-by: Manu Bretelle <chantr4@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240131053212.2247527-1-chantr4@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* libbpf: Fix faccessat() usage on AndroidAndrii Nakryiko2024-03-261-0/+14
| | | | | | | | | | | | | | | | | | | | [ Upstream commit ad57654053805bf9a62602aaec74cc78edb6f235 ] Android implementation of libc errors out with -EINVAL in faccessat() if passed AT_EACCESS ([0]), this leads to ridiculous issue with libbpf refusing to load /sys/kernel/btf/vmlinux on Androids ([1]). Fix by detecting Android and redefining AT_EACCESS to 0, it's equivalent on Android. [0] https://android.googlesource.com/platform/bionic/+/refs/heads/android13-release/libc/bionic/faccessat.cpp#50 [1] https://github.com/libbpf/libbpf-bootstrap/issues/250#issuecomment-1911324250 Fixes: 6a4ab8869d0b ("libbpf: Fix the case of running as non-root with capabilities") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20240126220944.2497665-1-andrii@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
* selftests/bpf: Wait for the netstamp_needed_key static key to be turned onMartin KaFai Lau2024-03-261-4/+75
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit ce6f6cffaeaa0a3bcdafcae7fe03c68c3afae631 ] After the previous patch that speeded up the test (by avoiding neigh discovery in IPv6), the BPF CI occasionally hits this error: rcv tstamp unexpected pkt rcv tstamp: actual 0 == expected 0 The test complains about the cmsg returned from the recvmsg() does not have the rcv timestamp. Setting skb->tstamp or not is controlled by a kernel static key "netstamp_needed_key". The static key is enabled whenever this is at least one sk with the SOCK_TIMESTAMP set. The test_redirect_dtime does use setsockopt() to turn on the SOCK_TIMESTAMP for the reading sk. In the kernel net_enable_timestamp() has a delay to enable the "netstamp_needed_key" when CONFIG_JUMP_LABEL is set. This potential delay is the likely reason for packet missing rcv timestamp occasionally. This patch is to create udp sockets with SOCK_TIMESTAMP set. It sends and receives some packets until the received packet has a rcv timestamp. It currently retries at most 5 times with 1s in between. This should be enough to wait for the "netstamp_needed_key". It then holds on to the socket and only closes it at the end of the test. This guarantees that the test has the "netstamp_needed_key" key turned on from the beginning. To simplify the udp sockets setup, they are sending/receiving packets in the same netns (ns_dst is used) and communicate over the "lo" dev. Hence, the patch enables the "lo" dev in the ns_dst. Fixes: c803475fd8dd ("bpf: selftests: test skb->tstamp in redirect_neigh") Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240120060518.3604920-2-martin.lau@linux.dev Signed-off-by: Sasha Levin <sashal@kernel.org>
* selftests/bpf: Fix the flaky tc_redirect_dtime testMartin KaFai Lau2024-03-261-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 177f1d083a19af58f4b1206d299ed73689249fd8 ] BPF CI has been reporting the tc_redirect_dtime test failing from time to time: test_inet_dtime:PASS:setns src 0 nsec (network_helpers.c:253: errno: No route to host) Failed to connect to server close_netns:PASS:setns 0 nsec test_inet_dtime:FAIL:connect_to_fd unexpected connect_to_fd: actual -1 < expected 0 test_tcp_clear_dtime:PASS:tcp ip6 clear dtime ingress_fwdns_p100 0 nsec The connect_to_fd failure (EHOSTUNREACH) is from the test_tcp_clear_dtime() test and it is the very first IPv6 traffic after setting up all the links, addresses, and routes. The symptom is this first connect() is always slow. In my setup, it could take ~3s. After some tracing and tcpdump, the slowness is mostly spent in the neighbor solicitation in the "ns_fwd" namespace while the "ns_src" and "ns_dst" are fine. I forced the kernel to drop the neighbor solicitation messages. I can then reproduce EHOSTUNREACH. What actually happen could be: - the neighbor advertisement came back a little slow. - the "ns_fwd" namespace concluded a neighbor discovery failure and triggered the ndisc_error_report() => ip6_link_failure() => icmpv6_send(skb, ICMPV6_DEST_UNREACH, ICMPV6_ADDR_UNREACH, 0) - the client's connect() reports EHOSTUNREACH after receiving the ICMPV6_DEST_UNREACH message. The neigh table of both "ns_src" and "ns_dst" namespace has already been manually populated but not the "ns_fwd" namespace. This patch fixes it by manually populating the neigh table also in the "ns_fwd" namespace. Although the namespace configuration part had been existed before the tc_redirect_dtime test, still Fixes-tagging the patch when the tc_redirect_dtime test was added since it is the only test hitting it so far. Fixes: c803475fd8dd ("bpf: selftests: test skb->tstamp in redirect_neigh") Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240120060518.3604920-1-martin.lau@linux.dev Signed-off-by: Sasha Levin <sashal@kernel.org>
* selftests/bpf: Add netkit to tc_redirect selftestDaniel Borkmann2024-03-261-0/+52
| | | | | | | | | | | | | | | | [ Upstream commit adfeae2d243d9e5b83d094af481d189156b11779 ] Extend the existing tc_redirect selftest to also cover netkit devices for exercising the bpf_redirect_peer() code paths, so that we have both veth as well as netkit covered, all tests still pass after this change. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20231114004220.6495-9-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Stable-dep-of: 177f1d083a19 ("selftests/bpf: Fix the flaky tc_redirect_dtime test") Signed-off-by: Sasha Levin <sashal@kernel.org>
* selftests/bpf: De-veth-ize the tc_redirect test caseDaniel Borkmann2024-03-261-126/+137
| | | | | | | | | | | | | | | | [ Upstream commit eee82da79f036bb49ff80d3088b9530e3c2e57eb ] No functional changes to the test case, but just renaming various functions, variables, etc, to remove veth part of their name for making it more generic and reusable later on (e.g. for netkit). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20231114004220.6495-8-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Stable-dep-of: 177f1d083a19 ("selftests/bpf: Fix the flaky tc_redirect_dtime test") Signed-off-by: Sasha Levin <sashal@kernel.org>
* selftest/bpf: Add map_in_maps with BPF_MAP_TYPE_PERF_EVENT_ARRAY valuesAndrey Grafin2024-03-262-1/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 40628f9fff73adecac77a9aa390f8016724cad99 ] Check that bpf_object__load() successfully creates map_in_maps with BPF_MAP_TYPE_PERF_EVENT_ARRAY values. These changes cover fix in the previous patch "libbpf: Apply map_set_def_max_entries() for inner_maps on creation". A command line output is: - w/o fix $ sudo ./test_maps libbpf: map 'mim_array_pe': failed to create inner map: -22 libbpf: map 'mim_array_pe': failed to create: Invalid argument(-22) libbpf: failed to load object './test_map_in_map.bpf.o' Failed to load test prog - with fix $ sudo ./test_maps ... test_maps: OK, 0 SKIPPED Fixes: 646f02ffdd49 ("libbpf: Add BTF-defined map-in-map support") Signed-off-by: Andrey Grafin <conquistador@yandex-team.ru> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/bpf/20240117130619.9403-2-conquistador@yandex-team.ru Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* libbpf: Apply map_set_def_max_entries() for inner_maps on creationAndrey Grafin2024-03-261-0/+4
| | | | | | | | | | | | | | | | | | | | | [ Upstream commit f04deb90e516e8e48bf8693397529bc942a9e80b ] This patch allows to auto create BPF_MAP_TYPE_ARRAY_OF_MAPS and BPF_MAP_TYPE_HASH_OF_MAPS with values of BPF_MAP_TYPE_PERF_EVENT_ARRAY by bpf_object__load(). Previous behaviour created a zero filled btf_map_def for inner maps and tried to use it for a map creation but the linux kernel forbids to create a BPF_MAP_TYPE_PERF_EVENT_ARRAY map with max_entries=0. Fixes: 646f02ffdd49 ("libbpf: Add BTF-defined map-in-map support") Signed-off-by: Andrey Grafin <conquistador@yandex-team.ru> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/bpf/20240117130619.9403-1-conquistador@yandex-team.ru Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* selftests/bpf: Fix potential premature unload in bpf_testmodArtem Savkov2024-03-261-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit d177c1be06ce28aa8c8710ac55be1b5ad3f314c6 ] It is possible for bpf_kfunc_call_test_release() to be called from bpf_map_free_deferred() when bpf_testmod is already unloaded and perf_test_stuct.cnt which it tries to decrease is no longer in memory. This patch tries to fix the issue by waiting for all references to be dropped in bpf_testmod_exit(). The issue can be triggered by running 'test_progs -t map_kptr' in 6.5, but is obscured in 6.6 by d119357d07435 ("rcu-tasks: Treat only synchronous grace periods urgently"). Fixes: 65eb006d85a2 ("bpf: Move kernel test kfuncs to bpf_testmod") Signed-off-by: Artem Savkov <asavkov@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yonghong.song@linux.dev> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/82f55c0e-0ec8-4fe1-8d8c-b1de07558ad9@linux.dev Link: https://lore.kernel.org/bpf/20240110085737.8895-1-asavkov@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* bpftool: Silence build warning about calloc()Tiezhu Yang2024-03-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit f5f30386c78105cba520e443a6a9ee945ec1d066 ] There exists the following warning when building bpftool: CC prog.o prog.c: In function ‘profile_open_perf_events’: prog.c:2301:24: warning: ‘calloc’ sizes specified with ‘sizeof’ in the earlier argument and not in the later argument [-Wcalloc-transposed-args] 2301 | sizeof(int), obj->rodata->num_cpu * obj->rodata->num_metric); | ^~~ prog.c:2301:24: note: earlier argument should specify number of elements, later size of each element Tested with the latest upstream GCC which contains a new warning option -Wcalloc-transposed-args. The first argument to calloc is documented to be number of elements in array, while the second argument is size of each element, just switch the first and second arguments of calloc() to silence the build warning, compile tested only. Fixes: 47c09d6a9f67 ("bpftool: Introduce "prog profile" command") Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20240116061920.31172-1-yangtiezhu@loongson.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* selftests: tls: use exact comparison in recv_partialJakub Kicinski2024-03-261-4/+4
| | | | | | | | | | | | [ Upstream commit 49d821064c44cb5ffdf272905236012ea9ce50e3 ] This exact case was fail for async crypto and we weren't catching it. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>