| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth, bpf, can and netfilter.
A little larger than usual but it's all fixes, no late features. It's
large partially because of timing, and partially because of follow ups
to stuff that got merged a week or so before the merge window and
wasn't as widely tested. Maybe the Bluetooth fixes are a little
alarming so we'll address that, but the rest seems okay and not scary.
Notably we're including a fix for the netfilter Kconfig [1], your WiFi
warning [2] and a bluetooth fix which should unblock syzbot [3].
Current release - regressions:
- Bluetooth:
- don't try to cancel uninitialized works [3]
- L2CAP: fix use-after-free caused by l2cap_chan_put
- tls: rx: fix device offload after recent rework
- devlink: fix UAF on failed reload and leftover locks in mlxsw
Current release - new code bugs:
- netfilter:
- flowtable: fix incorrect Kconfig dependencies [1]
- nf_tables: fix crash when nf_trace is enabled
- bpf:
- use proper target btf when exporting attach_btf_obj_id
- arm64: fixes for bpf trampoline support
- Bluetooth:
- ISO: unlock on error path in iso_sock_setsockopt()
- ISO: fix info leak in iso_sock_getsockopt()
- ISO: fix iso_sock_getsockopt for BT_DEFER_SETUP
- ISO: fix memory corruption on iso_pinfo.base
- ISO: fix not using the correct QoS
- hci_conn: fix updating ISO QoS PHY
- phy: dp83867: fix get nvmem cell fail
Previous releases - regressions:
- wifi: cfg80211: fix validating BSS pointers in
__cfg80211_connect_result [2]
- atm: bring back zatm uAPI after ATM had been removed
- properly fix old bug making bonding ARP monitor mode not being able
to work with software devices with lockless Tx
- tap: fix null-deref on skb->dev in dev_parse_header_protocol
- revert "net: usb: ax88179_178a needs FLAG_SEND_ZLP" it helps some
devices and breaks others
- netfilter:
- nf_tables: many fixes rejecting cross-object linking which may
lead to UAFs
- nf_tables: fix null deref due to zeroed list head
- nf_tables: validate variable length element extension
- bgmac: fix a BUG triggered by wrong bytes_compl
- bcmgenet: indicate MAC is in charge of PHY PM
Previous releases - always broken:
- bpf:
- fix bad pointer deref in bpf_sys_bpf() injected via test infra
- disallow non-builtin bpf programs calling the prog_run command
- don't reinit map value in prealloc_lru_pop
- fix UAFs during the read of map iterator fd
- fix invalidity check for values in sk local storage map
- reject sleepable program for non-resched map iterator
- mptcp:
- move subflow cleanup in mptcp_destroy_common()
- do not queue data on closed subflows
- virtio_net: fix memory leak inside XDP_TX with mergeable
- vsock: fix memory leak when multiple threads try to connect()
- rework sk_user_data sharing to prevent psock leaks
- geneve: fix TOS inheriting for ipv4
- tunnels & drivers: do not use RT_TOS for IPv6 flowlabel
- phy: c45 baset1: do not skip aneg configuration if clock role is
not specified
- rose: avoid overflow when /proc displays timer information
- x25: fix call timeouts in blocking connects
- can: mcp251x: fix race condition on receive interrupt
- can: j1939:
- replace user-reachable WARN_ON_ONCE() with netdev_warn_once()
- fix memory leak of skbs in j1939_session_destroy()
Misc:
- docs: bpf: clarify that many things are not uAPI
- seg6: initialize induction variable to first valid array index (to
silence clang vs objtool warning)
- can: ems_usb: fix clang 14's -Wunaligned-access warning"
* tag 'net-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (117 commits)
net: atm: bring back zatm uAPI
dpaa2-eth: trace the allocated address instead of page struct
net: add missing kdoc for struct genl_multicast_group::flags
nfp: fix use-after-free in area_cache_get()
MAINTAINERS: use my korg address for mt7601u
mlxsw: minimal: Fix deadlock in ports creation
bonding: fix reference count leak in balance-alb mode
net: usb: qmi_wwan: Add support for Cinterion MV32
bpf: Shut up kern_sys_bpf warning.
net/tls: Use RCU API to access tls_ctx->netdev
tls: rx: device: don't try to copy too much on detach
tls: rx: device: bound the frag walk
net_sched: cls_route: remove from list when handle is 0
selftests: forwarding: Fix failing tests with old libnet
net: refactor bpf_sk_reuseport_detach()
net: fix refcount bug in sk_psock_get (2)
selftests/bpf: Ensure sleepable program is rejected by hash map iter
selftests/bpf: Add write tests for sk local storage map iterator
selftests/bpf: Add tests for reading a dangling map iter fd
bpf: Only allow sleepable program for resched-able iterator
...
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The custom multipath hash tests use mausezahn in order to test how
changes in various packet fields affect the packet distribution across
the available nexthops.
The tool uses the libnet library for various low-level packet
construction and injection. The library started using the
"SO_BINDTODEVICE" socket option for IPv6 sockets in version 1.1.6 and
for IPv4 sockets in version 1.2.
When the option is not set, packets are not routed according to the
table associated with the VRF master device and tests fail.
Fix this by prefixing the command with "ip vrf exec", which will cause
the route lookup to occur in the VRF routing table. This makes the tests
pass regardless of the libnet library version.
Fixes: 511e8db54036 ("selftests: forwarding: Add test for custom multipath hash")
Fixes: 185b0c190bb6 ("selftests: forwarding: Add test for custom multipath hash with IPv4 GRE")
Fixes: b7715acba4d3 ("selftests: forwarding: Add test for custom multipath hash with IPv6 GRE")
Reported-by: Ivan Vecera <ivecera@redhat.com>
Tested-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Link: https://lore.kernel.org/r/20220809113320.751413-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| |\
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Daniel Borkmann says:
====================
bpf 2022-08-10
We've added 23 non-merge commits during the last 7 day(s) which contain
a total of 19 files changed, 424 insertions(+), 35 deletions(-).
The main changes are:
1) Several fixes for BPF map iterator such as UAFs along with selftests, from Hou Tao.
2) Fix BPF syscall program's {copy,strncpy}_from_bpfptr() to not fault, from Jinghao Jia.
3) Reject BPF syscall programs calling BPF_PROG_RUN, from Alexei Starovoitov and YiFei Zhu.
4) Fix attach_btf_obj_id info to pick proper target BTF, from Stanislav Fomichev.
5) BPF design Q/A doc update to clarify what is not stable ABI, from Paul E. McKenney.
6) Fix BPF map's prealloc_lru_pop to not reinitialize, from Kumar Kartikeya Dwivedi.
7) Fix bpf_trampoline_put to avoid leaking ftrace hash, from Jiri Olsa.
8) Fix arm64 JIT to address sparse errors around BPF trampoline, from Xu Kuohai.
9) Fix arm64 JIT to use kvcalloc instead of kcalloc for internal program address
offset buffer, from Aijun Sun.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (23 commits)
selftests/bpf: Ensure sleepable program is rejected by hash map iter
selftests/bpf: Add write tests for sk local storage map iterator
selftests/bpf: Add tests for reading a dangling map iter fd
bpf: Only allow sleepable program for resched-able iterator
bpf: Check the validity of max_rdwr_access for sock local storage map iterator
bpf: Acquire map uref in .init_seq_private for sock{map,hash} iterator
bpf: Acquire map uref in .init_seq_private for sock local storage map iterator
bpf: Acquire map uref in .init_seq_private for hash map iterator
bpf: Acquire map uref in .init_seq_private for array map iterator
bpf: Disallow bpf programs call prog_run command.
bpf, arm64: Fix bpf trampoline instruction endianness
selftests/bpf: Add test for prealloc_lru_pop bug
bpf: Don't reinit map value in prealloc_lru_pop
bpf: Allow calling bpf_prog_test kfuncs in tracing programs
bpf, arm64: Allocate program buffer using kvcalloc instead of kcalloc
selftests/bpf: Excercise bpf_obj_get_info_by_fd for bpf2bpf
bpf: Use proper target btf when exporting attach_btf_obj_id
mptcp, btf: Add struct mptcp_sock definition when CONFIG_MPTCP is disabled
bpf: Cleanup ftrace hash in bpf_trampoline_put
BPF: Fix potential bad pointer dereference in bpf_sys_bpf()
...
====================
Link: https://lore.kernel.org/r/20220810190624.10748-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add a test to ensure sleepable program is rejected by hash map iterator.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220810080538.1845898-10-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add test to validate the overwrite of sock local storage map value in
map iterator and another one to ensure out-of-bound value writing is
rejected.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220810080538.1845898-9-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
After closing both related link fd and map fd, reading the map
iterator fd to ensure it is OK to do so.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220810080538.1845898-8-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The verifier cannot perform sufficient validation of bpf_attr->test.ctx_in
pointer, therefore bpf programs should not be allowed to call BPF_PROG_RUN
command from within the program.
To fix this issue split bpf_sys_bpf() bpf helper into normal kern_sys_bpf()
kernel function that can only be used by the kernel light skeleton directly.
Reported-by: YiFei Zhu <zhuyifei@google.com>
Fixes: b1d18a7574d0 ("bpf: Extend sys_bpf commands for bpf_syscall programs.")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add a regression test to check against invalid check_and_init_map_value
call inside prealloc_lru_pop.
The kptr should not be reset to NULL once we set it after deleting the
map element. Hence, we trigger a program that updates the element
causing its reuse, and checks whether the unref kptr is reset or not.
If it is, prealloc_lru_pop does an incorrect check_and_init_map_value
call and the test fails.
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220809213033.24147-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Apparently, no existing selftest covers it. Add a new one where
we load cgroup/bind4 program and attach fentry to it. Calling
bpf_obj_get_info_by_fd on the fentry program should return non-zero
btf_id/btf_obj_id instead of crashing the kernel.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220804201140.1340684-2-sdf@google.com
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Enable/disable tracing infrastructure while packets are in-flight.
This triggers KASAN splat after
e34b9ed96ce3 ("netfilter: nf_tables: avoid skb access on nf_stolen").
While at it, reduce script run time as well.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Few test cases related to the fix for 924a9bc362a5:
"net: check if protocol extracted by virtio_net_hdr_set_proto is correct"
Need test for the case when a non-standard packet (GSO without NEEDS_CSUM)
sent to the tap device causes a BUG check in the tap driver.
Signed-off-by: Cezar Bulinaru <cbulinaru@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When the selftest got added, sendfile() on mptcp sockets returned
-EOPNOTSUPP, so running 'mptcp_connect.sh -m sendfile' failed
immediately.
This is no longer the case, but the script fails anyway due to timeout.
Let the receiver know once the sender has sent all data, just like
with '-m mmap' mode.
v2: need to respect cfg_wait too, as pm_userspace.sh relied
on -m sendfile to keep the connection open (Mat Martineau)
Fixes: 048d19d444be ("mptcp: add basic kselftest for mptcp")
Reported-by: Xiumei Mu <xmu@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Pull more kvm updates from Paolo Bonzini:
- Xen timer fixes
- Documentation formatting fixes
- Make rseq selftest compatible with glibc-2.35
- Fix handling of illegal LEA reg, reg
- Cleanup creation of debugfs entries
- Fix steal time cache handling bug
- Fixes for MMIO caching
- Optimize computation of number of LBRs
- Fix uninitialized field in guest_maxphyaddr < host_maxphyaddr path
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (26 commits)
KVM: x86/MMU: properly format KVM_CAP_VM_DISABLE_NX_HUGE_PAGES capability table
Documentation: KVM: extend KVM_CAP_VM_DISABLE_NX_HUGE_PAGES heading underline
KVM: VMX: Adjust number of LBR records for PERF_CAPABILITIES at refresh
KVM: VMX: Use proper type-safe functions for vCPU => LBRs helpers
KVM: x86: Refresh PMU after writes to MSR_IA32_PERF_CAPABILITIES
KVM: selftests: Test all possible "invalid" PERF_CAPABILITIES.LBR_FMT vals
KVM: selftests: Use getcpu() instead of sched_getcpu() in rseq_test
KVM: selftests: Make rseq compatible with glibc-2.35
KVM: Actually create debugfs in kvm_create_vm()
KVM: Pass the name of the VM fd to kvm_create_vm_debugfs()
KVM: Get an fd before creating the VM
KVM: Shove vcpu stats_id init into kvm_vcpu_init()
KVM: Shove vm stats_id init into kvm_create_vm()
KVM: x86/mmu: Add sanity check that MMIO SPTE mask doesn't overlap gen
KVM: x86/mmu: rename trace function name for asynchronous page fault
KVM: x86/xen: Stop Xen timer before changing IRQ
KVM: x86/xen: Initialize Xen timer only once
KVM: SVM: Disable SEV-ES support if MMIO caching is disable
KVM: x86/mmu: Fully re-evaluate MMIO caching when SPTE masks change
KVM: x86: Tag kvm_mmu_x86_module_init() with __init
...
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Test all possible input values to verify that KVM rejects all values
except the exact host value. Due to the LBR format affecting the core
functionality of LBRs, KVM can't emulate "other" formats, so even though
there are a variety of legal values, KVM should reject anything but an
exact host match.
Suggested-by: Like Xu <like.xu.linux@gmail.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
sched_getcpu() is glibc dependent and it can simply return the CPU
ID from the registered rseq information, as Florian Weimer pointed.
In this case, it's pointless to compare the return value from
sched_getcpu() and that fetched from the registered rseq information.
Fix the issue by replacing sched_getcpu() with getcpu(), as Florian
suggested. The comments are modified accordingly by replacing
"sched_getcpu()" with "getcpu()".
Reported-by: Yihuang Yu <yihyu@redhat.com>
Suggested-by: Florian Weimer <fweimer@redhat.com>
Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Message-Id: <20220810104114.6838-3-gshan@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The rseq information is registered by TLS, starting from glibc-2.35.
In this case, the test always fails due to syscall(__NR_rseq). For
example, on RHEL9.1 where upstream glibc-2.35 features are enabled
on downstream glibc-2.34, the test fails like below.
# ./rseq_test
==== Test Assertion Failure ====
rseq_test.c:60: !r
pid=112043 tid=112043 errno=22 - Invalid argument
1 0x0000000000401973: main at rseq_test.c:226
2 0x0000ffff84b6c79b: ?? ??:0
3 0x0000ffff84b6c86b: ?? ??:0
4 0x0000000000401b6f: _start at ??:?
rseq failed, errno = 22 (Invalid argument)
# rpm -aq | grep glibc-2
glibc-2.34-39.el9.aarch64
Fix the issue by using "../rseq/rseq.c" to fetch the rseq information,
registred by TLS if it exists. Otherwise, we're going to register our
own rseq information as before.
Reported-by: Yihuang Yu <yihyu@redhat.com>
Suggested-by: Florian Weimer <fweimer@redhat.com>
Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Message-Id: <20220810104114.6838-2-gshan@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Commit 49de12ba06ef ("selftests: drop KSFT_KHDR_INSTALL make target")
dropped from tools/testing/selftests/lib.mk the code related to KSFT_KHDR_INSTALL,
but in doing so it also dropped the definition of the ARCH variable. The ARCH
variable is used in several subdirectories, but kvm/ is the only one of these
that was using KSFT_KHDR_INSTALL.
As a result, kvm selftests cannot be built anymore:
In file included from include/x86_64/vmx.h:12,
from x86_64/vmx_pmu_caps_test.c:18:
include/x86_64/processor.h:15:10: fatal error: asm/msr-index.h: No such file or directory
15 | #include <asm/msr-index.h>
| ^~~~~~~~~~~~~~~~~
In file included from ../../../../tools/include/asm/atomic.h:6,
from ../../../../tools/include/linux/atomic.h:5,
from rseq_test.c:15:
../../../../tools/include/asm/../../arch/x86/include/asm/atomic.h:11:10: fatal error: asm/cmpxchg.h: No such file or directory
11 | #include <asm/cmpxchg.h>
| ^~~~~~~~~~~~~~~
Fix it by including the definition that was present in lib.mk.
Fixes: 49de12ba06ef ("selftests: drop KSFT_KHDR_INSTALL make target")
Cc: Guillaume Tucker <guillaume.tucker@collabora.com>
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: linux-kselftest@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Pull cxl updates from Dan Williams:
"Compute Express Link (CXL) updates for 6.0:
- Introduce a 'struct cxl_region' object with support for
provisioning and assembling persistent memory regions.
- Introduce alloc_free_mem_region() to accompany the existing
request_free_mem_region() as a method to allocate physical memory
capacity out of an existing resource.
- Export insert_resource_expand_to_fit() for the CXL subsystem to
late-publish CXL platform windows in iomem_resource.
- Add a polled mode PCI DOE (Data Object Exchange) driver service and
use it in cxl_pci to retrieve the CDAT (Coherent Device Attribute
Table)"
* tag 'cxl-for-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (74 commits)
cxl/hdm: Fix skip allocations vs multiple pmem allocations
cxl/region: Disallow region granularity != window granularity
cxl/region: Fix x1 interleave to greater than x1 interleave routing
cxl/region: Move HPA setup to cxl_region_attach()
cxl/region: Fix decoder interleave programming
Documentation: cxl: remove dangling kernel-doc reference
cxl/region: describe targets and nr_targets members of cxl_region_params
cxl/regions: add padding for cxl_rr_ep_add nested lists
cxl/region: Fix IS_ERR() vs NULL check
cxl/region: Fix region reference target accounting
cxl/region: Fix region commit uninitialized variable warning
cxl/region: Fix port setup uninitialized variable warnings
cxl/region: Stop initializing interleave granularity
cxl/hdm: Fix DPA reservation vs cxl_endpoint_decoder lifetime
cxl/acpi: Minimize granularity for x1 interleaves
cxl/region: Delete 'region' attribute from root decoders
cxl/acpi: Autoload driver for 'cxl_acpi' test devices
cxl/region: decrement ->nr_targets on error in cxl_region_attach()
cxl/region: prevent underflow in ways_to_cxl()
cxl/region: uninitialized variable in alloc_hpa()
...
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
After all the soft validation of the region has completed, convey the
region configuration to hardware while being careful to commit decoders
in specification mandated order. In addition to programming the endpoint
decoder base-address, interleave ways and granularity, the switch
decoder target lists are also established.
While the kernel can enforce spec-mandated commit order, it can not
enforce spec-mandated reset order. For example, the kernel can't stop
someone from removing an endpoint device that is occupying decoderN in a
switch decoder where decoderN+1 is also committed. To reset decoderN,
decoderN+1 must be torn down first. That "tear down the world"
implementation is saved for a follow-on patch.
Callback operations are provided for the 'commit' and 'reset'
operations. While those callbacks may prove useful for CXL accelerators
(Type-2 devices with memory) the primary motivation is to enable a
simple way for cxl_test to intercept those operations.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165784338418.1758207.14659830845389904356.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
CXL 2.0 allows for dynamic provisioning of new memory regions (system
physical address resources like "System RAM" and "Persistent Memory").
Whereas DDR and PMEM resources are conveyed statically at boot, CXL
allows for assembling and instantiating new regions from the available
capacity of CXL memory expanders in the system.
Sysfs with an "echo $region_name > $create_region_attribute" interface
is chosen as the mechanism to initiate the provisioning process. This
was chosen over ioctl() and netlink() to keep the configuration
interface entirely in a pseudo-fs interface, and it was chosen over
configfs since, aside from this one creation event, the interface is
read-mostly. I.e. configfs supports cases where an object is designed to
be provisioned each boot, like an iSCSI storage target, and CXL region
creation is mostly for PMEM regions which are created usually once
per-lifetime of a server instance. This is an improvement over nvdimm
that pre-created "seed" devices that tended to confuse users looking to
determine which devices are active and which are idle.
Recall that the major change that CXL brings over previous persistent
memory architectures is the ability to dynamically define new regions.
Compare that to drivers like 'nfit' where the region configuration is
statically defined by platform firmware.
Regions are created as a child of a root decoder that encompasses an
address space with constraints. When created through sysfs, the root
decoder is explicit. When created from an LSA's region structure a root
decoder will possibly need to be inferred by the driver.
Upon region creation through sysfs, a vacant region is created with a
unique name. Regions have a number of attributes that must be configured
before the region can be bound to the driver where HDM decoder program
is completed.
An example of creating a new region:
- Allocate a new region name:
region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region)
- Create a new region by name:
while
region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region)
! echo $region > /sys/bus/cxl/devices/decoder0.0/create_pmem_region
do true; done
- Region now exists in sysfs:
stat -t /sys/bus/cxl/devices/decoder0.0/$region
- Delete the region, and name:
echo $region > /sys/bus/cxl/devices/decoder0.0/delete_region
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165784333909.1758207.794374602146306032.stgit@dwillia2-xfh.jf.intel.com
[djbw: simplify locking, reword changelog]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Previously the target routing specifics of switch decoders and platform
CXL window resource tracking of root decoders were factored out of
'struct cxl_decoder'. While switch decoders translate from SPA to
downstream ports, endpoint decoders translate from SPA to DPA.
This patch, 3 of 3, adds a 'struct cxl_endpoint_decoder' that tracks an
endpoint-specific Device Physical Address (DPA) resource. For now this
just defines ->dpa_res, a follow-on patch will handle requesting DPA
resource ranges from a device-DPA resource tree.
Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165784327088.1758207.15502834501671201192.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Currently 'struct cxl_decoder' contains the superset of attributes
needed for all decoder types. Before more type-specific attributes are
added to the common definition, reorganize 'struct cxl_decoder' into type
specific objects.
This patch, the first of three, factors out a cxl_switch_decoder type.
See the new kdoc for what a 'struct cxl_switch_decoder' represents in a
CXL topology.
Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/165784325340.1758207.5064717153608954960.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The 'enabled' state is reserved for committed decoders. By default,
cxl_test decoders are uncommitted at init time.
Fixes: 7c7d68db0254 ("tools/testing/cxl: Enumerate mock decoders")
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165603888091.551046.6312322707378021172.stgit@dwillia2-xfh
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
In support of testing DPA allocation mechanisms in the CXL core, the
cxl_test environment needs to support establishing and retrieving the
'pmem partition boundary.
Replace the platform_device_add_resources() method for delineating DPA
within an endpoint with an emulated DEV_SIZE amount of partitionable
capacity. Set DEV_SIZE such that an endpoint has enough capacity to
simultaneously participate in 8 distinct regions.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165603887411.551046.13234212587991192347.stgit@dwillia2-xfh
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
For the x2 host-bridge interleave windows, allow for a
x8-endpoint-interleave configuration per memory-type with each device
contributing the minimum 256MB extent. Similarly, for the x1 host-bridge
interleave windows, allow for a x4-endpoint-interleave configuration per
memory-type.
Bump up the number of decoders per-port to support hosting 8 regions.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165603886721.551046.8682583835505795210.stgit@dwillia2-xfh
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
A recent QEMU upgrade resulted in collisions between QEMU's chosen
location for PCI MMIO and cxl_test's fake address location for emulated
CXL purposes. This was great for testing resource collisions, but not so
great for continuing to test the nominal cases. Move cxl_test to the
top-of-memory where it is less likely to collide with other resources.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165603886021.551046.12395967874222763381.stgit@dwillia2-xfh
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
To date the per-device-partition DPA range information has only been
used for enumeration purposes. In preparation for allocating regions
from available DPA capacity, convert those ranges into DPA-type resource
trees.
With resources and the new add_dpa_res() helper some open coded end
address calculations and debug prints can be cleaned.
The 'cxlds->pmem_res' and 'cxlds->ram_res' resources are child resources
of the total-device DPA space and they in turn will host DPA allocations
from cxl_endpoint_decoder instances (tracked by cxled->dpa_res).
Cc: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165603878921.551046.8127845916514734142.stgit@dwillia2-xfh
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
In preparation for growing a ->dpa_range attribute for endpoint
decoders, rename the current ->decoder_range to the more descriptive
->hpa_range.
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Adam Manzanares <a.manzanares@samsung.com>
Link: https://lore.kernel.org/r/165603872867.551046.2170426227407458814.stgit@dwillia2-xfh
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This failing signature:
[ 8.392669] cxl_bus_probe: cxl_port endpoint2: probe: 970997760
[ 8.392670] cxl_port: probe of endpoint2 failed with error 970997760
[ 8.392719] create_endpoint: cxl_mem mem0: add: endpoint2
[ 8.392721] cxl_mem mem0: endpoint2 failed probe
[ 8.392725] cxl_bus_probe: cxl_mem mem0: probe: -6
...shows cxl_hdm_decode_init() resulting in a return code ("970997760")
that looks like stack corruption. The problem goes away if
cxl_hdm_decode_init() is not mocked via __wrap_cxl_hdm_decode_init().
The corruption results from the mismatch that the calling convention for
cxl_hdm_decode_init() is:
int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm)
...and __wrap_cxl_hdm_decode_init() is:
bool __wrap_cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm)
...i.e. an int is expected but __wrap_hdm_decode_init() returns bool.
Fix the convention and cleanup the organization to match
__wrap_cxl_await_media_ready() as the difference was a red herring that
distracted from finding the bug.
Fixes: 92804edb11f0 ("cxl/pci: Drop @info argument to cxl_hdm_decode_init()")
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Adam Manzanares <a.manzanares@samsung.com>
Link: https://lore.kernel.org/r/165603870776.551046.8709990108936497723.stgit@dwillia2-xfh
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock updates from Mike Rapoport:
- An optimization in memblock_add_range() to reduce array traversals
- Improvements to the memblock test suite
* tag 'memblock-v5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
memblock test: Modify the obsolete description in README
memblock tests: fix compilation errors
memblock tests: change build options to run-time options
memblock tests: remove completed TODO items
memblock tests: set memblock_debug to enable memblock_dbg() messages
memblock tests: add verbose output to memblock tests
memblock tests: Makefile: add arguments to control verbosity
memblock: avoid some repeat when add new range
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The VERBOSE option in Makefile has been moved, but there still have the
description left in README. For now, we use `-v` options when running
memblock test to print information, so using the new to replace the
obsolete items.
Signed-off-by: Shaoqin Huang <shaoqin.huang@intel.com>
Acked-by: Rebecca Mckeever <remckee0@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/20220729161125.190845-1-shaoqin.huang@intel.com
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Do 'make -C tools/testing/memblock', get the following errors:
memblock.o: In function `memblock_find_in_range.constprop.9':
memblock.c:(.text+0x4651): undefined reference to `pr_warn_ratelimited'
memblock.o: In function `memblock_mark_mirror':
memblock.c:(.text+0x7171): undefined reference to `mirrored_kernelcore'
Fixes: 902c2d91582c ("memblock: Disable mirror feature if kernelcore is not specified")
Fixes: 14d9a675fd0d ("mm: Ratelimited mirrored memory related warning messages")
Signed-off-by: Liu Xinpeng <liuxp11@chinatelecom.cn>
Tested-by: Ma Wupeng <mawupeng1@huawei.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/1658916453-26312-1-git-send-email-liuxp11@chinatelecom.cn
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Change verbose and movable node build options to run-time options.
Movable node usage:
$ ./main -m
Or:
$ ./main --movable-node
Verbose usage:
$ ./main -v
Or:
$ ./main --verbose
Signed-off-by: Rebecca Mckeever <remckee0@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/20220714031717.12258-1-remckee0@gmail.com
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Remove completed items from TODO list.
Signed-off-by: Rebecca Mckeever <remckee0@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/6a3e74fcb51a07e8d9fbbcbe84bdb8aa8b00e843.1656907314.git.remckee0@gmail.com
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
If Memblock simulator was compiled with MEMBLOCK_DEBUG=1, set
memblock_debug to 1 so that memblock_dbg() will print debug information
when memblock functions are tested in Memblock simulator.
Signed-off-by: Rebecca Mckeever <remckee0@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/aee4200cce1c09992ed055006a81fde1b6b5b567.1656907314.git.remckee0@gmail.com
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Add and use functions and macros for printing verbose testing output.
If the Memblock simulator was compiled with VERBOSE=1:
- prefix_push(): appends the given string to a prefix string that will be
printed in test_fail() and test_pass*().
- prefix_pop(): removes the last prefix from the prefix string.
- prefix_reset(): clears the prefix string.
- test_fail(): prints a message after a test fails containing the test
number of the failing test and the prefix.
- test_pass(): prints a message after a test passes containing its test
number and the prefix.
- test_print(): prints the given formatted output string.
- test_pass_pop(): runs test_pass() followed by prefix_pop().
- PREFIX_PUSH(): runs prefix_push(__func__).
If the Memblock simulator was not compiled with VERBOSE=1, these
functions/macros do nothing.
Add the assert wrapper macros ASSERT_EQ(), ASSERT_NE(), and ASSERT_LT().
If the assert condition fails, these macros call test_fail() before
executing assert().
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Shaoqin Huang <shaoqin.huang@intel.com>
Signed-off-by: Rebecca Mckeever <remckee0@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/f234d443fe154d5ae8d8aa07284aff69edfb6f61.1656907314.git.remckee0@gmail.com
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Add VERBOSE and MEMBLOCK_DEBUG user-provided arguments. VERBOSE will
enable verbose output from Memblock simulator. MEMBLOCK_DEBUG will enable
memblock_dbg() messages.
Update the help message to include VERBOSE and MEMBLOCK_DEBUG. Update
the README to include VERBOSE. The README does not include all available
options and refers to the help message for the remaining options.
Therefore, omit MEMBLOCK_DEBUG from README.
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Rebecca Mckeever <remckee0@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/5503f3efe82ecef5c99961a1d53003c8ad06cf27.1656907314.git.remckee0@gmail.com
|
|\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 eIBRS fixes from Borislav Petkov:
"More from the CPU vulnerability nightmares front:
Intel eIBRS machines do not sufficiently mitigate against RET
mispredictions when doing a VM Exit therefore an additional RSB,
one-entry stuffing is needed"
* tag 'x86_bugs_pbrsb' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation: Add LFENCE to RSB fill sequence
x86/speculation: Add RSB VM Exit protections
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
tl;dr: The Enhanced IBRS mitigation for Spectre v2 does not work as
documented for RET instructions after VM exits. Mitigate it with a new
one-entry RSB stuffing mechanism and a new LFENCE.
== Background ==
Indirect Branch Restricted Speculation (IBRS) was designed to help
mitigate Branch Target Injection and Speculative Store Bypass, i.e.
Spectre, attacks. IBRS prevents software run in less privileged modes
from affecting branch prediction in more privileged modes. IBRS requires
the MSR to be written on every privilege level change.
To overcome some of the performance issues of IBRS, Enhanced IBRS was
introduced. eIBRS is an "always on" IBRS, in other words, just turn
it on once instead of writing the MSR on every privilege level change.
When eIBRS is enabled, more privileged modes should be protected from
less privileged modes, including protecting VMMs from guests.
== Problem ==
Here's a simplification of how guests are run on Linux' KVM:
void run_kvm_guest(void)
{
// Prepare to run guest
VMRESUME();
// Clean up after guest runs
}
The execution flow for that would look something like this to the
processor:
1. Host-side: call run_kvm_guest()
2. Host-side: VMRESUME
3. Guest runs, does "CALL guest_function"
4. VM exit, host runs again
5. Host might make some "cleanup" function calls
6. Host-side: RET from run_kvm_guest()
Now, when back on the host, there are a couple of possible scenarios of
post-guest activity the host needs to do before executing host code:
* on pre-eIBRS hardware (legacy IBRS, or nothing at all), the RSB is not
touched and Linux has to do a 32-entry stuffing.
* on eIBRS hardware, VM exit with IBRS enabled, or restoring the host
IBRS=1 shortly after VM exit, has a documented side effect of flushing
the RSB except in this PBRSB situation where the software needs to stuff
the last RSB entry "by hand".
IOW, with eIBRS supported, host RET instructions should no longer be
influenced by guest behavior after the host retires a single CALL
instruction.
However, if the RET instructions are "unbalanced" with CALLs after a VM
exit as is the RET in #6, it might speculatively use the address for the
instruction after the CALL in #3 as an RSB prediction. This is a problem
since the (untrusted) guest controls this address.
Balanced CALL/RET instruction pairs such as in step #5 are not affected.
== Solution ==
The PBRSB issue affects a wide variety of Intel processors which
support eIBRS. But not all of them need mitigation. Today,
X86_FEATURE_RSB_VMEXIT triggers an RSB filling sequence that mitigates
PBRSB. Systems setting RSB_VMEXIT need no further mitigation - i.e.,
eIBRS systems which enable legacy IBRS explicitly.
However, such systems (X86_FEATURE_IBRS_ENHANCED) do not set RSB_VMEXIT
and most of them need a new mitigation.
Therefore, introduce a new feature flag X86_FEATURE_RSB_VMEXIT_LITE
which triggers a lighter-weight PBRSB mitigation versus RSB_VMEXIT.
The lighter-weight mitigation performs a CALL instruction which is
immediately followed by a speculative execution barrier (INT3). This
steers speculative execution to the barrier -- just like a retpoline
-- which ensures that speculation can never reach an unbalanced RET.
Then, ensure this CALL is retired before continuing execution with an
LFENCE.
In other words, the window of exposure is opened at VM exit where RET
behavior is troublesome. While the window is open, force RSB predictions
sampling for RET targets to a dead end at the INT3. Close the window
with the LFENCE.
There is a subset of eIBRS systems which are not vulnerable to PBRSB.
Add these systems to the cpu_vuln_whitelist[] as NO_EIBRS_PBRSB.
Future systems that aren't vulnerable will set ARCH_CAP_PBRSB_NO.
[ bp: Massage, incorporate review comments from Andy Cooper. ]
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Co-developed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|\ \ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more thermal control updates from Rafael Wysocki:
"These fix an error code path issue leading to a NULL pointer
dereference, drop Kconfig dependencies that are not needed any more
after recent changes, add CPU IDs for new chips to a driver and fix up
the tmon utility.
Specifics:
- Fix NULL pointer dereference in the thermal sysfs interface that
results from an error code path mishandling (Rafael Wysocki).
- Drop COMPILE_TEST dependency that's not needed any more from two
thermal Kconfig entries (Jean Delvare).
- Make the Intel TCC cooling driver support Alder Lake-N and Raptor
Lake-P (Sumeet Pawnikar).
- Fix possible path truncations in the tmon utility (Florian
Fainelli)"
* tag 'thermal-5.20-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
tools/thermal: Fix possible path truncations
thermal: Drop obsolete dependency on COMPILE_TEST
thermal: sysfs: Fix cooling_device_stats_setup() error code path
thermal: intel: Add TCC cooling support for Alder Lake-N and Raptor Lake-P
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
A build with -D_FORTIFY_SOURCE=2 enabled will produce the following warnings:
sysfs.c:63:30: warning: '%s' directive output may be truncated writing up to 255 bytes into a region of size between 0 and 255 [-Wformat-truncation=]
snprintf(filepath, 256, "%s/%s", path, filename);
^~
Bump up the buffer to PATH_MAX which is the limit and account for all of
the possible NUL and separators that could lead to exceeding the
allocated buffer sizes.
Fixes: 94f69966faf8 ("tools/thermal: Introduce tmon, a tool for thermal subsystem")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|\ \ \ \ \ \ \
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Pull bitmap updates from Yury Norov:
- fix the duplicated comments on bitmap_to_arr64() (Qu Wenruo)
- optimize out non-atomic bitops on compile-time constants (Alexander
Lobakin)
- cleanup bitmap-related headers (Yury Norov)
- x86/olpc: fix 'logical not is only applied to the left hand side'
(Alexander Lobakin)
- lib/nodemask: inline wrappers around bitmap (Yury Norov)
* tag 'bitmap-6.0-rc1' of https://github.com/norov/linux: (26 commits)
lib/nodemask: inline next_node_in() and node_random()
powerpc: drop dependency on <asm/machdep.h> in archrandom.h
x86/olpc: fix 'logical not is only applied to the left hand side'
lib/cpumask: move some one-line wrappers to header file
headers/deps: mm: align MANITAINERS and Docs with new gfp.h structure
headers/deps: mm: Split <linux/gfp_types.h> out of <linux/gfp.h>
headers/deps: mm: Optimize <linux/gfp.h> header dependencies
lib/cpumask: move trivial wrappers around find_bit to the header
lib/cpumask: change return types to unsigned where appropriate
cpumask: change return types to bool where appropriate
lib/bitmap: change type of bitmap_weight to unsigned long
lib/bitmap: change return types to bool where appropriate
arm: align find_bit declarations with generic kernel
iommu/vt-d: avoid invalid memory access via node_online(NUMA_NO_NODE)
lib/test_bitmap: test the tail after bitmap_to_arr64()
lib/bitmap: fix off-by-one in bitmap_to_arr64()
lib: test_bitmap: add compile-time optimization/evaluations assertions
bitmap: don't assume compiler evaluates small mem*() builtins calls
net/ice: fix initializing the bitmap in the switch code
bitops: let optimize out non-atomic bitops on compile-time constants
...
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
bitmap_weight() doesn't return negative values, so change it's type
to unsigned long. It may help compiler to generate better code and
catch bugs.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Some bitmap functions return boolean results in int variables. Fix it
by changing return types to bool.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
In preparation for altering the non-atomic bitops with a macro, wrap
them in a transparent definition. This requires prepending one more
'_' to their names in order to be able to do that seamlessly. It is
a simple change, given that all the non-prefixed definitions are now
in asm-generic.
sparc32 already has several triple-underscored functions, so I had
to rename them ('___' -> 'sp32_').
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
|
| | |_|_|/ / /
| |/| | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Currently, there is a mess with the prototypes of the non-atomic
bitops across the different architectures:
ret bool, int, unsigned long
nr int, long, unsigned int, unsigned long
addr volatile unsigned long *, volatile void *
Thankfully, it doesn't provoke any bugs, but can sometimes make
the compiler angry when it's not handy at all.
Adjust all the prototypes to the following standard:
ret bool retval can be only 0 or 1
nr unsigned long native; signed makes no sense
addr volatile unsigned long * bitmaps are arrays of ulongs
Next, some architectures don't define 'arch_' versions as they don't
support instrumentation, others do. To make sure there is always the
same set of callables present and to ease any potential future
changes, make them all follow the rule:
* architecture-specific files define only 'arch_' versions;
* non-prefixed versions can be defined only in asm-generic files;
and place the non-prefixed definitions into a new file in
asm-generic to be included by non-instrumented architectures.
Finally, add some static assertions in order to prevent people from
making a mess in this room again.
I also used the %__always_inline attribute consistently, so that
they always get resolved to the actual operations.
Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
|
|\ \ \ \ \ \ \
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc updates from Andrew Morton:
"Updates to various subsystems which I help look after. lib, ocfs2,
fatfs, autofs, squashfs, procfs, etc. A relatively small amount of
material this time"
* tag 'mm-nonmm-stable-2022-08-06-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (72 commits)
scripts/gdb: ensure the absolute path is generated on initial source
MAINTAINERS: kunit: add David Gow as a maintainer of KUnit
mailmap: add linux.dev alias for Brendan Higgins
mailmap: update Kirill's email
profile: setup_profiling_timer() is moslty not implemented
ocfs2: fix a typo in a comment
ocfs2: use the bitmap API to simplify code
ocfs2: remove some useless functions
lib/mpi: fix typo 'the the' in comment
proc: add some (hopefully) insightful comments
bdi: remove enum wb_congested_state
kernel/hung_task: fix address space of proc_dohung_task_timeout_secs
lib/lzo/lzo1x_compress.c: replace ternary operator with min() and min_t()
squashfs: support reading fragments in readahead call
squashfs: implement readahead
squashfs: always build "file direct" version of page actor
Revert "squashfs: provide backing_dev_info in order to disable read-ahead"
fs/ocfs2: Fix spelling typo in comment
ia64: old_rr4 added under CONFIG_HUGETLB_PAGE
proc: fix test for "vsyscall=xonly" boot option
...
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Booting with vsyscall=xonly results in the following vsyscall VMA:
ffffffffff600000-ffffffffff601000 --xp ... [vsyscall]
Test does read from fixed vsyscall address to determine if kernel
supports vsyscall page but it doesn't work because, well, vsyscall
page is execute only.
Fix test by trying to execute from the first byte of the page which
contains gettimeofday() stub. This should work because vsyscall
entry points have stable addresses by design.
Alexey, avoiding parsing .config, /proc/config.gz and
/proc/cmdline at all costs.
Link: https://lkml.kernel.org/r/Ys2KgeiEMboU8Ytu@localhost.localdomain
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: <dylanbhatch@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
| |\ \ \ \ \ \ \
| | | |_|_|/ / /
| | |/| | | | | |
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Add a test for the renameat2 RENAME_EXCHANGE support in vfat, but split it
in a tool that just does the rename exchange and a script that is run by
the kselftests framework on `make TARGETS="filesystems/fat" kselftest`.
That way the script can be easily extended to test other file operations.
The script creates a 1 MiB disk image, that is then formated with a vfat
filesystem and mounted using a loop device. That way all file operations
are done on an ephemeral filesystem.
Link: https://lkml.kernel.org/r/20220610075721.1182745-5-javierm@redhat.com
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Acked-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Alexander Larsson <alexl@redhat.com>
Cc: Christian Kellner <ckellner@redhat.com>
Cc: Chung-Chiang Cheng <cccheng@synology.com>
Cc: Colin Walters <walters@verbum.org>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Peter Jones <pjones@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|