summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | | | | | KVM: x86: Remove defunct pre_block/post_block kvm_x86_ops hooksSean Christopherson2022-01-194-32/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Drop kvm_x86_ops' pre/post_block() now that all implementations are nops. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20211208015236.1616697-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | KVM: x86: Unexport LAPIC's switch_to_{hv,sw}_timer() helpersSean Christopherson2022-01-191-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unexport switch_to_{hv,sw}_timer() now that common x86 handles the transitions. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20211208015236.1616697-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | KVM: VMX: Move preemption timer <=> hrtimer dance to common x86Sean Christopherson2022-01-192-5/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Handle the switch to/from the hypervisor/software timer when a vCPU is blocking in common x86 instead of in VMX. Even though VMX is the only user of a hypervisor timer, the logic and all functions involved are generic x86 (unless future CPUs do something completely different and implement a hypervisor timer that runs regardless of mode). Handling the switch in common x86 will allow for the elimination of the pre/post_blocks hooks, and also lets KVM switch back to the hypervisor timer if and only if it was in use (without additional params). Add a comment explaining why the switch cannot be deferred to kvm_sched_out() or kvm_vcpu_block(). Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20211208015236.1616697-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | KVM: Move x86 VMX's posted interrupt list_head to vcpu_vmxSean Christopherson2022-01-195-23/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the seemingly generic block_vcpu_list from kvm_vcpu to vcpu_vmx, and rename the list and all associated variables to clarify that it tracks the set of vCPU that need to be poked on a posted interrupt to the wakeup vector. The list is not used to track _all_ vCPUs that are blocking, and the term "blocked" can be misleading as it may refer to a blocking condition in the host or the guest, where as the PI wakeup case is specifically for the vCPUs that are actively blocking from within the guest. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20211208015236.1616697-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | KVM: Drop unused kvm_vcpu.pre_pcpu fieldSean Christopherson2022-01-192-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove kvm_vcpu.pre_pcpu as it no longer has any users. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20211208015236.1616697-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | KVM: VMX: Handle PI descriptor updates during vcpu_put/loadSean Christopherson2022-01-193-93/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the posted interrupt pre/post_block logic into vcpu_put/load respectively, using the kvm_vcpu_is_blocking() to determining whether or not the wakeup handler needs to be set (and unset). This avoids updating the PI descriptor if halt-polling is successful, reduces the number of touchpoints for updating the descriptor, and eliminates the confusing behavior of intentionally leaving a "stale" PI.NDST when a blocking vCPU is scheduled back in after preemption. The downside is that KVM will do the PID update twice if the vCPU is preempted after prepare_to_rcuwait() but before schedule(), but that's a rare case (and non-existent on !PREEMPT kernels). The notable wart is the need to send a self-IPI on the wakeup vector if an outstanding notification is pending after configuring the wakeup vector. Ideally, KVM would just do a kvm_vcpu_wake_up() in this case, but the scheduler doesn't support waking a task from its preemption notifier callback, i.e. while the task is right in the middle of being scheduled out. Note, setting the wakeup vector before halt-polling is not necessary: once the pending IRQ will be recorded in the PIR, kvm_vcpu_has_events() will detect this (via kvm_cpu_get_interrupt(), kvm_apic_get_interrupt(), apic_has_interrupt_for_ppr() and finally vmx_sync_pir_to_irr()) and terminate the polling. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20211208015236.1616697-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | Merge branch 'kvm-pi-raw-spinlock' into HEADPaolo Bonzini2022-01-19879-3791/+8365
| |\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Bring in fix for VT-d posted interrupts before further changing the code in 5.17. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| | * | | | | | | KVM: VMX: switch blocked_vcpu_on_cpu_lock to raw spinlockMarcelo Tosatti2022-01-181-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | blocked_vcpu_on_cpu_lock is taken from hard interrupt context (pi_wakeup_handler), therefore it cannot sleep. Switch it to a raw spinlock. Fixes: [41297.066254] BUG: scheduling while atomic: CPU 0/KVM/635218/0x00010001 [41297.066323] Preemption disabled at: [41297.066324] [<ffffffff902ee47f>] irq_enter_rcu+0xf/0x60 [41297.066339] Call Trace: [41297.066342] <IRQ> [41297.066346] dump_stack_lvl+0x34/0x44 [41297.066353] ? irq_enter_rcu+0xf/0x60 [41297.066356] __schedule_bug.cold+0x7d/0x8b [41297.066361] __schedule+0x439/0x5b0 [41297.066365] ? task_blocks_on_rt_mutex.constprop.0.isra.0+0x1b0/0x440 [41297.066369] schedule_rtlock+0x1e/0x40 [41297.066371] rtlock_slowlock_locked+0xf1/0x260 [41297.066374] rt_spin_lock+0x3b/0x60 [41297.066378] pi_wakeup_handler+0x31/0x90 [kvm_intel] [41297.066388] sysvec_kvm_posted_intr_wakeup_ipi+0x9d/0xd0 [41297.066392] </IRQ> [41297.066392] asm_sysvec_kvm_posted_intr_wakeup_ipi+0x12/0x20 ... Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | | KVM: avoid warning on s390 in mark_page_dirtyChristian Borntraeger2022-01-191-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid warnings on s390 like [ 1801.980931] CPU: 12 PID: 117600 Comm: kworker/12:0 Tainted: G E 5.17.0-20220113.rc0.git0.32ce2abb03cf.300.fc35.s390x+next #1 [ 1801.980938] Workqueue: events irqfd_inject [kvm] [...] [ 1801.981057] Call Trace: [ 1801.981060] [<000003ff805f0f5c>] mark_page_dirty_in_slot+0xa4/0xb0 [kvm] [ 1801.981083] [<000003ff8060e9fe>] adapter_indicators_set+0xde/0x268 [kvm] [ 1801.981104] [<000003ff80613c24>] set_adapter_int+0x64/0xd8 [kvm] [ 1801.981124] [<000003ff805fb9aa>] kvm_set_irq+0xc2/0x130 [kvm] [ 1801.981144] [<000003ff805f8d86>] irqfd_inject+0x76/0xa0 [kvm] [ 1801.981164] [<0000000175e56906>] process_one_work+0x1fe/0x470 [ 1801.981173] [<0000000175e570a4>] worker_thread+0x64/0x498 [ 1801.981176] [<0000000175e5ef2c>] kthread+0x10c/0x110 [ 1801.981180] [<0000000175de73c8>] __ret_from_fork+0x40/0x58 [ 1801.981185] [<000000017698440a>] ret_from_fork+0xa/0x40 when writing to a guest from an irqfd worker as long as we do not have the dirty ring. Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reluctantly-acked-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20220113122924.740496-1-borntraeger@linux.ibm.com> Fixes: 2efd61a608b0 ("KVM: Warn if mark_page_dirty() is called without an active vCPU") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | | KVM: selftests: Add a test to force emulation with a pending exceptionSean Christopherson2022-01-193-0/+141
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a VMX specific test to verify that KVM doesn't explode if userspace attempts KVM_RUN when emulation is required with a pending exception. KVM VMX's emulation support for !unrestricted_guest punts exceptions to userspace instead of attempting to synthesize the exception with all the correct state (and stack switching, etc...). Punting is acceptable as there's never been a request to support injecting exceptions when emulating due to invalid state, but KVM has historically assumed that userspace will do the right thing and either clear the exception or kill the guest. Deliberately do the opposite and attempt to re-enter the guest with a pending exception and emulation required to verify KVM continues to punt the combination to userspace, e.g. doesn't explode, WARN, etc... Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211228232437.1875318-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | | KVM: VMX: Reject KVM_RUN if emulation is required with pending exceptionSean Christopherson2022-01-195-5/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reject KVM_RUN if emulation is required (because VMX is running without unrestricted guest) and an exception is pending, as KVM doesn't support emulating exceptions except when emulating real mode via vm86. The vCPU is hosed either way, but letting KVM_RUN proceed triggers a WARN due to the impossible condition. Alternatively, the WARN could be removed, but then userspace and/or KVM bugs would result in the vCPU silently running in a bad state, which isn't very friendly to users. Originally, the bug was hit by syzkaller with a nested guest as that doesn't require kvm_intel.unrestricted_guest=0. That particular flavor is likely fixed by commit cd0e615c49e5 ("KVM: nVMX: Synthesize TRIPLE_FAULT for L2 if emulation is required"), but it's trivial to trigger the WARN with a non-nested guest, and userspace can likely force bad state via ioctls() for a nested guest as well. Checking for the impossible condition needs to be deferred until KVM_RUN because KVM can't force specific ordering between ioctls. E.g. clearing exception.pending in KVM_SET_SREGS doesn't prevent userspace from setting it in KVM_SET_VCPU_EVENTS, and disallowing KVM_SET_VCPU_EVENTS with emulation_required would prevent userspace from queuing an exception and then stuffing sregs. Note, if KVM were to try and detect/prevent the condition prior to KVM_RUN, handle_invalid_guest_state() and/or handle_emulation_failure() would need to be modified to clear the pending exception prior to exiting to userspace. ------------[ cut here ]------------ WARNING: CPU: 6 PID: 137812 at arch/x86/kvm/vmx/vmx.c:1623 vmx_queue_exception+0x14f/0x160 [kvm_intel] CPU: 6 PID: 137812 Comm: vmx_invalid_nes Not tainted 5.15.2-7cc36c3e14ae-pop #279 Hardware name: ASUS Q87M-E/Q87M-E, BIOS 1102 03/03/2014 RIP: 0010:vmx_queue_exception+0x14f/0x160 [kvm_intel] Code: <0f> 0b e9 fd fe ff ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 RSP: 0018:ffffa45c83577d38 EFLAGS: 00010202 RAX: 0000000000000003 RBX: 0000000080000006 RCX: 0000000000000006 RDX: 0000000000000000 RSI: 0000000000010002 RDI: ffff9916af734000 RBP: ffff9916af734000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000006 R13: 0000000000000000 R14: ffff9916af734038 R15: 0000000000000000 FS: 00007f1e1a47c740(0000) GS:ffff99188fb80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1e1a6a8008 CR3: 000000026f83b005 CR4: 00000000001726e0 Call Trace: kvm_arch_vcpu_ioctl_run+0x13a2/0x1f20 [kvm] kvm_vcpu_ioctl+0x279/0x690 [kvm] __x64_sys_ioctl+0x83/0xb0 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae Reported-by: syzbot+82112403ace4cbd780d8@syzkaller.appspotmail.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211228232437.1875318-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | | selftests: kvm/x86: Add test for KVM_SET_PMU_EVENT_FILTERJim Mattson2022-01-193-0/+440
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Verify that the PMU event filter works as expected. Note that the virtual PMU doesn't work as expected on AMD Zen CPUs (an intercepted rdmsr is counted as a retired branch instruction), but the PMU event filter does work. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220115052431.447232-7-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | | selftests: kvm/x86: Introduce x86_model()Jim Mattson2022-01-191-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extract the x86 model number from CPUID.01H:EAX. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220115052431.447232-6-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | | selftests: kvm/x86: Export x86_family() for use outside of processor.cJim Mattson2022-01-192-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move this static inline function to processor.h, so that it can be used in individual tests, as needed. Opportunistically replace the bare 'unsigned' with 'unsigned int.' Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220115052431.447232-5-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | | selftests: kvm/x86: Introduce is_amd_cpu()Jim Mattson2022-01-192-9/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace the one ad hoc "AuthenticAMD" CPUID vendor string comparison with a new function, is_amd_cpu(). Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220115052431.447232-4-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | | selftests: kvm/x86: Parameterize the CPUID vendor string checkJim Mattson2022-01-191-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Refactor is_intel_cpu() to make it easier to reuse the bulk of the code for other vendors in the future. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220115052431.447232-3-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | | KVM: x86/pmu: Use binary search to check filtered eventsJim Mattson2022-01-191-11/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The PMU event filter may contain up to 300 events. Replace the linear search in reprogram_gp_counter() with a binary search. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220115052431.447232-2-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | | kvm: selftests: conditionally build vm_xsave_req_perm()Wei Wang2022-01-191-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vm_xsave_req_perm() is currently defined and used by x86_64 only. Make it compiled into vm_create_with_vcpus() only when on x86_64 machines. Otherwise, it would cause linkage errors, e.g. on s390x. Fixes: 415a3c33e8 ("kvm: selftests: Add support for KVM_CAP_XSAVE2") Reported-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Signed-off-by: Wei Wang <wei.w.wang@intel.com> Tested-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Message-Id: <20220118014817.30910-1-wei.w.wang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | | KVM: x86/cpuid: Clear XFD for component i if the base feature is missingLike Xu2022-01-191-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to Intel extended feature disable (XFD) spec, the sub-function i (i > 1) of CPUID function 0DH enumerates "details for state component i. ECX[2] enumerates support for XFD support for this state component." If KVM does not report F(XFD) feature (e.g. due to CONFIG_X86_64), then the corresponding XFD support for any state component i should also be removed. Translate this dependency into KVM terms. Fixes: 690a757d610e ("kvm: x86: Add CPUID support for Intel AMX") Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20220117074531.76925-1-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | | KVM: x86/mmu: Improve TLB flush comment in kvm_mmu_slot_remove_write_access()David Matlack2022-01-191-9/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rewrite the comment in kvm_mmu_slot_remove_write_access() that explains why it is safe to flush TLBs outside of the MMU lock after write-protecting SPTEs for dirty logging. The current comment is a long run-on sentence that was difficult to understand. In addition it was specific to the shadow MMU (mentioning mmu_spte_update()) when the TDP MMU has to handle this as well. The new comment explains: - Why the TLB flush is necessary at all. - Why it is desirable to do the TLB flush outside of the MMU lock. - Why it is safe to do the TLB flush outside of the MMU lock. No functional change intended. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220113233020.3986005-5-dmatlack@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | | KVM: x86/mmu: Document and enforce MMU-writable and Host-writable invariantsDavid Matlack2022-01-191-6/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SPTEs are tagged with software-only bits to indicate if it is "MMU-writable" and "Host-writable". These bits are used to determine why KVM has marked an SPTE as read-only. Document these bits and their invariants, and enforce the invariants with new WARNs in spte_can_locklessly_be_made_writable() to ensure they are not accidentally violated in the future. Opportunistically move DEFAULT_SPTE_{MMU,HOST}_WRITABLE next to EPT_SPTE_{MMU,HOST}_WRITABLE since the new documentation applies to both. No functional change intended. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220113233020.3986005-4-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | | KVM: x86/mmu: Clear MMU-writable during changed_pte notifierDavid Matlack2022-01-191-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When handling the changed_pte notifier and the new PTE is read-only, clear both the Host-writable and MMU-writable bits in the SPTE. This preserves the invariant that MMU-writable is set if-and-only-if Host-writable is set. No functional change intended. Nothing currently relies on the aforementioned invariant and technically the changed_pte notifier is dead code. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220113233020.3986005-3-dmatlack@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | | KVM: x86/mmu: Fix write-protection of PTs mapped by the TDP MMUDavid Matlack2022-01-191-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the TDP MMU is write-protection GFNs for page table protection (as opposed to for dirty logging, or due to the HVA not being writable), it checks if the SPTE is already write-protected and if so skips modifying the SPTE and the TLB flush. This behavior is incorrect because it fails to check if the SPTE is write-protected for page table protection, i.e. fails to check that MMU-writable is '0'. If the SPTE was write-protected for dirty logging but not page table protection, the SPTE could locklessly be made writable, and vCPUs could still be running with writable mappings cached in their TLB. Fix this by only skipping setting the SPTE if the SPTE is already write-protected *and* MMU-writable is already clear. Technically, checking only MMU-writable would suffice; a SPTE cannot be writable without MMU-writable being set. But check both to be paranoid and because it arguably yields more readable code. Fixes: 46044f72c382 ("kvm: x86/mmu: Support write protection for nesting in tdp MMU") Cc: stable@vger.kernel.org Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220113233020.3986005-2-dmatlack@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | | KVM: x86: Making the module parameter of vPMU more commonLike Xu2022-01-178-12/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new module parameter to control PMU virtualization should apply to Intel as well as AMD, for situations where userspace is not trusted. If the module parameter allows PMU virtualization, there could be a new KVM_CAP or guest CPUID bits whereby userspace can enable/disable PMU virtualization on a per-VM basis. If the module parameter does not allow PMU virtualization, there should be no userspace override, since we have no precedent for authorizing that kind of override. If it's false, other counter-based profiling features (such as LBR including the associated CPUID bits if any) will not be exposed. Change its name from "pmu" to "enable_pmu" as we have temporary variables with the same name in our code like "struct kvm_pmu *pmu". Fixes: b1d66dad65dc ("KVM: x86/svm: Add module param to control PMU virtualization") Suggested-by : Jim Mattson <jmattson@google.com> Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20220111073823.21885-1-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | | KVM: selftests: Test KVM_SET_CPUID2 after KVM_RUNVitaly Kuznetsov2022-01-173-4/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KVM forbids KVM_SET_CPUID2 after KVM_RUN was performed on a vCPU unless the supplied CPUID data is equal to what was previously set. Test this. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220117150542.2176196-5-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | | KVM: selftests: Rename 'get_cpuid_test' to 'cpuid_test'Vitaly Kuznetsov2022-01-173-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation to reusing the existing 'get_cpuid_test' for testing "KVM_SET_CPUID{,2} after KVM_RUN" rename it to 'cpuid_test' to avoid the confusion. No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220117150542.2176196-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | | KVM: x86: Partially allow KVM_SET_CPUID{,2} after KVM_RUNVitaly Kuznetsov2022-01-172-19/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit feb627e8d6f6 ("KVM: x86: Forbid KVM_SET_CPUID{,2} after KVM_RUN") forbade changing CPUID altogether but unfortunately this is not fully compatible with existing VMMs. In particular, QEMU reuses vCPU fds for CPU hotplug after unplug and it calls KVM_SET_CPUID2. Instead of full ban, check whether the supplied CPUID data is equal to what was previously set. Reported-by: Igor Mammedov <imammedo@redhat.com> Fixes: feb627e8d6f6 ("KVM: x86: Forbid KVM_SET_CPUID{,2} after KVM_RUN") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220117150542.2176196-3-vkuznets@redhat.com> Cc: stable@vger.kernel.org [Do not call kvm_find_cpuid_entry repeatedly. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | | KVM: x86: Do runtime CPUID update before updating vcpu->arch.cpuid_entriesVitaly Kuznetsov2022-01-171-10/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kvm_update_cpuid_runtime() mangles CPUID data coming from userspace VMM after updating 'vcpu->arch.cpuid_entries', this makes it impossible to compare an update with what was previously supplied. Introduce __kvm_update_cpuid_runtime() version which can be used to tweak the input before it goes to 'vcpu->arch.cpuid_entries' so the upcoming update check can compare tweaked data. No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220117150542.2176196-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | | KVM: x86/pmu: Fix available_event_types check for REF_CPU_CYCLES eventLike Xu2022-01-172-6/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to CPUID 0x0A.EBX bit vector, the event [7] should be the unrealized event "Topdown Slots" instead of the *kernel* generalized common hardware event "REF_CPU_CYCLES", so we need to skip the cpuid unavaliblity check in the intel_pmc_perf_hw_id() for the last REF_CPU_CYCLES event and update the confusing comment. If the event is marked as unavailable in the Intel guest CPUID 0AH.EBX leaf, we need to avoid any perf_event creation, whether it's a gp or fixed counter. To distinguish whether it is a rejected event or an event that needs to be programmed with PERF_TYPE_RAW type, a new special returned value of "PERF_COUNT_HW_MAX + 1" is introduced. Fixes: 62079d8a43128 ("KVM: PMU: add proper support for fixed counter 2") Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20220105051509.69437-1-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | | | | | | | Merge tag 'for-5.17/parisc-2' of ↵Linus Torvalds2022-01-224-4/+19
|\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull more parisc architecture updates from Helge Deller: "Fixes and enhancements: - a memory leak fix in an error path in pdc_stable (Miaoqian Lin) - two compiler warning fixes in the TOC code - added autodetection for currently used console type (serial or graphics) which inserts console=<type> if it's missing" * tag 'for-5.17/parisc-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: pdc_stable: Fix memory leak in pdcs_register_pathentries parisc: Fix missing prototype for 'toc_intr' warning in toc.c parisc: Autodetect default output device and set console= kernel parameter parisc: Use safer strscpy() in setup_cmdline() parisc: Add visible flag to toc_stack variable
| * | | | | | | | | parisc: pdc_stable: Fix memory leak in pdcs_register_pathentriesMiaoqian Lin2022-01-211-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kobject_init_and_add() takes reference even when it fails. According to the doc of kobject_init_and_add(): If this function returns an error, kobject_put() must be called to properly clean up the memory associated with the object. Fix memory leak by calling kobject_put(). Fixes: 73f368cf679b ("Kobject: change drivers/parisc/pdc_stable.c to use kobject_init_and_add") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Helge Deller <deller@gmx.de>
| * | | | | | | | | parisc: Fix missing prototype for 'toc_intr' warning in toc.cHelge Deller2022-01-202-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a missing prototype warning noticed by the kernel test robot. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Helge Deller <deller@gmx.de>
| * | | | | | | | | parisc: Autodetect default output device and set console= kernel parameterHelge Deller2022-01-201-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Usually palo (the PA-RISC boot loader) will check at boot time if the machine/firmware was configured to use the serial line (ttyS0, SERIAL_x) or the graphical display (tty0, graph) as default output device and add the correct "console=ttyS0" or "console=tty0" Linux kernel parameter to the kernel command line when starting the Linux kernel. But the kernel could also have been started via the HP-UX boot loader or directly in qemu, in which cases the console parameter is missing. This patch fixes this problem by adding the correct console= parameter if it's missing in the current kernel command line. Signed-off-by: Helge Deller <deller@gmx.de>
| * | | | | | | | | parisc: Use safer strscpy() in setup_cmdline()Helge Deller2022-01-201-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Helge Deller <deller@gmx.de>
| * | | | | | | | | parisc: Add visible flag to toc_stack variableHelge Deller2022-01-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the visible flag to the toc_stack variable to make it visible for assembly code and to avoid a sparse warning. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Helge Deller <deller@gmx.de>
* | | | | | | | | | Merge tag 'riscv-for-linus-5.17-mw1' of ↵Linus Torvalds2022-01-2243-425/+1006
|\ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull more RISC-V updates from Palmer Dabbelt: - Support for sv48 paging - Hart ID mappings are now sparse, which enables more CPUs to come up on systems with sparse hart IDs - A handful of cleanups and fixes * tag 'riscv-for-linus-5.17-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (27 commits) RISC-V: nommu_virt: Drop unused SLAB_MERGE_DEFAULT RISC-V: Remove redundant err variable riscv: dts: sifive unmatched: Add gpio poweroff riscv: canaan: remove useless select of non-existing config SYSCON RISC-V: Do not use cpumask data structure for hartid bitmap RISC-V: Move spinwait booting method to its own config RISC-V: Move the entire hart selection via lottery to SMP RISC-V: Use __cpu_up_stack/task_pointer only for spinwait method RISC-V: Do not print the SBI version during HSM extension boot print RISC-V: Avoid using per cpu array for ordered booting riscv: default to CONFIG_RISCV_SBI_V01=n riscv: fix boolconv.cocci warnings riscv: Explicit comment about user virtual address space size riscv: Use pgtable_l4_enabled to output mmu_type in cpuinfo riscv: Implement sv48 support asm-generic: Prepare for riscv use of pud_alloc_one and pud_free riscv: Allow to dynamically define VA_BITS riscv: Introduce functions to switch pt_ops riscv: Split early kasan mapping to prepare sv48 introduction riscv: Move KASAN mapping next to the kernel mapping ...
| * | | | | | | | | | RISC-V: nommu_virt: Drop unused SLAB_MERGE_DEFAULTPalmer Dabbelt2022-01-201-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Our nommu_virt_defconfig set SLOB=y and SLAB_MERGE_DEFAULT=n. As of eb52c0fc2331 ("mm: Make SLAB_MERGE_DEFAULT depend on SL[AU]B") it's no longer necessary to set the second, which appears to never have had any effect for SLOB=y anyway. This was suggested by savedefconfig. Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
| * | | | | | | | | | RISC-V: Remove redundant err variableMinghao Chi2022-01-201-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Return value from user_regset_copyin() directly instead of taking this in another redundant variable. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn> Signed-off-by: CGEL ZTE <cgel.zte@gmail.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
| * | | | | | | | | | riscv: dts: sifive unmatched: Add gpio poweroffRon Economos2022-01-201-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some of the GPIO pins on the Unmatched are wire up to control the power of the board, indicate that in the device tree. Signed-off-by: Ron Economos <w6rz@comcast.net> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
| * | | | | | | | | | riscv: canaan: remove useless select of non-existing config SYSCONLukas Bulwahn2022-01-201-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The config SYSCON never existed in the kernel repository; so, the select of that config in ./drivers/soc/canaan/Kconfig has no effect. Presumably, this was just some mistake, assuming some symmetry in handling and naming of configs that simply does not exist. Remove this useless select of a non-existing config. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
| * | | | | | | | | | RISC-V: Do not use cpumask data structure for hartid bitmapAtish Patra2022-01-2011-136/+130
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, SBI APIs accept a hartmask that is generated from struct cpumask. Cpumask data structure can hold upto NR_CPUs value. Thus, it is not the correct data structure for hartids as it can be higher than NR_CPUs for platforms with sparse or discontguous hartids. Remove all association between hartid mask and struct cpumask. Reviewed-by: Anup Patel <anup@brainfault.org> (For Linux RISC-V changes) Acked-by: Anup Patel <anup@brainfault.org> (For KVM RISC-V changes) Signed-off-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
| * | | | | | | | | | RISC-V: Move spinwait booting method to its own configAtish Patra2022-01-205-5/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The spinwait booting method should only be used for platforms with older firmware without SBI HSM extension or M-mode firmware because spinwait method can't support cpu hotplug, kexec or sparse hartid. It is better to move the entire spinwait implementation to its own config which can be disabled if required. It is enabled by default to maintain backward compatibility and M-mode Linux. Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
| * | | | | | | | | | RISC-V: Move the entire hart selection via lottery to SMPAtish Patra2022-01-201-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The booting hart selection via lottery is only useful for SMP systems. Moreover, the lottery selection is only necessary for systems using spinwait booting method. It is better to keep the entire lottery selection together so that it can be disabled in future. Move the lottery selection code to under CONFIG_SMP. Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
| * | | | | | | | | | RISC-V: Use __cpu_up_stack/task_pointer only for spinwait methodAtish Patra2022-01-205-23/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The __cpu_up_stack/task_pointer array is only used for spinwait method now. The per cpu array based lookup is also fragile for platforms with discontiguous/sparse hartids. The spinwait method is only used for M-mode Linux or older firmwares without SBI HSM extension. For general Linux systems, ordered booting method is preferred anyways to support cpu hotplug and kexec. Make sure that __cpu_up_stack/task_pointer is only used for spinwait method. Take this opportunity to rename it to __cpu_spinwait_stack/task_pointer to emphasize the purpose as well. Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
| * | | | | | | | | | RISC-V: Do not print the SBI version during HSM extension boot printAtish Patra2022-01-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The HSM extension information log also prints the SBI version v0.2. This is misleading as the underlying firmware SBI version may be different from v0.2. Remove the unncessary printing of SBI version. Signed-off-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
| * | | | | | | | | | RISC-V: Avoid using per cpu array for ordered bootingAtish Patra2022-01-204-15/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently both order booting and spinwait approach uses a per cpu array to update stack & task pointer. This approach will not work for the following cases. 1. If NR_CPUs are configured to be less than highest hart id. 2. A platform has sparse hartid. This issue can be fixed for ordered booting as the booting cpu brings up one cpu at a time using SBI HSM extension which has opaque parameter that is unused until now. Introduce a common secondary boot data structure that can store the stack and task pointer. Secondary harts will use this data while booting up to setup the sp & tp. Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
| * | | | | | | | | | riscv: default to CONFIG_RISCV_SBI_V01=nHeinrich Schuchardt2022-01-201-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The SBI 0.1 specification is obsolete. The current version is 0.3. Hence we should not rely by default on SBI 0.1 being implemented. Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
| * | | | | | | | | | riscv: fix boolconv.cocci warningskernel test robot2022-01-191-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | arch/riscv/mm/init.c:48:11-16: WARNING: conversion to bool not needed here Remove unneeded conversion to bool Semantic patch information: Relational and logical operators evaluate to bool, explicit conversion is overly verbose and unneeded. Generated by: scripts/coccinelle/misc/boolconv.cocci Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: kernel test robot <lkp@intel.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
| * | | | | | | | | | RISC-V: Introduce sv48 support without relocatable kernelPalmer Dabbelt2022-01-1920-200/+731
| |\ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patchset allows to have a single kernel for sv39 and sv48 without being relocatable. The idea comes from Arnd Bergmann who suggested to do the same as x86, that is mapping the kernel to the end of the address space, which allows the kernel to be linked at the same address for both sv39 and sv48 and then does not require to be relocated at runtime. This implements sv48 support at runtime. The kernel will try to boot with 4-level page table and will fallback to 3-level if the HW does not support it. Folding the 4th level into a 3-level page table has almost no cost at runtime. Note that kasan region had to be moved to the end of the address space since its location must be known at compile-time and then be valid for both sv39 and sv48 (and sv57 that is coming). * riscv-sv48-v3: riscv: Explicit comment about user virtual address space size riscv: Use pgtable_l4_enabled to output mmu_type in cpuinfo riscv: Implement sv48 support asm-generic: Prepare for riscv use of pud_alloc_one and pud_free riscv: Allow to dynamically define VA_BITS riscv: Introduce functions to switch pt_ops riscv: Split early kasan mapping to prepare sv48 introduction riscv: Move KASAN mapping next to the kernel mapping riscv: Get rid of MAXPHYSMEM configs Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
| | * | | | | | | | | | riscv: Explicit comment about user virtual address space sizeAlexandre Ghiti2022-01-191-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Define precisely the size of the user accessible virtual space size for sv32/39/48 mmu types and explain why the whole virtual address space is split into 2 equal chunks between kernel and user space. Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>