path: root/arch/x86/include
...
| * x86/hyperv: implement an MSI domain for root partition  (Wei Liu, 2021-02-11, 1 file changed, -0/+2)
    When Linux runs as the root partition on Microsoft Hypervisor, its interrupts are remapped. Linux will need to explicitly map and unmap interrupts for hardware.
    Implement an MSI domain to issue the correct hypercalls. And initialize this irq domain as the default MSI irq domain.
    Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com> Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20210203150435.27941-16-wei.liu@kernel.org
| * asm-generic/hyperv: import data structures for mapping device interrupts  (Wei Liu, 2021-02-11, 1 file changed, -0/+13)
    Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com> Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20210203150435.27941-15-wei.liu@kernel.org
| * asm-generic/hyperv: update hv_msi_entry  (Wei Liu, 2021-02-11, 1 file changed, -2/+2)
    We will soon need to access fields inside the MSI address and MSI data fields. Introduce hv_msi_address_register and hv_msi_data_register. Fix up one user of hv_msi_entry in mshyperv.h.
    No functional change expected.
    Signed-off-by: Wei Liu <wei.liu@kernel.org> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20210203150435.27941-12-wei.liu@kernel.org
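    As a rough illustration of the reworked structure described above (a sketch only; the exact layout and type definitions live in the asm-generic Hyper-V TLFS header and may differ in detail):

        /* Sketch: hv_msi_entry exposing typed address/data registers instead of
         * raw 32-bit values, so callers can reach individual sub-fields. */
        union hv_msi_entry {
                u64 as_uint64;
                struct {
                        union hv_msi_address_register address;
                        union hv_msi_data_register data;
                } __packed;
        };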
| * x86/hyperv: provide a bunch of helper functions  (Wei Liu, 2021-02-11, 1 file changed, -0/+4)
    They are used to deposit pages into Microsoft Hypervisor and bring up logical and virtual processors.
    Signed-off-by: Lillian Grassin-Drake <ligrassi@microsoft.com> Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Co-Developed-by: Lillian Grassin-Drake <ligrassi@microsoft.com> Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com> Co-Developed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20210203150435.27941-10-wei.liu@kernel.org
| * x86/hyperv: extract partition ID from Microsoft Hypervisor if necessary  (Wei Liu, 2021-02-11, 1 file changed, -0/+2)
    We will need the partition ID for executing some hypercalls later.
    Signed-off-by: Lillian Grassin-Drake <ligrassi@microsoft.com> Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20210203150435.27941-7-wei.liu@kernel.org
| * x86/hyperv: allocate output arg pages if required  (Wei Liu, 2021-02-11, 1 file changed, -0/+1)
    When Linux runs as the root partition, it will need to make hypercalls which return data from the hypervisor. Allocate pages for storing results when Linux runs as the root partition.
    Signed-off-by: Lillian Grassin-Drake <ligrassi@microsoft.com> Co-Developed-by: Lillian Grassin-Drake <ligrassi@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20210203150435.27941-6-wei.liu@kernel.org
| * x86/hyperv: detect if Linux is the root partition  (Wei Liu, 2021-02-11, 2 files changed, -0/+12)
    For now we can use the privilege flag to check. Stash the value to be used later. Put in a bunch of defines for future use when we want to have more fine-grained detection.
    Signed-off-by: Wei Liu <wei.liu@kernel.org> Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20210203150435.27941-3-wei.liu@kernel.org
| * x86/hyperv: Load/save the Isolation Configuration leaf  (Andrea Parri (Microsoft), 2021-02-11, 1 file changed, -0/+15)
    If bit 22 of Group B Features is set, the guest has access to the Isolation Configuration CPUID leaf. On x86, the first four bits of EAX in this leaf provide the isolation type of the partition; three isolation types are handled: 'SNP' (hardware-based isolation), 'VBS' (software-based isolation), and 'NONE' (no isolation).
    Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: x86@kernel.org Cc: linux-arch@vger.kernel.org Link: https://lore.kernel.org/r/20210201144814.2701-2-parri.andrea@gmail.com Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
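    An illustrative sketch of what the commit describes (the leaf constant name is an assumption, and the numeric values behind NONE/VBS/SNP are not spelled out here):

        /* Read the partition's isolation type from the new CPUID leaf:
         * bits 3:0 of EAX select NONE, VBS (software) or SNP (hardware). */
        static u32 hv_isolation_type_sketch(void)
        {
                u32 eax = cpuid_eax(HYPERV_CPUID_ISOLATION_CONFIG);  /* assumed leaf name */

                return eax & 0xF;
        }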
* Merge tag 'perf-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds, 2021-02-21, 2 files changed, -13/+22)
    Pull performance event updates from Ingo Molnar:
    - Add CPU-PMU support for Intel Sapphire Rapids CPUs
    - Extend the perf ABI with PERF_SAMPLE_WEIGHT_STRUCT, to offer two-parameter sampling event feedback. Not used yet, but is intended for Golden Cove CPU-PMU, which can provide both the instruction latency and the cache latency information for memory profiling events.
    - Remove experimental, default-disabled perfmon-v4 counter_freezing support that could only be enabled via a boot option. The hardware is hopelessly broken, we'd like to make sure nobody starts relying on this, as it would only end in tears.
    - Fix energy/power events on Intel SPR platforms
    - Simplify the uprobes resume_execution() logic
    - Misc smaller fixes.
    * tag 'perf-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      perf/x86/rapl: Fix psys-energy event on Intel SPR platform
      perf/x86/rapl: Only check lower 32bits for RAPL energy counters
      perf/x86/rapl: Add msr mask support
      perf/x86/kvm: Add Cascade Lake Xeon steppings to isolation_ucodes[]
      perf/x86/intel: Support CPUID 10.ECX to disable fixed counters
      perf/x86/intel: Add perf core PMU support for Sapphire Rapids
      perf/x86/intel: Filter unsupported Topdown metrics event
      perf/x86/intel: Factor out intel_update_topdown_event()
      perf/core: Add PERF_SAMPLE_WEIGHT_STRUCT
      perf/intel: Remove Perfmon-v4 counter_freezing support
      x86/perf: Use static_call for x86_pmu.guest_get_msrs
      perf/x86/intel/uncore: With > 8 nodes, get pci bus die id from NUMA info
      perf/x86/intel/uncore: Store the logical die id instead of the physical die id.
      x86/kprobes: Do not decode opcode in resume_execution()
| * Merge branch 'perf/kprobes' into perf/core, to pick up finished branch  (Ingo Molnar, 2021-02-17, 1 file changed, -4/+7)
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * x86/kprobes: Do not decode opcode in resume_execution()  (Masami Hiramatsu, 2021-01-05, 1 file changed, -4/+7)
      Currently, kprobes decodes the opcode right after single-stepping in resume_execution(). But the opcode was already decoded while preparing arch_specific_insn in arch_copy_kprobe().
      Decode the opcode in arch_copy_kprobe() instead of in resume_execution() and set some flags which classify the opcode for the resuming process.
      [ bp: Massage commit message. ] Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lkml.kernel.org/r/160830072561.349576.3014979564448023213.stgit@devnote2
| * perf/x86/intel: Add perf core PMU support for Sapphire Rapids  (Kan Liang, 2021-02-01, 1 file changed, -2/+6)
    Add perf core PMU support for the Intel Sapphire Rapids server, which is the successor of the Intel Ice Lake server. The enabling code is based on Ice Lake, but there are several new features introduced.
    The event encoding is changed and simplified; e.g., the event codes below 0x90 are restricted to counters 0-3, while the event codes above 0x90 are likely to have no restrictions. The event constraints, extra_regs(), and hardware cache events table are changed accordingly.
    A new Precise Distribution (PDist) facility is introduced, which further minimizes the skid when a precise event is programmed on the GP counter 0. Enable the Precise Distribution (PDist) facility with the :ppp event. For this facility to work, the period must be initialized with a value larger than 127. Add spr_limit_period() to apply the limit for the :ppp event.
    Two new data source fields, data block & address block, are added in the PEBS Memory Info Record for the load latency event. To enable the feature:
    - An auxiliary event has to be enabled together with the load latency event on Sapphire Rapids. A new flag PMU_FL_MEM_LOADS_AUX is introduced to indicate the case. A new event, mem-loads-aux, is exposed to sysfs for the user tool. Add a check in hw_config(). If the auxiliary event is not detected, return a unique error -ENODATA.
    - The union perf_mem_data_src is extended to support the new fields.
    - Ice Lake and earlier models do not support block information, but the fields may be set by HW on some machines. Add pebs_no_block to explicitly indicate the previous platforms which don't support the new block fields. Accesses to the new block fields are ignored on those platforms.
    A new store latency facility is introduced, which leverages the PEBS facility to provide additional information about sampled stores. The additional information includes the data address, memory auxiliary info (e.g. Data Source, STLB miss) and the latency of the store access. To enable the facility, the new event (0x02cd) has to be programmed on the GP counter 0. A new flag PERF_X86_EVENT_PEBS_STLAT is introduced to indicate the event. store_latency_data() is introduced to parse the memory auxiliary info.
    The layout of the access latency field of the PEBS Memory Info Record has been changed. Two latencies, the instruction latency (bits 15:0) and the cache access latency (bits 47:32), are recorded.
    - The cache access latency is similar to the previous memory access latency. For loads, the latency starts with the actual cache access and lasts until the data is returned by the memory subsystem. For stores, the latency starts when the demand write accesses the L1 data cache and lasts until the cacheline write is completed in the memory subsystem. The cache access latency is stored in the low 32 bits of the sample type PERF_SAMPLE_WEIGHT_STRUCT.
    - The instruction latency starts with the dispatch of the load operation for execution and lasts until completion of the instruction it belongs to. Add a new flag PMU_FL_INSTR_LATENCY to indicate instruction latency support. The instruction latency is stored in bits 47:32 of the sample type PERF_SAMPLE_WEIGHT_STRUCT.
    Extend the PERF_METRICS MSR to feature TMA method level 2 metrics. The lower half of the register is the TMA level 1 metrics (legacy). The upper half is also divided into four 8-bit fields for the new level 2 metrics. Expose all eight Topdown metrics events to user space.
    The full description of the SPR features can be found in the Intel Architecture Instruction Set Extensions and Future Features Programming Reference, 319433-041.
    Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1611873611-156687-5-git-send-email-kan.liang@linux.intel.com
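    For reference, a sketch of the two-parameter weight layout this series relies on (field names follow the generic PERF_SAMPLE_WEIGHT_STRUCT ABI as described above; treat the exact uapi definition as an approximation, little-endian view):

        union perf_sample_weight {
                __u64 full;
                struct {
                        __u32 var1_dw;  /* on SPR: cache access latency (low 32 bits) */
                        __u16 var2_w;   /* on SPR: instruction latency (bits 47:32) */
                        __u16 var3_w;   /* unused on SPR */
                };
        };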
| * perf/x86/intel: Filter unsupported Topdown metrics event  (Kan Liang, 2021-02-01, 1 file changed, -2/+8)
    Intel Sapphire Rapids server will introduce 8 metrics events. Intel Ice Lake only supports 4 metrics events. A perf tool user may mistakenly use the unsupported events via RAW format on Ice Lake. The user can still get a value from the unsupported Topdown metrics event once the following Sapphire Rapids enabling patch is applied.
    To enable the 8 metrics events on Intel Sapphire Rapids, the INTEL_TD_METRIC_MAX has to be updated, which impacts the is_metric_event(). The is_metric_event() is a generic function. On Ice Lake, the newly added SPR metrics events will be mistakenly accepted as metric events on creation. At runtime, the unsupported Topdown metrics events will be updated.
    Add a variable num_topdown_events in x86_pmu to indicate the available number of the Topdown metrics event on the platform. Apply the number into is_metric_event(). Only the supported Topdown metrics events should be created as metrics events.
    Apply the num_topdown_events in icl_update_topdown_event() as well. The function can be reused by the following patch.
    Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1611873611-156687-4-git-send-email-kan.liang@linux.intel.com
| * x86/perf: Use static_call for x86_pmu.guest_get_msrs  (Like Xu, 2021-01-27, 1 file changed, -5/+1)
    Clean up that CONFIG_RETPOLINE crud and replace the indirect call x86_pmu.guest_get_msrs with static_call().
    Reported-by: kernel test robot <lkp@intel.com> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Like Xu <like.xu@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210125121458.181635-1-like.xu@linux.intel.com
* Merge tag 'sched-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds, 2021-02-21, 2 files changed, -9/+46)
    Pull scheduler updates from Ingo Molnar:
    "Core scheduler updates:
    - Add CONFIG_PREEMPT_DYNAMIC: this in its current form adds the preempt=none/voluntary/full boot options (default: full), to allow distros to build a PREEMPT kernel but fall back to close to PREEMPT_VOLUNTARY (or PREEMPT_NONE) runtime scheduling behavior via a boot time selection. There's also the /debug/sched_debug switch to do this runtime. This feature is implemented via runtime patching (a new variant of static calls). The scope of the runtime patching can be best reviewed by looking at the sched_dynamic_update() function in kernel/sched/core.c. (Note that the dynamic none/voluntary mode isn't 100% identical, for example preempt-RCU is available in all cases, plus the preempt count is maintained in all models, which has runtime overhead even with the code patching.) The PREEMPT_VOLUNTARY/PREEMPT_NONE models, used by the vast majority of distributions, are supposed to be unaffected.
    - Fix ignored rescheduling after rcu_eqs_enter(). This is a bug that was found via rcutorture triggering a hang. The bug is that rcu_idle_enter() may wake up a NOCB kthread, but this happens after the last generic need_resched() check. Some cpuidle drivers fix it by chance but many others don't. In true 2020 fashion the original bug fix has grown into a 5-patch scheduler/RCU fix series plus another 16 RCU patches to address the underlying issue of missed preemption events. These are the initial fixes that should fix current incarnations of the bug.
    - Clean up rbtree usage in the scheduler, by providing & using the following consistent set of rbtree APIs (see the usage sketch after this entry):
      partial-order; less() based:
      - rb_add(): add a new entry to the rbtree
      - rb_add_cached(): like rb_add(), but for a rb_root_cached
      total-order; cmp() based:
      - rb_find(): find an entry in an rbtree
      - rb_find_add(): find an entry, and add if not found
      - rb_find_first(): find the first (leftmost) matching entry
      - rb_next_match(): continue from rb_find_first()
      - rb_for_each(): iterate a sub-tree using the previous two
    - Improve the SMP/NUMA load-balancer: scan for an idle sibling in a single pass. This is a 4-commit series where each commit improves one aspect of the idle sibling scan logic.
    - Improve the cpufreq cooling driver by getting the effective CPU utilization metrics from the scheduler
    - Improve the fair scheduler's active load-balancing logic by reducing the number of active LB attempts & lengthen the load-balancing interval. This improves stress-ng mmapfork performance.
    - Fix CFS's estimated utilization (util_est) calculation bug that can result in too high utilization values
    Misc updates & fixes:
    - Fix the HRTICK reprogramming & optimization feature
    - Fix SCHED_SOFTIRQ raising race & warning in the CPU offlining code
    - Reduce dl_add_task_root_domain() overhead
    - Fix uprobes refcount bug
    - Process pending softirqs in flush_smp_call_function_from_idle()
    - Clean up task priority related defines, remove *USER_*PRIO and USER_PRIO()
    - Simplify the sched_init_numa() deduplication sort
    - Documentation updates
    - Fix EAS bug in update_misfit_status(), which degraded the quality of energy-balancing
    - Smaller cleanups"
    * tag 'sched-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
      sched,x86: Allow !PREEMPT_DYNAMIC
      entry/kvm: Explicitly flush pending rcuog wakeup before last rescheduling point
      entry: Explicitly flush pending rcuog wakeup before last rescheduling point
      rcu/nocb: Trigger self-IPI on late deferred wake up before user resume
      rcu/nocb: Perform deferred wake up before last idle's need_resched() check
      rcu: Pull deferred rcuog wake up to rcu_eqs_enter() callers
      sched/features: Distinguish between NORMAL and DEADLINE hrtick
      sched/features: Fix hrtick reprogramming
      sched/deadline: Reduce rq lock contention in dl_add_task_root_domain()
      uprobes: (Re)add missing get_uprobe() in __find_uprobe()
      smp: Process pending softirqs in flush_smp_call_function_from_idle()
      sched: Harden PREEMPT_DYNAMIC
      static_call: Allow module use without exposing static_call_key
      sched: Add /debug/sched_preempt
      preempt/dynamic: Support dynamic preempt with preempt= boot option
      preempt/dynamic: Provide irqentry_exit_cond_resched() static call
      preempt/dynamic: Provide preempt_schedule[_notrace]() static calls
      preempt/dynamic: Provide cond_resched() and might_resched() static calls
      preempt: Introduce CONFIG_PREEMPT_DYNAMIC
      static_call: Provide DEFINE_STATIC_CALL_RET0()
      ...
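    A minimal usage sketch of the partial-order rb_add() helper named in the rbtree cleanup above (the struct and the less() callback are illustrative, not taken from the scheduler code):

        struct my_node {
                struct rb_node rb;
                u64 key;
        };

        static bool my_less(struct rb_node *a, const struct rb_node *b)
        {
                return rb_entry(a, struct my_node, rb)->key <
                       rb_entry(b, struct my_node, rb)->key;
        }

        static void my_insert(struct rb_root *root, struct my_node *node)
        {
                rb_add(&node->rb, root, my_less);  /* descends the tree using my_less() */
        }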
| * sched,x86: Allow !PREEMPT_DYNAMIC  (Peter Zijlstra, 2021-02-17, 1 file changed, -6/+18)
    Allow building x86 with PREEMPT_DYNAMIC=n; this is needed for PREEMPT_RT as it makes no sense to not have full preemption on PREEMPT_RT.
    Fixes: 8c98e8cf723c ("preempt/dynamic: Provide preempt_schedule[_notrace]() static calls") Reported-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Mike Galbraith <efault@gmx.de> Link: https://lkml.kernel.org/r/YCK1+JyFNxQnWeXK@hirez.programming.kicks-ass.net
| * sched: Harden PREEMPT_DYNAMIC  (Peter Zijlstra, 2021-02-17, 1 file changed, -2/+2)
    Use the new EXPORT_STATIC_CALL_TRAMP() / static_call_mod() to unexport the static_call_key for the PREEMPT_DYNAMIC calls such that modules can no longer update these calls. Having modules change/hi-jack the preemption calls would be horrible.
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * static_call: Allow module use without exposing static_call_key  (Josh Poimboeuf, 2021-02-17, 1 file changed, -0/+7)
    When exporting static_call_key with EXPORT_STATIC_CALL*(), the module can use static_call_update() to change the function called. This is not desirable in general.
    Not exporting static_call_key however also disallows usage of static_call(), since objtool needs the key to construct the static_call_site.
    Solve this by allowing objtool to create the static_call_site using the trampoline address when it builds a module and cannot find the static_call_key symbol. The module loader will then try to map the trampoline back to a key before it constructs the normal sites list.
    Doing this requires a trampoline -> key association, so add another magic section that keeps those.
    Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20210127231837.ifddpn7rhwdaepiu@treble
| * preempt/dynamic: Provide preempt_schedule[_notrace]() static calls  (Peter Zijlstra (Intel), 2021-02-17, 1 file changed, -8/+26)
    Provide static calls to control preempt_schedule[_notrace]() (called in CONFIG_PREEMPT) so that we can override their behaviour when preempt= is overridden.
    Since the default behaviour is full preemption, both their calls are initialized to the arch provided wrapper, if any.
    [fweisbec: only define static calls when PREEMPT_DYNAMIC, make it less dependent on x86 with __preempt_schedule_func] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20210118141223.123667-7-frederic@kernel.org
* Merge tag 'core-mm-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds, 2021-02-21, 1 file changed, -1/+0)
    Pull tlb gather updates from Ingo Molnar:
    "These fix MM (soft-)dirty bit management in the procfs code & clean up the TLB gather API"
    * tag 'core-mm-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      x86/ldt: Use tlb_gather_mmu_fullmm() when freeing LDT page-tables
      tlb: arch: Remove empty __tlb_remove_tlb_entry() stubs
      tlb: mmu_gather: Remove start/end arguments from tlb_gather_mmu()
      tlb: mmu_gather: Introduce tlb_gather_mmu_fullmm()
      tlb: mmu_gather: Remove unused start/end arguments from tlb_finish_mmu()
      mm: proc: Invalidate TLB after clearing soft-dirty page state
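    A rough before/after sketch of the mmu_gather API change named in the list above (prototypes reconstructed from the commit subjects, not copied from the tree):

        /* before: callers passed a range even when tearing down a whole mm */
        void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
                            unsigned long start, unsigned long end);

        /* after: the range arguments are gone and full-mm teardown is explicit */
        void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm);
        void tlb_gather_mmu_fullmm(struct mmu_gather *tlb, struct mm_struct *mm);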
| * tlb: arch: Remove empty __tlb_remove_tlb_entry() stubs  (Will Deacon, 2021-01-29, 1 file changed, -1/+0)
    If __tlb_remove_tlb_entry() is not defined by the architecture then we provide an empty definition in asm-generic/tlb.h. Remove the redundant empty definitions for sparc64 and x86.
    Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Yu Zhao <yuzhao@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lkml.kernel.org/r/20210127235347.1402-6-will@kernel.org
* Merge tag 'oprofile-removal-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/linux  (Linus Torvalds, 2021-02-21, 1 file changed, -1/+0)
    Pull oprofile and dcookies removal from Viresh Kumar:
    "Remove oprofile and dcookies support
    The 'oprofile' user-space tools don't use the kernel OPROFILE support any more, and haven't in a long time. User-space has been converted to the perf interfaces.
    The dcookies stuff is only used by the oprofile code. Now that oprofile's support is getting removed from the kernel, there is no need for dcookies as well.
    Remove kernel's old oprofile and dcookies support"
    * tag 'oprofile-removal-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/linux:
      fs: Remove dcookies support
      drivers: Remove CONFIG_OPROFILE support
      arch: xtensa: Remove CONFIG_OPROFILE support
      arch: x86: Remove CONFIG_OPROFILE support
      arch: sparc: Remove CONFIG_OPROFILE support
      arch: sh: Remove CONFIG_OPROFILE support
      arch: s390: Remove CONFIG_OPROFILE support
      arch: powerpc: Remove oprofile
      arch: powerpc: Stop building and using oprofile
      arch: parisc: Remove CONFIG_OPROFILE support
      arch: mips: Remove CONFIG_OPROFILE support
      arch: microblaze: Remove CONFIG_OPROFILE support
      arch: ia64: Remove rest of perfmon support
      arch: ia64: Remove CONFIG_OPROFILE support
      arch: hexagon: Don't select HAVE_OPROFILE
      arch: arc: Remove CONFIG_OPROFILE support
      arch: arm: Remove CONFIG_OPROFILE support
      arch: alpha: Remove CONFIG_OPROFILE support
| * arch: x86: Remove CONFIG_OPROFILE support  (Viresh Kumar, 2021-01-29, 1 file changed, -1/+0)
    The "oprofile" user-space tools don't use the kernel OPROFILE support any more, and haven't in a long time. User-space has been converted to the perf interfaces.
    Remove the old oprofile's architecture specific support.
    Suggested-by: Christoph Hellwig <hch@infradead.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Robert Richter <rric@kernel.org> Acked-by: William Cohen <wcohen@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Thomas Gleixner <tglx@linutronix.de>
* Merge branch 'work.elf-compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs  (Linus Torvalds, 2021-02-21, 3 files changed, -12/+32)
    Pull ELF compat updates from Al Viro:
    "Sanitizing ELF compat support, especially for triarch architectures:
    - X32 handling cleaned up
    - MIPS64 uses compat_binfmt_elf.c both for O32 and N32 now
    - Kconfig side of things regularized
    Eventually I hope to have compat_binfmt_elf.c killed, with both native and compat built from fs/binfmt_elf.c, with -DELF_BITS={64,32} passed by kbuild, but that's a separate story - not included here"
    * 'work.elf-compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
      get rid of COMPAT_ELF_EXEC_PAGESIZE
      compat_binfmt_elf: don't bother with undef of ELF_ARCH
      Kconfig: regularize selection of CONFIG_BINFMT_ELF
      mips compat: switch to compat_binfmt_elf.c
      mips: don't bother with ELF_CORE_EFLAGS
      mips compat: don't bother with ELF_ET_DYN_BASE
      mips: KVM_GUEST makes no sense for 64bit builds...
      mips: kill unused definitions in binfmt_elf[on]32.c
      mips binfmt_elf*32.c: use elfcore-compat.h
      x32: make X32, !IA32_EMULATION setups able to execute x32 binaries
      [amd64] clean PRSTATUS_SIZE/SET_PR_FPVALID up properly
      elf_prstatus: collect the common part (everything before pr_reg) into a struct
      binfmt_elf: partially sanitize PRSTATUS_SIZE and SET_PR_FPVALID
| * x32: make X32, !IA32_EMULATION setups able to execute x32 binaries  (Al Viro, 2021-01-06, 1 file changed, -1/+1)
    It's really trivial - the only wrinkle is making sure that compiler knows that ia32-related side of COMPAT_ARCH_DLINFO is dead code on such configs (we don't get there without having passed compat_elf_check_arch(), and on such configs that'll fail for ia32 binary).
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * [amd64] clean PRSTATUS_SIZE/SET_PR_FPVALID up properly  (Al Viro, 2021-01-06, 2 files changed, -14/+31)
    To get rid of hardcoded size/offset in those macros we need to have a definition of i386 variant of struct elf_prstatus. However, we can't do that in asm/compat.h - the types needed for that are not there and adding an include of asm/user32.h into asm/compat.h would cause a lot of mess.
    That could be conveniently done in elfcore-compat.h, but currently there is nowhere to put arch-dependent parts of it - no asm/elfcore-compat.h. So we introduce a new file (asm/elfcore-compat.h, present on architectures that have CONFIG_ARCH_HAS_ELFCORE_COMPAT set, currently only on x86), have it pulled by linux/elfcore-compat.h and move the definitions there.
    As a side benefit, we don't need to worry about accidental inclusion of that file into binfmt_elf.c itself, so we don't need the dance with COMPAT_PRSTATUS_SIZE, etc. - only fs/compat_binfmt_elf.c will see that header.
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * binfmt_elf: partially sanitize PRSTATUS_SIZE and SET_PR_FPVALID  (Al Viro, 2021-01-04, 1 file changed, -4/+7)
    On 64bit architectures that support 32bit processes there are two possible layouts for the NT_PRSTATUS note in ELF coredumps. For one thing, several fields are 64bit for native processes and 32bit for compat ones (pr_sigpend, etc.). For another, the register dump is obviously different - the size and number of registers are not going to be the same for 32bit and 64bit variants of the processor.
    Usually that's handled by having two structures - elf_prstatus for the native layout and compat_elf_prstatus for the 32bit one. 32bit processes are handled by fs/compat_binfmt_elf.c, which defines a macro called 'elf_prstatus' that expands to compat_elf_prstatus. Then it includes fs/binfmt_elf.c, which makes all references to struct elf_prstatus be textually replaced with struct compat_elf_prstatus. Ugly and somewhat brittle, but it works.
    However, amd64 is worse - there are _three_ possible layouts. One for native 64bit processes, another for i386 (32bit) processes and yet another for x32 (32bit address space with full 64bit registers). Both i386 and x32 processes are handled by fs/compat_binfmt_elf.c, with the usual compat_binfmt_elf.c trickery. However, the layouts for i386 and x32 are not identical - they have a common beginning, but the register dump part (pr_reg) is bigger on x32. Worse, pr_reg is not the last field - it's followed by int pr_fpvalid, so that field ends up at different offsets for the i386 and x32 layouts.
    Fortunately, there's not much code that cares about any of that - it's all encapsulated in fill_thread_core_info(). Since the x32 variant is bigger, we define compat_elf_prstatus to match that layout. That way i386 processes have enough space to fit their layout into. Moreover, since these layouts are identical prior to pr_reg, we don't need to distinguish x32 and i386 cases when we are setting the fields prior to pr_reg. Filling pr_reg itself is done by calling the ->get() method of the appropriate regset, and that method knows what layout (and size) to use.
    We need to distinguish the x32 and i386 cases only for two things: setting ->pr_fpvalid (the offset differs for x32 and i386) and choosing the right size for our note. The way it's done is Not Nice, for the lack of a more accurate printable description.
    There are two macros (PRSTATUS_SIZE and SET_PR_FPVALID) that default essentially to sizeof(struct elf_prstatus) and (S)->pr_fpvalid = 1. On x86, asm/compat.h provides its own variants. Unfortunately, quite a few things go wrong there:
    * PRSTATUS_SIZE doesn't use the normal test for the process being an x32 one; it compares the size reported by regset with the size of pr_reg.
    * it hardcodes the sizes of the x32 and i386 variants (296 and 144 resp.), so if some change in includes leads to asm/compat.h getting pulled in by fs/binfmt_elf.c we are in trouble - it will end up using the size of the x32 variant for 64bit processes.
    * it's in the wrong place; asm/compat.h couldn't define the structure for the i386 layout, since it lacks quite a few types needed for it. The hardcoded sizes are largely due to that.
    The proper fix would be to have an explicitly defined i386 variant of the structure and have PRSTATUS_SIZE/SET_PR_FPVALID check for TIF_X32 to choose the variant that should be used. Unfortunately, that requires some manipulation of headers; we'll do that later in the series, but for now let's go with the minimal variant - rename PRSTATUS_SIZE in asm/compat.h to COMPAT_PRSTATUS_SIZE, have fs/compat_binfmt_elf.c define PRSTATUS_SIZE to COMPAT_PRSTATUS_SIZE and use the normal TIF_X32 check in that macro. The size of the i386 variant is kept hardcoded for now. Similar story for SET_PR_FPVALID.
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge tag 'x86_asm_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds, 2021-02-20, 1 file changed, -3/+3)
    Pull x86 asm updates from Borislav Petkov:
    "Annotate new MMIO-accessing insn wrappers' arguments with __iomem"
    * tag 'x86_asm_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      x86/asm: Add a missing __iomem annotation in enqcmds()
      x86/asm: Annotate movdir64b()'s dst argument with __iomem
| * x86/asm: Add a missing __iomem annotation in enqcmds()  (Dave Jiang, 2021-01-08, 1 file changed, -1/+1)
    Add a missing __iomem annotation to address a sparse warning. The caller is expected to pass an __iomem annotated pointer to this function. The current usages send a 64-bytes command descriptor to an MMIO location (portal) on a device for consumption.
    Also, from the comment in movdir64b(), which also applies to enqcmds(), @__dst must be supplied as an lvalue because this tells the compiler what the object is (its size) the instruction accesses. I.e., not the pointers but what they point to, thus the deref'ing '*'.
    The actual sparse warning is:
      drivers/dma/idxd/submit.c: note: in included file (through arch/x86/include/asm/processor.h, \
        arch/x86/include/asm/timex.h, include/linux/timex.h, include/linux/time32.h, \
        include/linux/time.h, include/linux/stat.h, ...):
      ./arch/x86/include/asm/special_insns.h:289:41: warning: incorrect type in initializer (different address spaces)
      ./arch/x86/include/asm/special_insns.h:289:41:    expected struct <noident> *__dst
      ./arch/x86/include/asm/special_insns.h:289:41:    got void [noderef] __iomem *dst
    [ bp: Massage commit message. ] Fixes: 7f5933f81bd8 ("x86/asm: Add an enqcmds() wrapper for the ENQCMDS instruction") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lkml.kernel.org/r/161003789741.4062451.14362269365703761223.stgit@djiang5-desk3.ch.intel.com
| * x86/asm: Annotate movdir64b()'s dst argument with __iomem  (Dave Jiang, 2021-01-08, 1 file changed, -2/+2)
    Add a missing __iomem annotation to address a sparse warning. The caller is expected to pass an __iomem annotated pointer to this function. The current usages send a 64-bytes command descriptor to an MMIO location (portal) on a device for consumption. When future usages for the MOVDIR64B instruction warrant a separate variant of a memory to memory operation, the argument annotation can be revisited.
    Also, from the comment in movdir64b(), @__dst must be supplied as an lvalue because this tells the compiler what the object is (its size) the instruction accesses. I.e., not the pointers but what they point to, thus the deref'ing '*'.
    The actual sparse warning is:
      sparse warnings: (new ones prefixed by >>)
      drivers/dma/idxd/submit.c: note: in included file (through include/linux/io.h, include/linux/pci.h):
      >> arch/x86/include/asm/io.h:422:27: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected void *dst @@ got void [noderef] __iomem *dst @@
         arch/x86/include/asm/io.h:422:27: sparse: expected void *dst
         arch/x86/include/asm/io.h:422:27: sparse: got void [noderef] __iomem *dst
    [ bp: Massage commit message. ] Fixes: 0888e1030d3e ("x86/asm: Carve out a generic movdir64b() helper for general usage") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lkml.kernel.org/r/161003787823.4062451.6564503265464317197.stgit@djiang5-desk3.ch.intel.com
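    Taken together, the two annotations above amount to prototypes along these lines (a sketch based on the commit text, not a copy of the headers):

        /* the destination is an MMIO portal, so it carries the __iomem qualifier */
        static inline void movdir64b(void __iomem *dst, const void *src);
        static inline int enqcmds(void __iomem *dst, const void *src);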
* Merge tag 'x86_cache_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds, 2021-02-20, 1 file changed, -4/+7)
    Pull x86 resource control updates from Borislav Petkov:
    "Avoid IPI-ing a task in certain cases and prevent load/store tearing when accessing a task's resctrl fields concurrently"
    * tag 'x86_cache_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      x86/resctrl: Apply READ_ONCE/WRITE_ONCE to task_struct.{rmid,closid}
      x86/resctrl: Use task_curr() instead of task_struct->on_cpu to prevent unnecessary IPI
      x86/resctrl: Add printf attribute to log function
| * x86/resctrl: Apply READ_ONCE/WRITE_ONCE to task_struct.{rmid,closid}  (Valentin Schneider, 2021-01-11, 1 file changed, -4/+7)
    A CPU's current task can have its {closid, rmid} fields read locally while they are being concurrently written to from another CPU. This can happen anytime __resctrl_sched_in() races with either __rdtgroup_move_task() or rdt_move_group_tasks().
    Prevent load / store tearing for those accesses by giving them the READ_ONCE() / WRITE_ONCE() treatment.
    Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/9921fda88ad81afb9885b517fbe864a2bc7c35a9.1608243147.git.reinette.chatre@intel.com
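    An illustrative sketch of the pattern the commit applies (simplified; the writer-side group pointer and field names are assumptions modeled on the commit text, not the exact resctrl code):

        /* writer side, e.g. __rdtgroup_move_task() moving a task to a new group */
        WRITE_ONCE(tsk->closid, rdtgrp->closid);
        WRITE_ONCE(tsk->rmid, rdtgrp->mon.rmid);

        /* reader side, __resctrl_sched_in() running concurrently on another CPU */
        u32 closid = READ_ONCE(current->closid);
        u32 rmid = READ_ONCE(current->rmid);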
* Merge tag 'x86_cpu_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds, 2021-02-20, 4 files changed, -10/+20)
    Pull x86 CPUID cleanup from Borislav Petkov:
    "Assign a dedicated feature word to a CPUID leaf which is widely used"
    * tag 'x86_cpu_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      x86/cpufeatures: Assign dedicated feature word for CPUID_0x8000001F[EAX]
| * x86/cpufeatures: Assign dedicated feature word for CPUID_0x8000001F[EAX]  (Sean Christopherson, 2021-01-28, 4 files changed, -10/+20)
    Collect the scattered SME/SEV related feature flags into a dedicated word. There are now five recognized features in CPUID.0x8000001F.EAX, with at least one more on the horizon (SEV-SNP). Using a dedicated word allows KVM to use its automagic CPUID adjustment logic when reporting the set of supported features to userspace.
    No functional change intended.
    Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Brijesh Singh <brijesh.singh@amd.com> Link: https://lkml.kernel.org/r/20210122204047.2860075-2-seanjc@google.com
* Merge tag 'x86_fpu_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds, 2021-02-20, 2 files changed, -4/+32)
    Pull x86 FPU updates from Borislav Petkov:
    "x86 fpu usage optimization and cleanups:
    - make 64-bit kernel code which uses 387 insns request an x87 init (FNINIT) explicitly when using the FPU
    - misc cleanups"
    * tag 'x86_fpu_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      x86/fpu/xstate: Use sizeof() instead of a constant
      x86/fpu/64: Don't FNINIT in kernel_fpu_begin()
      x86/fpu: Make the EFI FPU calling convention explicit
| * x86/fpu/64: Don't FNINIT in kernel_fpu_begin()  (Andy Lutomirski, 2021-01-29, 1 file changed, -0/+12)
    The remaining callers of kernel_fpu_begin() in 64-bit kernels don't use 387 instructions, so there's no need to sanitize the FPU state. Skip it to get most of the performance we lost back.
    Reported-by: Krzysztof Olędzki <ole@ans.pl> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/57f8841ccbf9f3c25a23196c888f5f6ec5887577.1611205691.git.luto@kernel.org
| * x86/fpu: Make the EFI FPU calling convention explicit  (Andy Lutomirski, 2021-01-29, 1 file changed, -4/+20)
    EFI uses kernel_fpu_begin() to conform to the UEFI calling convention. This specifically requires initializing FCW (FPU Control Word), whereas no sane 64-bit kernel code should use legacy 387 operations that reference FCW.
    This should make it possible to safely change the default semantics of kernel_fpu_begin() to stop initializing FCW on 64-bit kernels.
    [ bp: Massage commit message a little. ] Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/25d392fff64680e0f4bb8cf0b1003314dc29eafe.1611205691.git.luto@kernel.org
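    A sketch of how the two FPU commits above fit together, assuming the kernel_fpu_begin_mask()/KFPU_* interface and the efi_fpu_begin() wrapper this series introduces (bodies are illustrative, not the literal patches):

        /* EFI explicitly asks for the legacy x87 init (FCW) plus MXCSR setup */
        static inline void efi_fpu_begin(void)
        {
                kernel_fpu_begin_mask(KFPU_387 | KFPU_MXCSR);
        }

        /* ...which lets the 64-bit default path skip the costly FNINIT */
        static inline void kernel_fpu_begin(void)
        {
        #ifdef CONFIG_X86_64
                kernel_fpu_begin_mask(KFPU_MXCSR);
        #else
                kernel_fpu_begin_mask(KFPU_387 | KFPU_MXCSR);
        #endif
        }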
* Merge tag 'x86_microcode_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds, 2021-02-20, 1 file changed, -2/+0)
    Pull x86 microcode cleanup from Borislav Petkov:
    "Make the driver init function static again"
    * tag 'x86_microcode_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      x86/microcode: Make microcode_init() static
| * x86/microcode: Make microcode_init() static  (Borislav Petkov, 2020-12-31, 1 file changed, -2/+0)
    No functional changes.
    Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201230122147.26938-1-bp@alien8.de
* Merge tag 'x86_mm_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds, 2021-02-20, 6 files changed, -9/+4)
    Pull x86 mm cleanups from Borislav Petkov:
    - PTRACE_GETREGS/PTRACE_PUTREGS regset selection cleanup
    - Another initial cleanup - more to follow - to the fault handling code.
    - Other minor cleanups and corrections.
    * tag 'x86_mm_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
      x86/{fault,efi}: Fix and rename efi_recover_from_page_fault()
      x86/fault: Don't run fixups for SMAP violations
      x86/fault: Don't look for extable entries for SMEP violations
      x86/fault: Rename no_context() to kernelmode_fixup_or_oops()
      x86/fault: Bypass no_context() for implicit kernel faults from usermode
      x86/fault: Split the OOPS code out from no_context()
      x86/fault: Improve kernel-executing-user-memory handling
      x86/fault: Correct a few user vs kernel checks wrt WRUSS
      x86/fault: Document the locking in the fault_signal_pending() path
      x86/fault/32: Move is_f00f_bug() to do_kern_addr_fault()
      x86/fault: Fold mm_fault_error() into do_user_addr_fault()
      x86/fault: Skip the AMD erratum #91 workaround on unaffected CPUs
      x86/fault: Fix AMD erratum #91 errata fixup for user code
      x86/Kconfig: Remove HPET_EMULATE_RTC depends on RTC
      x86/asm: Fixup TASK_SIZE_MAX comment
      x86/ptrace: Clean up PTRACE_GETREGS/PTRACE_PUTREGS regset selection
      x86/vm86/32: Remove VM86_SCREEN_BITMAP support
      x86: Remove definition of DEBUG
      x86/entry: Remove now unused do_IRQ() declaration
      x86/mm: Remove duplicate definition of _PAGE_PAT_LARGE
      ...
| * Merge branch 'x86/cleanups' into x86/mm  (Ingo Molnar, 2021-02-12, 5 files changed, -8/+3)
    Merge recent cleanups to the x86 MM code to resolve a conflict.
    Conflicts: arch/x86/mm/fault.c
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * | | | | | | | x86/asm: Fixup TASK_SIZE_MAX commentAlexey Dobriyan2021-02-051-1/+1
The comment says "by preventing anything executable", which is not true: even a
PROT_NONE mapping can't be installed at (1 << 47) - 4096.

  mmap(0x7ffffffff000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM

[ bp: Fixup to the moved location in page_64_types.h. ]

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200305181719.GA5490@avx2
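The same probe can be reproduced from userspace; a small standalone sketch
(assuming an x86-64 kernel with 4-level paging, where the user address-space
limit sits at the 47-bit boundary):

  #include <errno.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/mman.h>

  int main(void)
  {
          /* Last page below the 47-bit boundary: 0x7ffffffff000. */
          void *addr = (void *)((1UL << 47) - 4096);
          void *p = mmap(addr, 4096, PROT_NONE,
                         MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS, -1, 0);

          if (p == MAP_FAILED)
                  printf("mmap(%p): %s\n", addr, strerror(errno)); /* expect ENOMEM */
          else
                  printf("mmap(%p) unexpectedly succeeded\n", addr);
          return 0;
  }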
| | * | | | | | | | x86/vm86/32: Remove VM86_SCREEN_BITMAP supportAndy Lutomirski2021-01-212-3/+2
The implementation was rather buggy. It unconditionally marked PTEs read-only,
even for VM_SHARED mappings. I'm not sure whether this is actually a problem,
but it certainly seems unwise. More importantly, it released the mmap lock
before flushing the TLB, which could allow a racing CoW operation to falsely
believe that the underlying memory was not writable.

I can't find any users at all of this mechanism, so just remove it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Stas Sergeev <stsp2@yandex.ru>
Link: https://lkml.kernel.org/r/f3086de0babcab36f69949b5780bde851f719bc8.1611078018.git.luto@kernel.org
| | * | | | | | | | x86/entry: Remove now unused do_IRQ() declarationHao Lee2021-01-121-2/+0
do_IRQ() has been replaced by common_interrupt() in commit fa5e5c409213
("x86/entry: Use idtentry for interrupts"). Remove its now unused declaration.

Signed-off-by: Hao Lee <haolee.swjtu@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210103030834.GA15432@haolee.github.io
| | * | | | | | | | x86/mm: Remove duplicate definition of _PAGE_PAT_LARGEArvind Sankar2021-01-081-2/+0
_PAGE_PAT_LARGE is already defined next to _PAGE_PAT. Remove the duplicate.

Fixes: 4efb56649132 ("x86/mm: Tabulate the page table encoding definitions")
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20201111160946.147341-2-nivedita@alum.mit.edu
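For reference, the encoding keeps separate PAT bit positions for 4KB pages
(bit 7) and for 2MB/1GB pages (bit 12). A standalone sketch of those
definitions (reconstructed here for illustration, so treat the exact kernel
spelling as approximate):

  #include <inttypes.h>
  #include <stdio.h>

  /* Approximate reconstruction of the pgtable_types.h PAT encoding. */
  #define _PAGE_BIT_PAT           7       /* PAT bit on 4KB pages */
  #define _PAGE_BIT_PAT_LARGE     12      /* PAT bit on 2MB/1GB pages */
  #define _PAGE_PAT               (UINT64_C(1) << _PAGE_BIT_PAT)
  #define _PAGE_PAT_LARGE         (UINT64_C(1) << _PAGE_BIT_PAT_LARGE)

  int main(void)
  {
          printf("_PAGE_PAT       = 0x%" PRIx64 "\n", _PAGE_PAT);
          printf("_PAGE_PAT_LARGE = 0x%" PRIx64 "\n", _PAGE_PAT_LARGE);
          return 0;
  }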
| * | | | / / / / x86/{fault,efi}: Fix and rename efi_recover_from_page_fault()Andy Lutomirski2021-02-101-1/+1
efi_recover_from_page_fault() doesn't recover -- it does a special EFI
mini-oops. Rename it to make it clear that it crashes.

While renaming it, I noticed a blatant bug: a page fault oops in a different
thread happening concurrently with an EFI runtime service call would be
misinterpreted as an EFI page fault. Fix that.

This isn't quite exact. The situation could be improved by using a special CS
for calls into EFI.

[ bp: Massage commit message and simplify in interrupt check. ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/f43b1e80830dc78ed60ed8b0826f4f189254570c.1612924255.git.luto@kernel.org
* | | | | | | | Merge tag 'x86_paravirt_for_v5.12' of ↵Linus Torvalds2021-02-204-77/+22
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 paravirt updates from Borislav Petkov:
 "Part one of a major conversion of the paravirt infrastructure to our
  kernel patching facilities and getting rid of the custom-grown ones"

* tag 'x86_paravirt_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/pv: Rework arch_local_irq_restore() to not use popf
  x86/xen: Drop USERGS_SYSRET64 paravirt call
  x86/pv: Switch SWAPGS to ALTERNATIVE
  x86/xen: Use specific Xen pv interrupt entry for DF
  x86/xen: Use specific Xen pv interrupt entry for MCE
| * | | | | | | x86/pv: Rework arch_local_irq_restore() to not use popfJuergen Gross2021-02-103-24/+8
POPF is a rather expensive operation, so don't use it for restoring irq flags.
Instead, test whether interrupts are enabled in the flags parameter and, if so,
enable interrupts via STI.

As a result, the restore_fl paravirt op is no longer needed.

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210120135555.32594-7-jgross@suse.com
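The resulting restore path has roughly this shape (a sketch of the idea, not
the verbatim kernel code; the function name is illustrative):

  #include <asm/irqflags.h>               /* arch_local_irq_enable() */
  #include <asm/processor-flags.h>        /* X86_EFLAGS_IF */

  /*
   * Instead of re-loading the whole saved EFLAGS with POPF, only the
   * interrupt flag is examined, and interrupts are re-enabled (STI natively,
   * a paravirt call on Xen PV) when it was set at save time.
   */
  static inline void irq_restore_sketch(unsigned long flags)
  {
          if (flags & X86_EFLAGS_IF)
                  arch_local_irq_enable();
  }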
| * | | | | | | x86/xen: Drop USERGS_SYSRET64 paravirt callJuergen Gross2021-02-103-19/+0
USERGS_SYSRET64 is used to return from a syscall via SYSRET, but a Xen PV guest
will nevertheless use the IRET hypercall, as there is no sysret PV hypercall
defined.

So instead of testing all the prerequisites for doing a sysret and then
mangling the stack for Xen PV again in order to do an iret anyway, just use the
iret exit from the beginning. This can easily be done via an ALTERNATIVE, as is
already done for the sysenter compat case.

It should be noted that this drops the optimization in Xen of not restoring a
few registers when returning to user mode, but it seems as if the saved
instructions in the kernel more than compensate for this drop (a kernel build
in a Xen PV guest was slightly faster with this patch applied).

While at it, remove the stale sysret32 remnants.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210120135555.32594-6-jgross@suse.com
| * | | | | | | x86/pv: Switch SWAPGS to ALTERNATIVEJuergen Gross2021-02-103-34/+8
SWAPGS is used only for interrupts coming from user mode or for returning to
user mode. So there is no reason to use the PARAVIRT framework, as it can
easily be replaced by an ALTERNATIVE depending on X86_FEATURE_XENPV.

There are several instances using the PV-aware SWAPGS macro in paths which are
never executed in a Xen PV guest. Replace those with the plain swapgs
instruction. The same applies to SWAPGS_UNSAFE_STACK.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210120135555.32594-5-jgross@suse.com
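The change itself lives in the entry assembly, but the ALTERNATIVE mechanism it
relies on is the generic one; a purely illustrative C-side sketch of patching
an instruction out on Xen PV (the function name is made up for illustration):

  #include <asm/alternative.h>    /* alternative() */
  #include <asm/cpufeatures.h>    /* X86_FEATURE_XENPV */

  /*
   * Emit SWAPGS by default; on a Xen PV guest (X86_FEATURE_XENPV set) the
   * instruction is patched out at boot, since the hypervisor already takes
   * care of the GS switch on kernel entry/exit there.
   */
  static inline void swapgs_unless_xenpv(void)
  {
          alternative("swapgs", "", X86_FEATURE_XENPV);
  }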