From 2c18bd525c47f882f033b0a813ecd09c93e1ecdf Mon Sep 17 00:00:00 2001 From: Babu Moger Date: Thu, 4 Jun 2020 14:45:16 -0500 Subject: x86/resctrl: Fix memory bandwidth counter width for AMD MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Memory bandwidth is calculated reading the monitoring counter at two intervals and calculating the delta. It is the software’s responsibility to read the count often enough to avoid having the count roll over _twice_ between reads. The current code hardcodes the bandwidth monitoring counter's width to 24 bits for AMD. This is due to default base counter width which is 24. Currently, AMD does not implement the CPUID 0xF.[ECX=1]:EAX to adjust the counter width. But, the AMD hardware supports much wider bandwidth counter with the default width of 44 bits. Kernel reads these monitoring counters every 1 second and adjusts the counter value for overflow. With 24 bits and scale value of 64 for AMD, it can only measure up to 1GB/s without overflowing. For the rates above 1GB/s this will fail to measure the bandwidth. Fix the issue setting the default width to 44 bits by adjusting the offset. AMD future products will implement CPUID 0xF.[ECX=1]:EAX. [ bp: Let the line stick out and drop {}-brackets around a single statement. ] Fixes: 4d05bf71f157 ("x86/resctrl: Introduce AMD QOS feature") Signed-off-by: Babu Moger Signed-off-by: Borislav Petkov Link: https://lkml.kernel.org/r/159129975546.62538.5656031125604254041.stgit@naples-babu.amd.com --- arch/x86/kernel/cpu/resctrl/core.c | 8 ++++---- arch/x86/kernel/cpu/resctrl/internal.h | 1 + 2 files changed, 5 insertions(+), 4 deletions(-) (limited to 'arch') diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c index 12f967c6b603..6a9df71c1b9e 100644 --- a/arch/x86/kernel/cpu/resctrl/core.c +++ b/arch/x86/kernel/cpu/resctrl/core.c @@ -981,10 +981,10 @@ void resctrl_cpu_detect(struct cpuinfo_x86 *c) c->x86_cache_max_rmid = ecx; c->x86_cache_occ_scale = ebx; - if (c->x86_vendor == X86_VENDOR_INTEL) - c->x86_cache_mbm_width_offset = eax & 0xff; - else - c->x86_cache_mbm_width_offset = -1; + c->x86_cache_mbm_width_offset = eax & 0xff; + + if (c->x86_vendor == X86_VENDOR_AMD && !c->x86_cache_mbm_width_offset) + c->x86_cache_mbm_width_offset = MBM_CNTR_WIDTH_OFFSET_AMD; } } diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h index f20a47d120b1..5ffa32256b3b 100644 --- a/arch/x86/kernel/cpu/resctrl/internal.h +++ b/arch/x86/kernel/cpu/resctrl/internal.h @@ -37,6 +37,7 @@ #define MBA_IS_LINEAR 0x4 #define MBA_MAX_MBPS U32_MAX #define MAX_MBA_BW_AMD 0x800 +#define MBM_CNTR_WIDTH_OFFSET_AMD 20 #define RMID_VAL_ERROR BIT_ULL(63) #define RMID_VAL_UNAVAIL BIT_ULL(62) -- cgit v1.2.3 From 5d5103595e9e53048bb7e70ee2673c897ab38300 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Mon, 8 Jun 2020 10:41:34 -0700 Subject: x86/cpu: Reinitialize IA32_FEAT_CTL MSR on BSP during wakeup Reinitialize IA32_FEAT_CTL on the BSP during wakeup to handle the case where firmware doesn't initialize or save/restore across S3. This fixes a bug where IA32_FEAT_CTL is left uninitialized and results in VMXON taking a #GP due to VMX not being fully enabled, i.e. breaks KVM. Use init_ia32_feat_ctl() to "restore" IA32_FEAT_CTL as it already deals with the case where the MSR is locked, and because APs already redo init_ia32_feat_ctl() during suspend by virtue of the SMP boot flow being used to reinitialize APs upon wakeup. Do the call in the early wakeup flow to avoid dependencies in the syscore_ops chain, e.g. simply adding a resume hook is not guaranteed to work, as KVM does VMXON in its own resume hook, kvm_resume(), when KVM has active guests. Fixes: 21bd3467a58e ("KVM: VMX: Drop initialization of IA32_FEAT_CTL MSR") Reported-by: Brad Campbell Signed-off-by: Sean Christopherson Signed-off-by: Borislav Petkov Reviewed-by: Liam Merwick Reviewed-by: Maxim Levitsky Tested-by: Brad Campbell Cc: stable@vger.kernel.org # v5.6 Link: https://lkml.kernel.org/r/20200608174134.11157-1-sean.j.christopherson@intel.com --- arch/x86/include/asm/cpu.h | 5 +++++ arch/x86/kernel/cpu/centaur.c | 1 + arch/x86/kernel/cpu/cpu.h | 4 ---- arch/x86/kernel/cpu/zhaoxin.c | 1 + arch/x86/power/cpu.c | 6 ++++++ 5 files changed, 13 insertions(+), 4 deletions(-) (limited to 'arch') diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h index dd17c2da1af5..da78ccbd493b 100644 --- a/arch/x86/include/asm/cpu.h +++ b/arch/x86/include/asm/cpu.h @@ -58,4 +58,9 @@ static inline bool handle_guest_split_lock(unsigned long ip) return false; } #endif +#ifdef CONFIG_IA32_FEAT_CTL +void init_ia32_feat_ctl(struct cpuinfo_x86 *c); +#else +static inline void init_ia32_feat_ctl(struct cpuinfo_x86 *c) {} +#endif #endif /* _ASM_X86_CPU_H */ diff --git a/arch/x86/kernel/cpu/centaur.c b/arch/x86/kernel/cpu/centaur.c index 426792565d86..c5cf336e5077 100644 --- a/arch/x86/kernel/cpu/centaur.c +++ b/arch/x86/kernel/cpu/centaur.c @@ -3,6 +3,7 @@ #include #include +#include #include #include #include diff --git a/arch/x86/kernel/cpu/cpu.h b/arch/x86/kernel/cpu/cpu.h index fb538fccd24c..9d033693519a 100644 --- a/arch/x86/kernel/cpu/cpu.h +++ b/arch/x86/kernel/cpu/cpu.h @@ -81,8 +81,4 @@ extern void update_srbds_msr(void); extern u64 x86_read_arch_cap_msr(void); -#ifdef CONFIG_IA32_FEAT_CTL -void init_ia32_feat_ctl(struct cpuinfo_x86 *c); -#endif - #endif /* ARCH_X86_CPU_H */ diff --git a/arch/x86/kernel/cpu/zhaoxin.c b/arch/x86/kernel/cpu/zhaoxin.c index df1358ba622b..05fa4ef63490 100644 --- a/arch/x86/kernel/cpu/zhaoxin.c +++ b/arch/x86/kernel/cpu/zhaoxin.c @@ -2,6 +2,7 @@ #include #include +#include #include #include "cpu.h" diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c index 7c65102debaf..db1378c6ff26 100644 --- a/arch/x86/power/cpu.c +++ b/arch/x86/power/cpu.c @@ -193,6 +193,8 @@ static void fix_processor_context(void) */ static void notrace __restore_processor_state(struct saved_context *ctxt) { + struct cpuinfo_x86 *c; + if (ctxt->misc_enable_saved) wrmsrl(MSR_IA32_MISC_ENABLE, ctxt->misc_enable); /* @@ -263,6 +265,10 @@ static void notrace __restore_processor_state(struct saved_context *ctxt) mtrr_bp_restore(); perf_restore_debug_store(); msr_restore_context(ctxt); + + c = &cpu_data(smp_processor_id()); + if (cpu_has(c, X86_FEATURE_MSR_IA32_FEAT_CTL)) + init_ia32_feat_ctl(c); } /* Needed by apm.c */ -- cgit v1.2.3 From cc5277fe66cf3ad68f41f1c539b2ef0d5e432974 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Tue, 2 Jun 2020 22:36:11 +0300 Subject: x86/resctrl: Fix a NULL vs IS_ERR() static checker warning in rdt_cdp_peer_get() The callers don't expect *d_cdp to be set to an error pointer, they only check for NULL. This leads to a static checker warning: arch/x86/kernel/cpu/resctrl/rdtgroup.c:2648 __init_one_rdt_domain() warn: 'd_cdp' could be an error pointer This would not trigger a bug in this specific case because __init_one_rdt_domain() calls it with a valid domain that would not have a negative id and thus not trigger the return of the ERR_PTR(). If this was a negative domain id then the call to rdt_find_domain() in domain_add_cpu() would have returned the ERR_PTR() much earlier and the creation of the domain with an invalid id would have been prevented. Even though a bug is not triggered currently the right and safe thing to do is to set the pointer to NULL because that is what can be checked for when the caller is handling the CDP and non-CDP cases. Fixes: 52eb74339a62 ("x86/resctrl: Fix rdt_find_domain() return value and checks") Signed-off-by: Dan Carpenter Signed-off-by: Borislav Petkov Acked-by: Reinette Chatre Acked-by: Fenghua Yu Link: https://lkml.kernel.org/r/20200602193611.GA190851@mwanda --- arch/x86/kernel/cpu/resctrl/rdtgroup.c | 1 + 1 file changed, 1 insertion(+) (limited to 'arch') diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c index 23b4b61319d3..3f844f14fc0a 100644 --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c @@ -1117,6 +1117,7 @@ static int rdt_cdp_peer_get(struct rdt_resource *r, struct rdt_domain *d, _d_cdp = rdt_find_domain(_r_cdp, d->id, NULL); if (WARN_ON(IS_ERR_OR_NULL(_d_cdp))) { _r_cdp = NULL; + _d_cdp = NULL; ret = -EINVAL; } -- cgit v1.2.3 From a13b9d0b97211579ea63b96c606de79b963c0f47 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Mon, 8 Jun 2020 20:15:09 -0700 Subject: x86/cpu: Use pinning mask for CR4 bits needing to be 0 The X86_CR4_FSGSBASE bit of CR4 should not change after boot[1]. Older kernels should enforce this bit to zero, and newer kernels need to enforce it depending on boot-time configuration (e.g. "nofsgsbase"). To support a pinned bit being either 1 or 0, use an explicit mask in combination with the expected pinned bit values. [1] https://lore.kernel.org/lkml/20200527103147.GI325280@hirez.programming.kicks-ass.net Signed-off-by: Kees Cook Signed-off-by: Thomas Gleixner Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/202006082013.71E29A42@keescook --- arch/x86/kernel/cpu/common.c | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) (limited to 'arch') diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 043d93cdcaad..95c090a45b4b 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -347,6 +347,9 @@ out: cr4_clear_bits(X86_CR4_UMIP); } +/* These bits should not change their value after CPU init is finished. */ +static const unsigned long cr4_pinned_mask = + X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP | X86_CR4_FSGSBASE; static DEFINE_STATIC_KEY_FALSE_RO(cr_pinning); static unsigned long cr4_pinned_bits __ro_after_init; @@ -371,20 +374,20 @@ EXPORT_SYMBOL(native_write_cr0); void native_write_cr4(unsigned long val) { - unsigned long bits_missing = 0; + unsigned long bits_changed = 0; set_register: asm volatile("mov %0,%%cr4": "+r" (val), "+m" (cr4_pinned_bits)); if (static_branch_likely(&cr_pinning)) { - if (unlikely((val & cr4_pinned_bits) != cr4_pinned_bits)) { - bits_missing = ~val & cr4_pinned_bits; - val |= bits_missing; + if (unlikely((val & cr4_pinned_mask) != cr4_pinned_bits)) { + bits_changed = (val & cr4_pinned_mask) ^ cr4_pinned_bits; + val = (val & ~cr4_pinned_mask) | cr4_pinned_bits; goto set_register; } - /* Warn after we've set the missing bits. */ - WARN_ONCE(bits_missing, "CR4 bits went missing: %lx!?\n", - bits_missing); + /* Warn after we've corrected the changed bits. */ + WARN_ONCE(bits_changed, "pinned CR4 bits changed: 0x%lx!?\n", + bits_changed); } } #if IS_MODULE(CONFIG_LKDTM) @@ -419,7 +422,7 @@ void cr4_init(void) if (boot_cpu_has(X86_FEATURE_PCID)) cr4 |= X86_CR4_PCIDE; if (static_branch_likely(&cr_pinning)) - cr4 |= cr4_pinned_bits; + cr4 = (cr4 & ~cr4_pinned_mask) | cr4_pinned_bits; __write_cr4(cr4); @@ -434,10 +437,7 @@ void cr4_init(void) */ static void __init setup_cr_pinning(void) { - unsigned long mask; - - mask = (X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP); - cr4_pinned_bits = this_cpu_read(cpu_tlbstate.cr4) & mask; + cr4_pinned_bits = this_cpu_read(cpu_tlbstate.cr4) & cr4_pinned_mask; static_key_enable(&cr_pinning.key); } -- cgit v1.2.3 From bb5570ad3b54e7930997aec76ab68256d5236d94 Mon Sep 17 00:00:00 2001 From: Matt Fleming Date: Thu, 18 Jun 2020 11:20:02 +0100 Subject: x86/asm/64: Align start of __clear_user() loop to 16-bytes x86 CPUs can suffer severe performance drops if a tight loop, such as the ones in __clear_user(), straddles a 16-byte instruction fetch window, or worse, a 64-byte cacheline. This issues was discovered in the SUSE kernel with the following commit, 1153933703d9 ("x86/asm/64: Micro-optimize __clear_user() - Use immediate constants") which increased the code object size from 10 bytes to 15 bytes and caused the 8-byte copy loop in __clear_user() to be split across a 64-byte cacheline. Aligning the start of the loop to 16-bytes makes this fit neatly inside a single instruction fetch window again and restores the performance of __clear_user() which is used heavily when reading from /dev/zero. Here are some numbers from running libmicro's read_z* and pread_z* microbenchmarks which read from /dev/zero: Zen 1 (Naples) libmicro-file 5.7.0-rc6 5.7.0-rc6 5.7.0-rc6 revert-1153933703d9+ align16+ Time mean95-pread_z100k 9.9195 ( 0.00%) 5.9856 ( 39.66%) 5.9938 ( 39.58%) Time mean95-pread_z10k 1.1378 ( 0.00%) 0.7450 ( 34.52%) 0.7467 ( 34.38%) Time mean95-pread_z1k 0.2623 ( 0.00%) 0.2251 ( 14.18%) 0.2252 ( 14.15%) Time mean95-pread_zw100k 9.9974 ( 0.00%) 6.0648 ( 39.34%) 6.0756 ( 39.23%) Time mean95-read_z100k 9.8940 ( 0.00%) 5.9885 ( 39.47%) 5.9994 ( 39.36%) Time mean95-read_z10k 1.1394 ( 0.00%) 0.7483 ( 34.33%) 0.7482 ( 34.33%) Note that this doesn't affect Haswell or Broadwell microarchitectures which seem to avoid the alignment issue by executing the loop straight out of the Loop Stream Detector (verified using perf events). Fixes: 1153933703d9 ("x86/asm/64: Micro-optimize __clear_user() - Use immediate constants") Signed-off-by: Matt Fleming Signed-off-by: Borislav Petkov Cc: # v4.19+ Link: https://lkml.kernel.org/r/20200618102002.30034-1-matt@codeblueprint.co.uk --- arch/x86/lib/usercopy_64.c | 1 + 1 file changed, 1 insertion(+) (limited to 'arch') diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c index fff28c6f73a2..b0dfac3d3df7 100644 --- a/arch/x86/lib/usercopy_64.c +++ b/arch/x86/lib/usercopy_64.c @@ -24,6 +24,7 @@ unsigned long __clear_user(void __user *addr, unsigned long size) asm volatile( " testq %[size8],%[size8]\n" " jz 4f\n" + " .align 16\n" "0: movq $0,(%[dst])\n" " addq $8,%[dst]\n" " decl %%ecx ; jnz 0b\n" -- cgit v1.2.3