summaryrefslogtreecommitdiffstats
path: root/drivers/edac
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'akpm' (patches from Andrew)Linus Torvalds2020-12-151-2/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge misc updates from Andrew Morton: - a few random little subsystems - almost all of the MM patches which are staged ahead of linux-next material. I'll trickle to post-linux-next work in as the dependents get merged up. Subsystems affected by this patch series: kthread, kbuild, ide, ntfs, ocfs2, arch, and mm (slab-generic, slab, slub, dax, debug, pagecache, gup, swap, shmem, memcg, pagemap, mremap, hmm, vmalloc, documentation, kasan, pagealloc, memory-failure, hugetlb, vmscan, z3fold, compaction, oom-kill, migration, cma, page-poison, userfaultfd, zswap, zsmalloc, uaccess, zram, and cleanups). * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (200 commits) mm: cleanup kstrto*() usage mm: fix fall-through warnings for Clang mm: slub: convert sysfs sprintf family to sysfs_emit/sysfs_emit_at mm: shmem: convert shmem_enabled_show to use sysfs_emit_at mm:backing-dev: use sysfs_emit in macro defining functions mm: huge_memory: convert remaining use of sprintf to sysfs_emit and neatening mm: use sysfs_emit for struct kobject * uses mm: fix kernel-doc markups zram: break the strict dependency from lzo zram: add stat to gather incompressible pages since zram set up zram: support page writeback mm/process_vm_access: remove redundant initialization of iov_r mm/zsmalloc.c: rework the list_add code in insert_zspage() mm/zswap: move to use crypto_acomp API for hardware acceleration mm/zswap: fix passing zero to 'PTR_ERR' warning mm/zswap: make struct kernel_param_ops definitions const userfaultfd/selftests: hint the test runner on required privilege userfaultfd/selftests: fix retval check for userfaultfd_open() userfaultfd/selftests: always dump something in modes userfaultfd: selftests: make __{s,u}64 format specifiers portable ...
| * edac: ghes: use krealloc_array()Bartosz Golaszewski2020-12-151-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the helper that checks for overflows internally instead of manually calculating the size of the new array. Link: https://lkml.kernel.org/r/20201109110654.12547-7-brgl@bgdev.pl Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Acked-by: Borislav Petkov <bp@suse.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Christian Knig <christian.koenig@amd.com> Cc: Christoph Lameter <cl@linux.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: David Rientjes <rientjes@google.com> Cc: Gustavo Padovan <gustavo@padovan.org> Cc: James Morse <james.morse@arm.com> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Jason Wang <jasowang@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Maxime Ripard <mripard@kernel.org> Cc: "Michael S . Tsirkin" <mst@redhat.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Robert Richter <rric@kernel.org> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Takashi Iwai <tiwai@suse.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge tag 'x86_cpu_for_v5.11' of ↵Linus Torvalds2020-12-142-4/+4
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpuid updates from Borislav Petkov: "Only AMD-specific changes this time: - Save the AMD physical die ID into cpuinfo_x86.cpu_die_id and convert all code to use it (Yazen Ghannam) - Remove a dead and unused TSEG region remapping workaround on AMD (Arvind Sankar)" * tag 'x86_cpu_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu/amd: Remove dead code for TSEG region remapping x86/topology: Set cpu_die_id only if DIE_TYPE found EDAC/mce_amd: Use struct cpuinfo_x86.cpu_die_id for AMD NodeId x86/CPU/AMD: Remove amd_get_nb_id() x86/CPU/AMD: Save AMD NodeId as cpu_die_id
| * | EDAC/mce_amd: Use struct cpuinfo_x86.cpu_die_id for AMD NodeIdYazen Ghannam2020-11-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The edac_mce_amd module calls decode_dram_ecc() on AMD Family17h and later systems. This function is used in amd64_edac_mod to do system-specific decoding for DRAM ECC errors. The function takes a "NodeId" as a parameter. In AMD documentation, NodeId is used to identify a physical die in a system. This can be used to identify a node in the AMD_NB code and also it is used with umc_normaddr_to_sysaddr(). However, the input used for decode_dram_ecc() is currently the NUMA node of a logical CPU. In the default configuration, the NUMA node and physical die will be equivalent, so this doesn't have an impact. But the NUMA node configuration can be adjusted with optional memory interleaving modes. This will cause the NUMA node enumeration to not match the physical die enumeration. The mismatch will cause the address translation function to fail or report incorrect results. Use struct cpuinfo_x86.cpu_die_id for the node_id parameter to ensure the physical ID is used. Fixes: fbe63acf62f5 ("EDAC, mce_amd: Use cpu_to_node() to find the node ID") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201109210659.754018-4-Yazen.Ghannam@amd.com
| * | x86/CPU/AMD: Remove amd_get_nb_id()Yazen Ghannam2020-11-192-3/+3
| |/ | | | | | | | | | | | | | | | | | | | | | | | | The Last Level Cache ID is returned by amd_get_nb_id(). In practice, this value is the same as the AMD NodeId for callers of this function. The NodeId is saved in struct cpuinfo_x86.cpu_die_id. Replace calls to amd_get_nb_id() with the logical CPU's cpu_die_id and remove the function. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201109210659.754018-3-Yazen.Ghannam@amd.com
| |
| \
| \
| \
*---. \ Merge branches 'edac-spr', 'edac-igen6' and 'edac-misc' into ↵Borislav Petkov2020-12-1425-63/+1076
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | edac-updates-for-v5.11 Signed-off-by: Borislav Petkov <bp@suse.de>
| | | * | EDAC/amd64: Fix PCI component registrationBorislav Petkov2020-11-271-12/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to setup its PCI component, the driver needs any node private instance in order to get a reference to the PCI device and hand that into edac_pci_create_generic_ctl(). For convenience, it uses the 0th memory controller descriptor under the assumption that if any, the 0th will be always present. However, this assumption goes wrong when the 0th node doesn't have memory and the driver doesn't initialize an instance for it: EDAC amd64: F17h detected (node 0). ... EDAC amd64: Node 0: No DIMMs detected. But looking up node instances is not really needed - all one needs is the pointer to the proper device which gets discovered during instance init. So stash that pointer into a variable and use it when setting up the EDAC PCI component. Clear that variable when the driver needs to unwind due to some instances failing init to avoid any registration imbalance. Cc: <stable@vger.kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201122150815.13808-1-bp@alien8.de
| | | * | EDAC/synopsys: Return the correct value in mc_probe()Zhang Xiaoxu2020-11-181-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Return the error value if the inject sysfs file creation fails, rather than returning 0, to signal to the upper layer that the ->probe function failed. [ bp: Massage. ] Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Michal Simek <michal.simek@xilinx.com> Link: https://lkml.kernel.org/r/20201116135810.3130845-1-zhangxiaoxu5@huawei.com
| | | * | EDAC: Fix some kernel-doc markupsMauro Carvalho Chehab2020-11-021-6/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Kernel-doc markup should use this format: identifier - description Correct that and also fix some enums' names in the kernel-doc markup. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/1d291393ba58c7b80908a3fedf02d2f53921ffe9.1603469755.git.mchehab+huawei@kernel.org
| | | * | EDAC: Do not issue useless debug statements in the polling routineBorislav Petkov2020-10-2614-14/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | They have been spreading around the subsystem by example so remove them all. Reported-by: Raymond Bennett <raymond.bennett@gmail.com> Suggested-by: Jason Baron <jbaron@akamai.com> Signed-off-by: Borislav Petkov <bp@suse.de>
| | | * | EDAC/amd64: Remove unneeded breaksTom Rix2020-10-261-8/+0
| | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A break is not needed if it is preceded by a return. Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Robert Richter <rric@kernel.org> Link: https://lkml.kernel.org/r/20201019193524.13391-1-trix@redhat.com
| | * | EDAC/igen6: ecclog_llist can be statickernel test robot2020-11-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes: 10590a9d4f23 ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/r/20201123031850.GA20416@aef56166e5fc Signed-off-by: Tony Luck <tony.luck@intel.com>
| * | | EDAC/i10nm: Add Intel Sapphire Rapids server supportQiuxu Zhuo2020-11-194-21/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Sapphire Rapids CPU model shares the same memory controller architecture with Ice Lake server. There are some configurations different from Ice Lake server as below: - The device ID for configuration agent. - The size for per channel memory-mapped I/O. - The DDR5 memory support. So add the above configurations and the Sapphire Rapids CPU model ID for EDAC support. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
| * | | EDAC: Add DDR5 new memory typeQiuxu Zhuo2020-11-191-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new entry to 'enum mem_type' and a new string to 'edac_mem_types[]' for DDR5 new memory type. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
| * | | EDAC/i10nm: Use readl() to access MMIO registersQiuxu Zhuo2020-11-191-4/+7
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of raw access, use readl() to access MMIO registers of memory controller to avoid possible compiler re-ordering. Fixes: d4dc89d069aa ("EDAC, i10nm: Add a driver for Intel 10nm server processors") Cc: <stable@vger.kernel.org> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
| * | EDAC/igen6: Add debugfs interface for Intel client SoC EDAC driverQiuxu Zhuo2020-11-191-0/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add debugfs support to fake memory correctable errors to test the error reporting path and the error address decoding logic in the igen6_edac driver. Please note that the fake errors are also reported to EDAC core and then the CE counter in EDAC sysfs is also increased. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
| * | EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECCQiuxu Zhuo2020-11-193-0/+928
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This driver supports Intel client SoC with integrated memory controller using In-Band ECC(IBECC). The memory correctable and uncorrectable errors are reported via NMIs. The driver handles the NMIs and decodes the memory error address to platform specific address. The first IBECC-supported SoC is Elkhart Lake. [Tony: s/#include <linux/nmi.h>/#include <asm/nmi.h>/ to fix randconfig build] Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
| * | EDAC: Add three new memory typesQiuxu Zhuo2020-11-051-0/+3
| |/ | | | | | | | | | | | | | | | | There are {Low-Power DDR3/4, WIO2} types of memory. Add new entries to 'enum mem_type' and new strings to 'edac_mem_types[]' for the new types. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
* | EDAC/mv64x60: Remove orphan mv64x60 driverMichael Ellerman2020-12-074-1005/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The mv64x60 EDAC driver depends on CONFIG_MV64X60. But that symbol is not user-selectable, and the last code that selected it was removed with the C2K board support in 2018, see: 92c8c16f3457 ("powerpc/embedded6xx: Remove C2K board support") That means the driver is now dead code, so remove it. Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201207040253.628528-1-mpe@ellerman.id.au
* | EDAC/aspeed: Add support for AST2400 and AST2600Troy Lee2020-12-072-5/+8
|/ | | | | | | | | Add AST2400 and AST2600 EDAC driver support. Signed-off-by: Troy Lee <troy_lee@aspeedtech.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Stefan Schaeckeler <sschaeck@cisco.com> Link: https://lkml.kernel.org/r/20201207090013.14145-3-troy_lee@aspeedtech.com
* Merge tag 'efi-core-2020-10-12' of ↵Linus Torvalds2020-10-121-2/+15
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI changes from Ingo Molnar: - Preliminary RISC-V enablement - the bulk of it will arrive via the RISCV tree. - Relax decompressed image placement rules for 32-bit ARM - Add support for passing MOK certificate table contents via a config table rather than a EFI variable. - Add support for 18 bit DIMM row IDs in the CPER records. - Work around broken Dell firmware that passes the entire Boot#### variable contents as the command line - Add definition of the EFI_MEMORY_CPU_CRYPTO memory attribute so we can identify it in the memory map listings. - Don't abort the boot on arm64 if the EFI RNG protocol is available but returns with an error - Replace slashes with exclamation marks in efivarfs file names - Split efi-pstore from the deprecated efivars sysfs code, so we can disable the latter on !x86. - Misc fixes, cleanups and updates. * tag 'efi-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits) efi: mokvar: add missing include of asm/early_ioremap.h efi: efivars: limit availability to X86 builds efi: remove some false dependencies on CONFIG_EFI_VARS efi: gsmi: fix false dependency on CONFIG_EFI_VARS efi: efivars: un-export efivars_sysfs_init() efi: pstore: move workqueue handling out of efivars efi: pstore: disentangle from deprecated efivars module efi: mokvar-table: fix some issues in new code efi/arm64: libstub: Deal gracefully with EFI_RNG_PROTOCOL failure efivarfs: Replace invalid slashes with exclamation marks in dentries. efi: Delete deprecated parameter comments efi/libstub: Fix missing-prototypes in string.c efi: Add definition of EFI_MEMORY_CPU_CRYPTO and ability to report it cper,edac,efi: Memory Error Record: bank group/address and chip id edac,ghes,cper: Add Row Extension to Memory Error Record efi/x86: Add a quirk to support command line arguments on Dell EFI firmware efi/libstub: Add efi_warn and *_once logging helpers integrity: Load certs from the EFI MOK config table integrity: Move import of MokListRT certs to a separate routine efi: Support for MOK variable config table ...
| * cper,edac,efi: Memory Error Record: bank group/address and chip idAlex Kluver2020-09-171-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Updates to the UEFI 2.8 Memory Error Record allow splitting the bank field into bank address and bank group, and using the last 3 bits of the extended field as a chip identifier. When needed, print correct version of bank field, bank group, and chip identification. Based on UEFI 2.8 Table 299. Memory Error Record. Signed-off-by: Alex Kluver <alex.kluver@hpe.com> Reviewed-by: Russ Anderson <russ.anderson@hpe.com> Reviewed-by: Kyle Meyer <kyle.meyer@hpe.com> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Acked-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20200819143544.155096-3-alex.kluver@hpe.com Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
| * edac,ghes,cper: Add Row Extension to Memory Error RecordAlex Kluver2020-09-171-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Memory errors could be printed with incorrect row values since the DIMM size has outgrown the 16 bit row field in the CPER structure. UEFI Specification Version 2.8 has increased the size of row by allowing it to use the first 2 bits from a previously reserved space within the structure. When needed, add the extension bits to the row value printed. Based on UEFI 2.8 Table 299. Memory Error Record Signed-off-by: Alex Kluver <alex.kluver@hpe.com> Tested-by: Russ Anderson <russ.anderson@hpe.com> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Reviewed-by: Kyle Meyer <kyle.meyer@hpe.com> Acked-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20200819143544.155096-2-alex.kluver@hpe.com Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
* | Merge tag 'ras_updates_for_v5.10' of ↵Linus Torvalds2020-10-121-3/+1
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Borislav Petkov: - Extend the recovery from MCE in kernel space also to processes which encounter an MCE in kernel space but while copying from user memory by sending them a SIGBUS on return to user space and umapping the faulty memory, by Tony Luck and Youquan Song. - memcpy_mcsafe() rework by splitting the functionality into copy_mc_to_user() and copy_mc_to_kernel(). This, as a result, enables support for new hardware which can recover from a machine check encountered during a fast string copy and makes that the default and lets the older hardware which does not support that advance recovery, opt in to use the old, fragile, slow variant, by Dan Williams. - New AMD hw enablement, by Yazen Ghannam and Akshay Gupta. - Do not use MSR-tracing accessors in #MC context and flag any fault while accessing MCA architectural MSRs as an architectural violation with the hope that such hw/fw misdesigns are caught early during the hw eval phase and they don't make it into production. - Misc fixes, improvements and cleanups, as always. * tag 'ras_updates_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Allow for copy_mc_fragile symbol checksum to be generated x86/mce: Decode a kernel instruction to determine if it is copying from user x86/mce: Recover from poison found while copying from user space x86/mce: Avoid tail copy when machine check terminated a copy from user x86/mce: Add _ASM_EXTABLE_CPY for copy user access x86/mce: Provide method to find out the type of an exception handler x86/mce: Pass pointer to saved pt_regs to severity calculation routines x86/copy_mc: Introduce copy_mc_enhanced_fast_string() x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}() x86/mce: Drop AMD-specific "DEFERRED" case from Intel severity rule list x86/mce: Add Skylake quirk for patrol scrub reported errors RAS/CEC: Convert to DEFINE_SHOW_ATTRIBUTE() x86/mce: Annotate mce_rd/wrmsrl() with noinstr x86/mce/dev-mcelog: Do not update kflags on AMD systems x86/mce: Stop mce_reign() from re-computing severity for every CPU x86/mce: Make mce_rdmsrl() panic on an inaccessible MSR x86/mce: Increase maximum number of banks to 64 x86/mce: Delay clearing IA32_MCG_STATUS to the end of do_machine_check() x86/MCE/AMD, EDAC/mce_amd: Remove struct smca_hwid.xec_bitmap RAS/CEC: Fix cec_init() prototype
| * | x86/MCE/AMD, EDAC/mce_amd: Remove struct smca_hwid.xec_bitmapYazen Ghannam2020-08-201-3/+1
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | The Extended Error Code Bitmap (xec_bitmap) for a Scalable MCA bank type was intended to be used by the kernel to filter out invalid error codes on a system. However, this is unnecessary after a few product releases because the hardware will only report valid error codes. Thus, there's no need for it with future systems. Remove the xec_bitmap field and all references to it. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200720145353.43924-1-Yazen.Ghannam@amd.com
* | Merge tag 'edac_updates_for_v5.10' of ↵Linus Torvalds2020-10-1217-50/+420
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: - Add Amazon's Annapurna Labs memory controller EDAC driver (Talel Shenhar) - New AMD CPUs support (Yazen Ghannam) - The usual misc fixes and cleanups all over the subsystem * tag 'edac_updates_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/amd64: Set proper family type for Family 19h Models 20h-2Fh EDAC/mc_sysfs: Add missing newlines when printing {max,dimm}_location EDAC/aspeed: Use module_platform_driver() to simplify EDAC, sb_edac: Simplify switch statement EDAC/ti: Fix handling of platform_get_irq() error EDAC/aspeed: Fix handling of platform_get_irq() error EDAC/i5100: Fix error handling order in i5100_init_one() EDAC/highbank: Handover Calxeda Highbank maintenance to Andre Przywara EDAC/socfpga: Transfer SoCFPGA EDAC maintainership EDAC/thunderx: Make symbol lmc_dfs_ents static EDAC/al-mc-edac: Add Amazon's Annapurna Labs Memory Controller driver dt-bindings: EDAC: Add Amazon's Annapurna Labs Memory Controller binding EDAC/mce_amd: Add new error descriptions for existing types EDAC: Replace HTTP links with HTTPS ones
| * \ Merge branch 'edac-drivers' into edac-updates-for-v5.10Borislav Petkov2020-10-123-0/+362
| |\ \ | | | | | | | | | | | | Signed-off-by: Borislav Petkov <bp@suse.de>
| | * | EDAC/al-mc-edac: Add Amazon's Annapurna Labs Memory Controller driverTalel Shenhar2020-08-173-0/+362
| | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Amazon's Annapurna Labs Memory Controller EDAC supports ECC capability for error detection and correction (Single bit error correction, Double detection). This driver introduces EDAC driver for that capability. [ bp: Remove "EDAC" string from Kconfig tristate as it is redundant. ] Signed-off-by: Talel Shenhar <talel@amazon.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: James Morse <james.morse@arm.com> Link: https://lkml.kernel.org/r/20200816185551.19108-3-talel@amazon.com
| * | EDAC/amd64: Set proper family type for Family 19h Models 20h-2FhYazen Ghannam2020-10-091-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | AMD Family 19h Models 20h-2Fh use the same PCI IDs as Family 17h Models 70h-7Fh. The same family ops and number of channels also apply. Use the Family17h Model 70h family_type and ops for Family 19h Models 20h-2Fh. Update the controller name to match the system. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201009171803.3214354-1-Yazen.Ghannam@amd.com
| * | EDAC/mc_sysfs: Add missing newlines when printing {max,dimm}_locationXiongfeng Wang2020-09-181-5/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reading those sysfs entries gives: [root@localhost /]# cat /sys/devices/system/edac/mc/mc0/max_location memory 3 [root@localhost /]# cat /sys/devices/system/edac/mc/mc0/dimm0/dimm_location memory 0 [root@localhost /]# Add newlines after the value it prints for better readability. [ bp: Make len a signed int and change the check to catch wraparound. Increment the pointer p only when the length check passes. Use scnprintf(). ] Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/1600051734-8993-1-git-send-email-wangxiongfeng2@huawei.com
| * | EDAC/aspeed: Use module_platform_driver() to simplifyLiu Shixin2020-09-181-17/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use module_platform_driver() which makes the code simpler by eliminating boilerplate code. Signed-off-by: Liu Shixin <liushixin2@huawei.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Joel Stanley <joel@jms.id.au> Link: https://lkml.kernel.org/r/20200914065358.3726216-1-liushixin2@huawei.com
| * | EDAC, sb_edac: Simplify switch statementTom Rix2020-09-081-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | clang static analyzer reports this problem sb_edac.c:959:2: warning: Undefined or garbage value returned to caller return type; ^~~~~~~~~~~ This is a false positive. However by initializing the type to DEV_UNKNOWN the 3 case can be removed from the switch, saving a comparison and jump. Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20200907153225.7294-1-trix@redhat.com
| * | EDAC/ti: Fix handling of platform_get_irq() errorKrzysztof Kozlowski2020-09-011-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | platform_get_irq() returns a negative error number on error. In such a case, comparison to 0 would pass the check therefore check the return value properly, whether it is negative. [ bp: Massage commit message. ] Fixes: 86a18ee21e5e ("EDAC, ti: Add support for TI keystone and DRA7xx EDAC") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tero Kristo <t-kristo@ti.com> Link: https://lkml.kernel.org/r/20200827070743.26628-2-krzk@kernel.org
| * | EDAC/aspeed: Fix handling of platform_get_irq() errorKrzysztof Kozlowski2020-09-011-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | platform_get_irq() returns a negative error number on error. In such a case, comparison to 0 would pass the check therefore check the return value properly, whether it is negative. [ bp: Massage commit message. ] Fixes: 9b7e6242ee4e ("EDAC, aspeed: Add an Aspeed AST2500 EDAC driver") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Stefan Schaeckeler <schaecsn@gmx.net> Link: https://lkml.kernel.org/r/20200827070743.26628-1-krzk@kernel.org
| * | EDAC/i5100: Fix error handling order in i5100_init_one()Dinghao Liu2020-09-011-6/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When pci_get_device_func() fails, the driver doesn't need to execute pci_dev_put(). mci should still be freed, though, to prevent a memory leak. When pci_enable_device() fails, the error injection PCI device "einj" doesn't need to be disabled either. [ bp: Massage commit message, rename label to "bail_mc_free". ] Fixes: 52608ba205461 ("i5100_edac: probe for device 19 function 0") Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200826121437.31606-1-dinghao.liu@zju.edu.cn
| * | EDAC/thunderx: Make symbol lmc_dfs_ents staticWei Yongjun2020-08-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Symbol 'lmc_dfs_ents' is not used outside of thunderx_edac.c, so make it static: drivers/edac/thunderx_edac.c:457:22: warning: symbol 'lmc_dfs_ents' was not declared. Should it be static? Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Robert Richter <rrichter@marvell.com> Link: https://lkml.kernel.org/r/20200714142308.46612-1-weiyongjun1@huawei.com
| * | EDAC/mce_amd: Add new error descriptions for existing typesYazen Ghannam2020-08-171-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A few existing MCA bank types will have new error types in future SMCA systems. Add the descriptions for the new error types. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200708153515.1911642-1-Yazen.Ghannam@amd.com
| * | EDAC: Replace HTTP links with HTTPS onesAlexander A. Klimov2020-08-178-13/+13
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rationale: Reduces attack surface on kernel devs opening the links for MITM as HTTPS traffic is much harder to manipulate. Deterministic algorithm: For each file: If not .svg: For each line: If doesn't contain `\bxmlns\b`: For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`: If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`: If both the HTTP and HTTPS versions return 200 OK and serve the same content: Replace HTTP with HTTPS. [ bp: Merge all EDAC patches into a single one. ] Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Tero Kristo <t-kristo@ti.com> # ti_edac Link: https://lkml.kernel.org/r/20200708113546.14135-1-grandmaster@al2klimov.de
* | EDAC/ghes: Check whether the driver is on the safe list correctlyBorislav Petkov2020-09-151-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With CONFIG_DEBUG_TEST_DRIVER_REMOVE=y, a system would try to probe, unregister and probe again a driver. When ghes_edac is attempted to be loaded on a system which is not on the safe platforms list, ghes_edac_register() would return early. The unregister counterpart ghes_edac_unregister() would still attempt to unregister and exit early at the refcount test, leading to the refcount underflow below. In order to not do *anything* on the unregister path too, reuse the force_load parameter and check it on that path too, before fumbling with the refcount. ghes_edac: ghes_edac_register: entry ghes_edac: ghes_edac_register: return -ENODEV ------------[ cut here ]------------ refcount_t: underflow; use-after-free. WARNING: CPU: 10 PID: 1 at lib/refcount.c:28 refcount_warn_saturate+0xb9/0x100 Modules linked in: CPU: 10 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc4+ #12 Hardware name: GIGABYTE MZ01-CE1-00/MZ01-CE1-00, BIOS F02 08/29/2018 RIP: 0010:refcount_warn_saturate+0xb9/0x100 Code: 82 e8 fb 8f 4d 00 90 0f 0b 90 90 c3 80 3d 55 4c f5 00 00 75 88 c6 05 4c 4c f5 00 01 90 48 c7 c7 d0 8a 10 82 e8 d8 8f 4d 00 90 <0f> 0b 90 90 c3 80 3d 30 4c f5 00 00 0f 85 61 ff ff ff c6 05 23 4c RSP: 0018:ffffc90000037d58 EFLAGS: 00010292 RAX: 0000000000000026 RBX: ffff88840b8da000 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffffffff8216b24f RDI: 00000000ffffffff RBP: ffff88840c662e00 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000000000001 R11: 0000000000000046 R12: 0000000000000000 R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88840ee80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000800002211000 CR4: 00000000003506e0 Call Trace: ghes_edac_unregister ghes_remove platform_drv_remove really_probe driver_probe_device device_driver_attach __driver_attach ? device_driver_attach ? device_driver_attach bus_for_each_dev bus_add_driver driver_register ? bert_init ghes_init do_one_initcall ? rcu_read_lock_sched_held kernel_init_freeable ? rest_init kernel_init ret_from_fork ... ghes_edac: ghes_edac_unregister: FALSE, refcount: -1073741824 Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200911164950.GB19320@zn.tnic
* | EDAC/ghes: Clear scanned data on unloadBorislav Petkov2020-09-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit b972fdba8665 ("EDAC/ghes: Fix NULL pointer dereference in ghes_edac_register()") didn't clear all the information from the scanned system and, more specifically, left ghes_hw.num_dimms to its previous value. On a second load (CONFIG_DEBUG_TEST_DRIVER_REMOVE=y), the driver would use the leftover num_dimms value which is not 0 and thus the 0 check in enumerate_dimms() will get bypassed and it would go directly to the pointer deref: d = &hw->dimms[hw->num_dimms]; which is, of course, NULL: #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] PREEMPT SMP CPU: 7 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc4+ #7 Hardware name: GIGABYTE MZ01-CE1-00/MZ01-CE1-00, BIOS F02 08/29/2018 RIP: 0010:enumerate_dimms.cold+0x7b/0x375 Reset the whole ghes_hw on driver unregister so that no stale values are used on a second system scan. Fixes: b972fdba8665 ("EDAC/ghes: Fix NULL pointer dereference in ghes_edac_register()") Cc: Shiju Jose <shiju.jose@huawei.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200911164817.GA19320@zn.tnic
* | Merge tag 'edac_urgent_for_v5.9_rc3' of ↵Linus Torvalds2020-08-301-4/+6
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC fix from Borislav Petkov: "A fix to properly clear ghes_edac driver state on driver remove so that a subsequent load can probe the system properly (Shiju Jose)" * tag 'edac_urgent_for_v5.9_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/ghes: Fix NULL pointer dereference in ghes_edac_register()
| * | EDAC/ghes: Fix NULL pointer dereference in ghes_edac_register()Shiju Jose2020-08-271-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After b9cae27728d1 ("EDAC/ghes: Scan the system once on driver init") and with CONFIG_DEBUG_TEST_DRIVER_REMOVE enabled, ghes_hw.dimms becomes a NULL pointer after the second ->probe() (aka ghes_edac_register()) which the config option causes to be called. This happens because the static variable which holds down whether the system has been scanned already, doesn't get reset in ghes_edac_unregister(). Then, on the second probe, ghes_scan_system() doesn't get to enumerate the DIMMs, leading to ghes_hw.dimms remaining NULL. Clear the variable and rename it to something more descriptive so that a second probe succeeds. [ bp: Rewrite commit message. ] Fixes: b9cae27728d1 ("EDAC/ghes: Scan the system once on driver init") Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200827140450.1620-1-shiju.jose@huawei.com
* | | treewide: Use fallthrough pseudo-keywordGustavo A. R. Silva2020-08-232-2/+2
|/ / | | | | | | | | | | | | | | | | | | Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
* / EDAC/{i7core,sb,pnd2,skx}: Fix error event severityTony Luck2020-08-184-7/+7
|/ | | | | | | | | | | | | | | | | | IA32_MCG_STATUS.RIPV indicates whether the return RIP value pushed onto the stack as part of machine check delivery is valid or not. Various drivers copied a code fragment that uses the RIPV bit to determine the severity of the error as either HW_EVENT_ERR_UNCORRECTED or HW_EVENT_ERR_FATAL, but this check is reversed (marking errors where RIPV is set as "FATAL"). Reverse the tests so that the error is marked fatal when RIPV is not set. Reported-by: Gabriele Paoloni <gabriele.paoloni@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200707194324.14884-1-tony.luck@intel.com
* Merge tag 'edac_updates_for_5.9_pt2' of ↵Linus Torvalds2020-08-151-3/+47
|\ | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull edac fix from Tony Luck: "Fix for the ie31200 driver that missed the first pull" * tag 'edac_updates_for_5.9_pt2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/ie31200: Fallback if host bridge device is already initialized
| * EDAC/ie31200: Fallback if host bridge device is already initializedJason Baron2020-08-101-3/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Intel uncore driver may claim some of the pci ids from ie31200 which means that the ie31200 edac driver will not initialize them as part of pci_register_driver(). Let's add a fallback for this case to 'pci_get_device()' to get a reference on the device such that it can still be configured. This is similar in approach to other edac drivers. Signed-off-by: Jason Baron <jbaron@akamai.com> Cc: Borislav Petkov <bp@suse.de> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/1594923911-10885-1-git-send-email-jbaron@akamai.com
* | Merge tag 'edac_updates_for_5.9' of ↵Linus Torvalds2020-08-037-141/+204
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Tony Luck: "Boris is on vacation and aske me to send you the EDAC changes" * tag 'edac_updates_for_5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC: Fix reference count leaks EDAC: Remove edac_get_dimm_by_index() EDAC/ghes: Scan the system once on driver init EDAC/ghes: Remove unused members of struct ghes_edac_pvt, rename it to ghes_pvt EDAC/ghes: Setup DIMM label from DMI and use it in error reports EDAC, {skx,i10nm}: Use CPU stepping macro to pass configurations EDAC/mc: Call edac_inc_ue_error() before panic EDAC, pnd2: Set MCE_PRIO_EDAC priority for pnd2_mce_dec notifier
| * Merge branch 'edac-ghes' into edac-for-nextBorislav Petkov2020-06-221-130/+193
| |\
| | * EDAC/ghes: Scan the system once on driver initBorislav Petkov2020-06-161-113/+166
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change the hardware scanning and figuring out how many DIMMs a machine has to a single, one-time thing which happens once on driver init. After that scanning completes, struct ghes_hw_desc contains a representation of the hardware which the driver can then use for later initialization. Then, copy the DIMM information into the respective EDAC core representation of those. Get rid of ghes_edac_dimm_fill and use a struct dimm_info array directly. This way, hw detection and further driver initialization is nicely and logically split. Further additions should all be added to ghes_scan_system() and the hw representation extended as needed. There should be no functionality change resulting from this patch. Signed-off-by: Borislav Petkov <bp@suse.de>
| | * EDAC/ghes: Remove unused members of struct ghes_edac_pvt, rename it to ghes_pvtRobert Richter2020-06-161-8/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The struct members list and ghes of struct ghes_edac_pvt are unused, remove them. On that occasion, rename it to the shorter name struct ghes_pvt. Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200519104443.15673-2-rrichter@marvell.com