summaryrefslogtreecommitdiffstats
path: root/drivers/edac
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'ras-core-for-linus' of ↵Linus Torvalds2017-02-205-5/+11
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Ingo Molnar: "The main changes in this cycle were: - Assign notifier chain priorities for all RAS related handlers to make the ordering explicit (Borislav Petkov) - Improve the AMD MCA banks sysfs output (Yazen Ghannam) - Various cleanups and restructuring of the x86 RAS code (Borislav Petkov)" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ras, EDAC, acpi: Assign MCE notifier handlers a priority x86/ras: Get rid of mce_process_work() EDAC/mce/amd: Dump TSC value EDAC/mce/amd: Unexport amd_decode_mce() x86/ras/amd/inj: Change dependency x86/ras: Flip the TSC-adding logic x86/ras/amd: Make sysfs names of banks more user-friendly x86/ras/therm_throt: Do not log a fake MCE for thermal events x86/ras/inject: Make it depend on X86_LOCAL_APIC=y
| * x86/ras, EDAC, acpi: Assign MCE notifier handlers a priorityBorislav Petkov2017-01-244-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Assign all notifiers on the MCE decode chain a priority so that they get called in the correct order. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170123183514.13356-10-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * EDAC/mce/amd: Dump TSC valueBorislav Petkov2017-01-241-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Dump the TSC value of the time when the MCE got logged. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170123183514.13356-8-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * EDAC/mce/amd: Unexport amd_decode_mce()Borislav Petkov2017-01-242-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | It is not used outside of the driver anymore. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170123183514.13356-7-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | EDAC, mce_amd: Print IPID and Syndrome on a separate lineYazen Ghannam2017-02-161-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the IPID and Syndrome are printed on the same line as the Address. There are cases when we can have a valid Syndrome but not a valid Address. For example, the MCA_SYND register can be used to hold more detailed error info that the hardware folks can use. It's not just DRAM ECC syndromes. There are some error types that aren't related to memory that may have valid syndromes, like some errors related to links in the Data Fabric, etc. In these cases, the IPID and Syndrome are not printed at the same log level as the rest of the stanza, so users won't see them on the console. Console: [Hardware Error]: CPU:16 (17:1:0) MC22_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd82000000002080b [Hardware Error]: Power, Interrupts, etc. Extended Error Code: 2 Dmesg: [Hardware Error]: CPU:16 (17:1:0) MC22_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd82000000002080b , Syndrome: 0x000000010b404000, IPID: 0x0001002e00000002 [Hardware Error]: Power, Interrupts, etc. Extended Error Code: 2 Print the IPID first and on a new line. The IPID should always be printed on SMCA systems. The Syndrome will then be printed with the IPID and at the same log level when valid: [Hardware Error]: CPU:16 (17:1:0) MC22_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd82000000002080b [Hardware Error]: IPID: 0x0001002e00000002, Syndrome: 0x000000010b404000 [Hardware Error]: Power, Interrupts, etc. Extended Error Code: 2 Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1487192182-2474-1-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
* | EDAC, amd64: Bump driver versionBorislav Petkov2017-02-141-1/+1
| | | | | | | | | | | | | | | | Last time we did that was when we enabled Bulldozer. Now, we enabled Zen so it is only natural ... :-) Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
* | EDAC, fsl_ddr: Make locally used symbols staticWei Yongjun2017-02-091-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the following sparse warnings: drivers/edac/fsl_ddr_edac.c:148:1: warning: symbol 'dev_attr_inject_data_hi' was not declared. Should it be static? drivers/edac/fsl_ddr_edac.c:150:1: warning: symbol 'dev_attr_inject_data_lo' was not declared. Should it be static? drivers/edac/fsl_ddr_edac.c:152:1: warning: symbol 'dev_attr_inject_ctrl' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170209150424.15124-1-weiyj.lk@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>
* | EDAC, mpc85xx: Add T2080 l2-cache supportChris Packham2017-02-031-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The L2 cache controller on the T2080 SoC has similar capabilities to the others already supported by the mpc85xx_edac driver. Add it to the list of compatible devices. Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Acked-by: Johannes Thumshirn <jth@kernel.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: devicetree@vger.kernel.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20170201231624.28843-1-chris.packham@alliedtelesis.co.nz Signed-off-by: Borislav Petkov <bp@suse.de>
* | EDAC, amd64: Add x86cpuid sanity check during initYazen Ghannam2017-01-282-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | Match one of the devices in amd64_cpuids[] before loading the module. This is an additional sanity check against users trying to load amd64_edac_mod on unsupported systems. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1485537863-2707-9-git-send-email-Yazen.Ghannam@amd.com [ Get rid of err_ret label, make it a bit more readable this way. ] Signed-off-by: Borislav Petkov <bp@suse.de>
* | EDAC, amd64: Don't treat ECC disabled as failureYazen Ghannam2017-01-281-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Having ECC disabled on a node doesn't necessarily mean that it's disabled for the entire system. So let's return a non-failing code when ECC is disabled on a node. This way we can skip initialization for the node but still continue with the remaining nodes. After probing all instances, make sure we have at least one MC device allocated. This issue is seen and fix tested on Fam15h and Fam17h MCM systems. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1485537863-2707-8-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
* | EDAC: Add routine to check if MC devices list is emptyYazen Ghannam2017-01-282-0/+23
| | | | | | | | | | | | | | | | | | | | We need to know if any MC devices have been allocated. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1485537863-2707-7-git-send-email-Yazen.Ghannam@amd.com [ Prettify text. ] Signed-off-by: Borislav Petkov <bp@suse.de>
* | EDAC, amd64: Remove unused printing macrosYazen Ghannam2017-01-281-6/+0
| | | | | | | | | | | | | | | | | | amd64_{debug,notice} don't have any users, so remove them. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1485537863-2707-6-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
* | EDAC, amd64: Rework messages in ecc_enabled()Yazen Ghannam2017-01-281-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Print the node number when informing that DRAM ECC is disabled so that we can show which nodes have DRAM ECC disabled. Also, print more detailed system information as edac_dbg(), so as to not bother general users. Switch amd64_notice to amd64_info to match the message above it. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1485537863-2707-5-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
* | EDAC, amd64: Move global code out of instance functionsYazen Ghannam2017-01-281-17/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have a few functions that register/unregister an ECC error decoding routine. These functions are called when we init/remove instances. However, they are global and so don't need to be registered/unregistered multiple times. So move them out of the init/remove instance functions and into the module init/exit routines. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1485297149-13733-4-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
* | EDAC, amd64: Free unused memory when init_one_instance() failsYazen Ghannam2017-01-281-0/+2
| | | | | | | | | | | | | | | | | | Jump to memory freeing routines when init_one_instance() fails. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1485297149-13733-3-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
* | EDAC, mce_amd: Give more context to deferred error messageYazen Ghannam2017-01-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | Users may not be familiar with the concept of deferred errors. There is no action for users to take on this type of error, so give more context in the error message to make this more clear. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1485297149-13733-2-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
* | EDAC, i7300: Test for the second channel properlyBorislav Petkov2017-01-261-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | REDMEMB[17] is the ECC_Locator bit, which, when set, identifies the CS[3:2] as the simbols in error. And thus the second channel. The macro computing it was wrong so get rid of it (it was used at one place only) and get rid of the conditional too. Generates better code this way anyway. Signed-off-by: Borislav Petkov <bp@suse.de> Reported-by: David Binderman <dcb314@hotmail.com> Reviewed-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
* | EDAC, sb_edac: Get rid of ->show_interleave_mode()Nicolas Iooss2017-01-231-34/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Function sbridge_register_mci() sets pvt->info.show_interleave_mode to knl_show_interleave_mode() on Knight's Landing and show_interleave_mode() anywhere else. Merge show_interleave_mode() and knl_show_interleave_mode() in a single implementation and use it without an indirect function pointer. Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170122172806.10412-1-nicolas.iooss_linux@m4x.org [ Call it get_intlv_mode_str(). ] Signed-off-by: Borislav Petkov <bp@suse.de>
* | EDAC: Expose per-DIMM error counts in sysfsAaron Miller2017-01-191-0/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | The old csrowX sysfs directories have per-csrow error counters, but the new dimmX directories do not currently expose error counts. EDAC already keeps these counts, add them to sysfs so per-DIMM counts are still available when CONFIG_EDAC_LEGACY_SYSFS=n. Signed-off-by: Aaron Miller <aaronmiller@fb.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20161103220153.3997328-1-aaronmiller@fb.com Signed-off-by: Borislav Petkov <bp@suse.de>
* | EDAC, amd64: Save and return err code from probe_one_instance()Yazen Ghannam2017-01-161-2/+4
| | | | | | | | | | | | | | | | | | | | | | We should save the return code from probe_one_instance() so that it can be returned from the module init function. Otherwise, we'll be returning the -ENOMEM from above. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1484322741-41884-1-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
* | EDAC, i82975x: Add ioremap_nocache() error handlingArvind Yadav2017-01-161-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | If ioremap_nocache() fails, it will return NULL. Which will then cause a NULL-pointer dereference. Handle the returned value properly. Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1484549092-11349-1-git-send-email-arvind.yadav.cs@gmail.com [ Boris: massage commit message and improve error message. ] Signed-off-by: Borislav Petkov <bp@suse.de>
* | EDAC: Make dev_attr_sdram_scrub_rate staticBen Dooks2017-01-061-1/+1
|/ | | | | | | | | | | | | | The dev_attr_sdram_scrub_rate is not declared in a header or used anywhere else, so make it static to fix the following warning: drivers/edac/edac_mc_sysfs.c:816:1: warning: symbol 'dev_attr_sdram_scrub_rate' was not declared. Should it be static? Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Reviewed-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1465407356-7357-1-git-send-email-ben.dooks@codethink.co.uk Signed-off-by: Borislav Petkov <bp@suse.de>
* Replace <asm/uaccess.h> with <linux/uaccess.h> globallyLinus Torvalds2016-12-243-3/+3
| | | | | | | | | | | | | This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* edac: fix kernel-doc tags at the drivers/edac_*.hMauro Carvalho Chehab2016-12-151-20/+35
| | | | | | | Some kernel-doc tags don't provide good descriptions or use a different style. Adjust them. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
* driver-api: create an edac.rst file with EDAC documentationMauro Carvalho Chehab2016-12-151-2/+0
| | | | | | | | | | Currently, there's no device driver documentation for the EDAC subsystem at the driver-api book. Fill in the blanks for the structures and functions that misses documentation, uniform the word on the existing ones, and add a new edac.rst file at driver-api, in order to document the EDAC subsystem. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
* edac: move documentation from edac_mc.c to edac_core.hMauro Carvalho Chehab2016-12-152-82/+101
| | | | | | | | | | Several functions are documented at edac_mc.c. As we'll be including edac_core.h at drivers-api book, move those, in order for the kernel-doc markups be part of the API documentation book. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
* edac: move documentation from edac_pci*.c to edac_pci.hMauro Carvalho Chehab2016-12-153-76/+98
| | | | | | | | | | | | | Several functions are documented at edac_pci.c and edac_pci_sysfs.c. As we'll be including edac_pci.h at drivers-api book, move those, in order for the kernel-doc markups be part of the API documentation book. As several of those kernel-doc macros are not in the right format, fix them. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
* edac: move documentation from edac_device to edac_core.hMauro Carvalho Chehab2016-12-152-58/+51
| | | | | | | | | | | | | Several functions are documented at edac_device.c. As we'll be including edac_core.h at drivers-api book, move those, in order for the kernel-doc markups be part of the API documentation book. As several of those kernel-doc macros are not in the right format, fix them. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
* edac: rename edac_core.h to edac_mc.hMauro Carvalho Chehab2016-12-1546-52/+36
| | | | | | | Now, all left at edac_core.h are at drivers/edac/edac_mc.c, so rename it to edac_mc.h. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
* edac: move EDAC device definitions to drivers/edac/edac_device.hMauro Carvalho Chehab2016-12-154-251/+281
| | | | | | | | | | The edac_core.h header contain data structures and function definitions for both EDAC MC and EDAC device. Let's move the devices ones to a separate header file, as part of a header reorganization. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
* edac: move EDAC PCI definitions to drivers/edac/edac_pci.hMauro Carvalho Chehab2016-12-154-156/+183
| | | | | | | | | | The edac_core.h header contain data structures and function definitions for the 3 parts of EDAC: MC, PCI and device. Let's move the PCI ones to a separate header file, as part of a header reorganization. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
* edac: edac_core.h: remove prototype for edac_pci_reset_delay_period()Mauro Carvalho Chehab2016-12-151-3/+0
| | | | | | This function doesn't exist. So, remove its prototype. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
* edac: edac_core.h: get rid of unused kobj_completeMauro Carvalho Chehab2016-12-151-1/+0
| | | | | | | This element of struct edac_pci_ctl_info is never used. So, get rid of it. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
* EDAC, amd64: Fix improper return valuePan Bian2016-12-041-1/+1
| | | | | | | | | | | | | When the call to zalloc_cpumask_var() fails, returning "false" seems improper. The real value of macro "false" is 0, and 0 means no error. Return -ENOMEM instead. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=189071 Signed-off-by: Pan Bian <bianpan2016@163.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1480831638-5361-1-git-send-email-bianpan201604@163.com Signed-off-by: Borislav Petkov <bp@suse.de>
* EDAC, amd64: Improve amd64-specific printing macrosBorislav Petkov2016-12-012-12/+8
| | | | | | | | | | | | Prefix the warn and error macros with the respective string so that callers don't have to say "Error" or "Warning". We save us string length this way in the actual calls. While at it, shorten the calls in reserve_mc_sibling_devs(). Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
* EDAC, amd64: Autoload amd64_edac_mod on Fam17h systemsYazen Ghannam2016-11-291-0/+1
| | | | | | | | | | | Add Fam17h to the list of families to autoload amd64_edac_mod. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-18-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
* EDAC, amd64: Define and register UMC error decode functionYazen Ghannam2016-11-292-3/+88
| | | | | | | | | | | | | | | How we need to decode UMC errors is different from how we decode bus errors, so let's define a new function for this. We also need a way to determine the UMC channel since we're not guaranteed that there is a fixed relation between channel and MCA bank. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1480359593-80369-1-git-send-email-Yazen.Ghannam@amd.com [ Fold in decode_synd_reg(), simplify. ] Signed-off-by: Borislav Petkov <bp@suse.de>
* EDAC, amd64: Determine EDAC capabilities on Fam17h systemsYazen Ghannam2016-11-291-6/+24
| | | | | | | | | | | | We need to determine the EDAC capabilities from all UMCs on the node. We should only check UMCs that are enabled and make sure they all agree. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-15-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
* EDAC, amd64: Determine EDAC MC capabilities on Fam17hYazen Ghannam2016-11-291-18/+49
| | | | | | | | | | | | | | | The UMCs on Fam17h are independent memory controllers so we need to read the capabilities from all UMCs and make sure they agree. Once we determine what capabilities are available we should save them for convenience. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1480431116-94683-1-git-send-email-Yazen.Ghannam@amd.com [ Simplify f17h_determine_edac_ctl_cap(), preinit edac_mode in init_csrows(). ] Signed-off-by: Borislav Petkov <bp@suse.de>
* EDAC, amd64: Add Fam17h debug outputYazen Ghannam2016-11-292-8/+94
| | | | | | | | | | | | | Read a few more UMC registers and provide debug output in order to be as similar as possible to older AMD systems. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1480344621-14966-1-git-send-email-Yazen.Ghannam@amd.com [ Remove unneeded K8 check and comments, fixup others. ] Signed-off-by: Borislav Petkov <bp@suse.de>
* EDAC, amd64: Add Fam17h scrubber supportYazen Ghannam2016-11-282-5/+40
| | | | | | | | | | | | Fam17h has new register offsets and fields for setting up the DRAM scrubber so add support for this. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-17-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
* EDAC, mce_amd: Don't report poison bit on Fam15h, bank 4Yazen Ghannam2016-11-281-4/+7
| | | | | | | | | | | | | | | | | MCA_STATUS[43] has been defined as "Poison" or "Reserved" for every bank since Fam15h except for Fam15h, bank 4 in which case it's defined as part of the McaStatSubCache bitfield. Filter out that case. Reported-by: Dean Liberty <Dean.Liberty@amd.com> Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1479478222-19896-1-git-send-email-Yazen.Ghannam@amd.com [ Split an almost unparseable ternary conditional, add a comment. ] Signed-off-by: Borislav Petkov <bp@suse.de>
* EDAC, amd64: Read MC registers on AMD Fam17hYazen Ghannam2016-11-282-39/+146
| | | | | | | | | | | | | | | | | | Fam17h has a different set of registers and bitfields. Most of these registers are read through SMN (System Management Network) rather than PCI config space. Also, the derivation of various values is now different. Update amd64_edac to read the appropriate registers and extract the correct values for Fam17h. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-12-git-send-email-Yazen.Ghannam@amd.com [ Save us the indentation level in read_mc_regs(), add defines ] Signed-off-by: Borislav Petkov <bp@suse.de>
* EDAC, amd64: Reserve correct PCI devices on AMD Fam17hYazen Ghannam2016-11-282-19/+70
| | | | | | | | | | | | | | | | | Fam17h needs PCI device functions 0 and 6 instead of 1 and 2 as on older systems. Update struct amd64_pvt to hold the new functions and reserve them if on Fam17h. Also, allocate an array of UMC structs within our newly allocated PVT struct. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-11-git-send-email-Yazen.Ghannam@amd.com [ init_one_instance() error handling, shorten lines, unbreak >80 cols lines. ] Signed-off-by: Borislav Petkov <bp@suse.de>
* EDAC, amd64: Add AMD Fam17h family type and opsYazen Ghannam2016-11-242-1/+54
| | | | | | | | | | | | | Add a family type and associated ops for Fam17h. Define a struct to hold all the UMC registers that we need. Make this a part of struct amd64_pvt in order to maximize code reuse in the rest of the driver. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-10-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
* EDAC, amd64: Extend ecc_enabled() to Fam17hYazen Ghannam2016-11-242-10/+56
| | | | | | | | | | | | | | Update the ecc_enabled() function to work on Fam17h. This entails reading a different set of registers and using the SMN (System Management Network) rather than PCI devices. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-9-git-send-email-Yazen.Ghannam@amd.com [ Fixup ecc_en assignment and get_umc_base(). ] Signed-off-by: Borislav Petkov <bp@suse.de>
* Merge tip:ras/core to pick up dependent changesBorislav Petkov2016-11-231-6/+6
|\ | | | | | | | | | | | | tip:ras/core contains the respective Fam17h x86 RAS bits which amd64_edac is going to use. So merge it into the EDAC branch. Signed-off-by: Borislav Petkov <bp@suse.de>
| * x86/RAS: Hide SMCA bank namesBorislav Petkov2016-11-081-1/+1
| | | | | | | | | | | | | | | | | | Add accessor functions and hide the smca_names array. Also, add a sanity-check to bank HWID assignment in get_smca_bank_info(). Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20161104152317.5r276t35df53qk76@pd.tnic Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * x86/RAS: Rename smca_bank_names to smca_namesBorislav Petkov2016-11-081-1/+1
| | | | | | | | | | | | | | | | | | Make it differ more from struct smca_bank_name for better readability. Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: http://lkml.kernel.org/r/20161103125556.15482-3-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * x86/RAS: Simplify SMCA HWID descriptor structBorislav Petkov2016-11-081-5/+5
| | | | | | | | | | | | | | | | | | Call it simply smca_hwid and call local variables "hwid". More readable. Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: http://lkml.kernel.org/r/20161103125556.15482-2-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>