Commit 28e59bca (unverified), authored by openeuler-ci-bot, committed by Gitee

!163 ICX: EDAC driver decoder for Ice Lake

Merge Pull Request from: @youquan_song 
 
[Description]
https://gitee.com/openeuler/intel-kernel/issues/I5V3IO

Currently, i10nm_edac supports only the firmware decoder (ACPI DSM methods).
The MCA bank registers of Ice Lake and Tremont CPUs contain the information
needed to decode DDR memory errors. To get better decoding performance, add
a driver decoder (which decodes DDR memory errors by extracting error
information from the MCA bank registers) for Ice Lake and Tremont CPUs.
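
As a rough illustration of what "extracting error information from MCA bank registers" involves, the sketch below splits an IA32_MCi_STATUS value into its architectural fields. The bit positions follow the architectural MCi_STATUS layout; the function and field names are illustrative and are not taken from the driver, and the retry_rd_err_log registers that also appear in the test log are not modeled. The sample value is the Bank 25 status from the [Testing] log in this description.

```python
# Sketch only: names are illustrative, not those used by i10nm_edac.
def decode_mci_status(status: int) -> dict:
    """Split an IA32_MCi_STATUS value into its architectural fields."""
    return {
        "valid": bool((status >> 63) & 1),              # VAL, bit 63
        "overflow": bool((status >> 62) & 1),           # OVER, bit 62
        "uncorrected": bool((status >> 61) & 1),        # UC, bit 61
        "misc_valid": bool((status >> 59) & 1),         # MISCV, bit 59
        "addr_valid": bool((status >> 58) & 1),         # ADDRV, bit 58
        "corrected_count": (status >> 38) & 0x7FFF,     # bits 52:38
        "model_specific_code": (status >> 16) & 0xFFFF, # bits 31:16
        "mca_error_code": status & 0xFFFF,              # bits 15:0
    }


fields = decode_mci_status(0x8C00004200800090)
print(
    f"err_code:0x{fields['model_specific_code']:04x}"
    f":0x{fields['mca_error_code']:04x}"
)
```

For the sample value this yields the same err_code pair (0x0080:0x0090) that the EDAC output reports, with UC clear and a corrected error count of 1, consistent with a single correctable error.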

This patchset avoids the SMIs that would otherwise be triggered to call the firmware decoder, which is especially valuable when CEs (Correctable Errors) occur frequently on DDR memory.
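
One way to check the SMI saving is to compare MSR 0x34 (the SMI count) before and after an error injection. The sketch below assumes the readings come from `rdmsr`, which is assumed here to print hexadecimal without a `0x` prefix by default; the helper name is hypothetical.

```python
def smi_delta(before: str, after: str) -> int:
    """Return how many SMIs occurred between two `rdmsr 0x34` readings."""
    # Assumption: readings are hexadecimal strings without a 0x prefix.
    return int(after, 16) - int(before, 16)


# Readings taken from the [Testing] section of this description.
delta = smi_delta("132", "133")
print(delta)
```

A delta of 1 per injection would correspond to the EINJ injection alone, with no additional SMI for decoding.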

fe32f366 EDAC/skx_common: Use driver decoder first
627d551a EDAC/skx_common: Make output format similar
2738c69a EDAC/i10nm: Add driver decoder for Ice Lake and Tremont CPUs

[Testing]
#echo 1 > /sys/module/i10nm_edac/parameters/decoding_via_mca
#modprobe einj
#rdmsr 0x34    (read SMI count)
132
#/home/ras-tools/cmcistorm 1
0: vaddr = 0x1401490 paddr = 9686af490
#rdmsr 0x34
133      --- increased by only one, for the EINJ error injection itself. The extra SMI that EDAC decoding via _DSM would cause is avoided.
#dmesg
[ 467.460634] EINJ: Error INJection is initialized.
[ 666.964249] mce: [Hardware Error]: Machine check events logged
[ 666.964258] EDAC skx MC7: HANDLING MCE MEMORY ERROR
[ 666.964262] EDAC skx MC7: CPU 36: Machine Check Event: 0x0 Bank 25: 0x8c00004200800090
[ 666.964265] EDAC skx MC7: TSC 0x1ca2cdd7071
[ 666.964267] EDAC skx MC7: ADDR 0x9686af480
[ 666.964269] EDAC skx MC7: MISC 0x9004b016851cc86
[ 666.964272] EDAC skx MC7: PROCESSOR 0:0x606a6 TIME 1529666570 SOCKET 1 APIC 0x80
[ 666.964297] EDAC DEBUG: skx_mce_output_error: err_code:0x0080:0x0090 ProcessorSocketId:0x1 MemoryControllerId:0x3 PhysicalRankId:0x1 Row:0x2d0a Column:0x398 Bank:0x2 BankGroup:0x3 retry_rd_err_log[00438209 00000000 00000001 07316041 00002d0a 00000009686af480] correrrcnt[0000 0001 0000 0000 0000 0000 0000 0000]
[ 666.964308] EDAC MC7: 1 CE memory read error on CPU_SrcID#1_MC#3_Chan#0_DIMM#0 (channel:0 slot:0 page:0x9686af offset:0x480 grain:32 syndrome:0x0 - err_code:0x0080:0x0090 ProcessorSocketId:0x1 MemoryControllerId:0x3 PhysicalRankId:0x1 Row:0x2d0a Column:0x398 Bank:0x2 BankGroup:0x3 retry_rd_err_log[00438209 00000000 00000001 07316041 00002d0a 00000009686af480] correrrcnt[0000 0001 0000 0000 0000 0000 0000 0000]) 
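
The page and offset values in the final EDAC line above follow from the MCE ADDR value by splitting it at the page boundary. A minimal sketch, assuming 4 KiB pages (a page shift of 12); the function name is hypothetical:

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


def split_addr(addr: int) -> tuple:
    """Split a physical address into (page frame number, offset in page)."""
    page = addr >> PAGE_SHIFT
    offset = addr & ((1 << PAGE_SHIFT) - 1)
    return page, offset


# ADDR value from the dmesg output above.
page, offset = split_addr(0x9686AF480)
print(f"page:0x{page:x} offset:0x{offset:x}")
```

For ADDR 0x9686af480 this gives page 0x9686af and offset 0x480, matching the log.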
 
Link: https://gitee.com/openeuler/kernel/pulls/163

 
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Jun Tian <jun.j.tian@intel.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
parents 4aff3e5e f4aa30cd