!163 ICX: EDAC driver decoder for Ice Lake
Merge Pull Request from: @youquan_song

[Description]
https://gitee.com/openeuler/intel-kernel/issues/I5V3IO

Currently i10nm_edac supports only the firmware decoder (ACPI DSM methods). The MCA bank registers of Ice Lake and Tremont CPUs contain the information needed to decode DDR memory errors. To improve decoding performance, add a driver decoder, which decodes DDR memory errors by extracting the error information from the MCA bank registers, for Ice Lake and Tremont CPUs. The patchset avoids triggering an SMI to call the firmware decoder, which is especially valuable when CEs (Correctable Errors) occur frequently on DDR memory.

fe32f366 EDAC/skx_common: Use driver decoder first
627d551a EDAC/skx_common: Make output format similar
2738c69a EDAC/i10nm: Add driver decoder for Ice Lake and Tremont CPUs

[Testing]
#echo 1 > /sys/module/i10nm_edac/parameters/decoding_via_mca
#modprobe einj
#rdmsr 0x34 (read SMI count)
132
#/home/ras-tools/cmcistorm 1
0: vaddr = 0x1401490 paddr = 9686af490
#rdmsr 0x34
133
--- The SMI count increases by only one, for the EINJ error injection itself. EDAC decoding no longer adds an SMI by calling _DSM.
#dmesg
[ 467.460634] EINJ: Error INJection is initialized.
[ 666.964249] mce: [Hardware Error]: Machine check events logged
[ 666.964258] EDAC skx MC7: HANDLING MCE MEMORY ERROR
[ 666.964262] EDAC skx MC7: CPU 36: Machine Check Event: 0x0 Bank 25: 0x8c00004200800090
[ 666.964265] EDAC skx MC7: TSC 0x1ca2cdd7071
[ 666.964267] EDAC skx MC7: ADDR 0x9686af480
[ 666.964269] EDAC skx MC7: MISC 0x9004b016851cc86
[ 666.964272] EDAC skx MC7: PROCESSOR 0:0x606a6 TIME 1529666570 SOCKET 1 APIC 0x80
[ 666.964297] EDAC DEBUG: skx_mce_output_error: err_code:0x0080:0x0090 ProcessorSocketId:0x1 MemoryControllerId:0x3 PhysicalRankId:0x1 Row:0x2d0a Column:0x398 Bank:0x2 BankGroup:0x3 retry_rd_err_log[00438209 00000000 00000001 07316041 00002d0a 00000009686af480] correrrcnt[0000 0001 0000 0000 0000 0000 0000 0000]
[ 666.964308] EDAC MC7: 1 CE memory read error on CPU_SrcID#1_MC#3_Chan#0_DIMM#0 (channel:0 slot:0 page:0x9686af offset:0x480 grain:32 syndrome:0x0 - err_code:0x0080:0x0090 ProcessorSocketId:0x1 MemoryControllerId:0x3 PhysicalRankId:0x1 Row:0x2d0a Column:0x398 Bank:0x2 BankGroup:0x3 retry_rd_err_log[00438209 00000000 00000001 07316041 00002d0a 00000009686af480] correrrcnt[0000 0001 0000 0000 0000 0000 0000 0000])

Link: https://gitee.com/openeuler/kernel/pulls/163
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Jun Tian <jun.j.tian@intel.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
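
As a reading aid for the dmesg output in [Testing], the following is a minimal sketch of how some logged fields relate to the raw MCA values. It assumes the architectural IA32_MCi_STATUS bit layout (MCACOD in bits 15:0, MSCOD in bits 31:16, ADDRV in bit 58, UC in bit 61, VAL in bit 63) and a 4 KiB page size. The helper names get_bitfield, decode_status and split_address are hypothetical and are not taken from the driver; the model-specific registers behind retry_rd_err_log and correrrcnt are not covered.

```python
# Illustrative sketch only: helper names are hypothetical, not the driver's.
# Bit positions follow the architectural IA32_MCi_STATUS layout (assumption
# stated in the lead-in); PAGE_SHIFT assumes 4 KiB pages.

PAGE_SHIFT = 12


def get_bitfield(value: int, lo: int, hi: int) -> int:
    """Return bits lo..hi (inclusive) of value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_status(status: int) -> dict:
    """Extract a few architectural fields from an MCi_STATUS value."""
    return {
        "valid": bool(get_bitfield(status, 63, 63)),        # VAL
        "uncorrected": bool(get_bitfield(status, 61, 61)),  # UC
        "addrv": bool(get_bitfield(status, 58, 58)),        # ADDR register valid
        "mscod": get_bitfield(status, 16, 31),              # model-specific code
        "mcacod": get_bitfield(status, 0, 15),              # MCA error code
    }


def split_address(addr: int) -> tuple:
    """Split a physical address into (page frame number, offset in page)."""
    return addr >> PAGE_SHIFT, addr & ((1 << PAGE_SHIFT) - 1)


if __name__ == "__main__":
    # Values copied from the "Bank 25" and "ADDR" lines of the test log.
    fields = decode_status(0x8C00004200800090)
    page, offset = split_address(0x9686AF480)
    print(f"err_code:0x{fields['mscod']:04x}:0x{fields['mcacod']:04x}")
    print(f"page:0x{page:x} offset:0x{offset:x}")
```

Run against the values from the log, the two print statements reproduce the err_code, page and offset fields of the final EDAC line, which is a quick sanity check when comparing driver-decoder output with firmware-decoder output.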