Commit 4ef5a878 authored by openeuler-ci-bot, committed by Gitee

!171 SPR: HBM retry_rd_err_log support

Merge Pull Request from: @youquan_song 
 
[Description]
https://gitee.com/openeuler/intel-kernel/issues/I5V3SJ

An HBM memory channel is divided into two pseudo channels. Each
pseudo channel has its own retry_rd_err_log registers. Retrieve and
print retry_rd_err_log registers of the HBM pseudo channel if the
memory error is from HBM.

14646de4 EDAC/skx_common: Add ChipSelect ADXL component
acd4cf68 EDAC/i10nm: Retrieve and print retry_rd_err_log registers for HBM
d5f5e499 EDAC/i10nm: Print an extra register set of retry_rd_err_log

[Testing]
1. Add kernel options in grub: efi=nosoftreserve i10nm_edac.retry_rd_err_log=1
2. # numactl -H
node distances:
node 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0: 10 12 12 12 21 21 21 21 13 14 14 14 23 23 23 23
1: 12 10 12 12 21 21 21 21 14 13 14 14 23 23 23 23
2: 12 12 10 12 21 21 21 21 14 14 13 14 23 23 23 23
3: 12 12 12 10 21 21 21 21 14 14 14 13 23 23 23 23
4: 21 21 21 21 10 12 12 12 23 23 23 23 13 14 14 14
5: 21 21 21 21 12 10 12 12 23 23 23 23 14 13 14 14
6: 21 21 21 21 12 12 10 12 23 23 23 23 14 14 13 14
7: 21 21 21 21 12 12 12 10 23 23 23 23 14 14 14 13
8: 13 14 14 14 23 23 23 23 10 14 14 14 23 23 23 23
9: 14 13 14 14 23 23 23 23 14 10 14 14 23 23 23 23
10: 14 14 13 14 23 23 23 23 14 14 10 14 23 23 23 23
11: 14 14 14 13 23 23 23 23 14 14 14 10 23 23 23 23
12: 23 23 23 23 13 14 14 14 23 23 23 23 10 14 14 14
13: 23 23 23 23 14 13 14 14 23 23 23 23 14 10 14 14
14: 23 23 23 23 14 14 13 14 23 23 23 23 14 14 10 14
15: 23 23 23 23 14 14 14 13 23 23 23 23 14 14 14 10
3. # modprobe einj
4. Clone and build ras-tools: git clone https://git.kernel.org/pub/scm/linux/kernel/git/aegl/ras-tools.git
5. # numactl --cpunodebind=0 --membind=13 /home/ras-tools/cmcistorm 1
0: vaddr = 0x130d490 paddr = d87ff42490
6. # dmesg (the output includes the retry_rd_err_log register values):

[83086.997090] EDAC MC29: 0 CE memory read error on CPU_SrcID#1_HBMC#9_Chan#1 (channel:1 page:0xd87ff42 offset:0x480 grain:32 syndrome:0x0 - err_code:0x0000:0x009f SystemAddress:0xd87ff42480 ProcessorSocketId:0x1 MemoryControllerId:0x9 ChannelAddress:0x7ffe8480 ChannelId:0x1 RankAddress:0x3fff4240 PhysicalRankId:0x0 Row:0x7ffe Column:0x14 Bank:0x0 BankGroup:0x0 ChipSelect:0x2 ChipId:0x1 retry_rd_err_log[08928208 00000000 0000000000010000 00500081 80007ffe 000000d87ff42480] correrrcnt[0001 0000 0001 0000 0000 0000 0000 0000]) 
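
When checking a test run like the one above, it can help to pull the bracketed register dumps out of the EDAC line and sanity-check the reported address. The following is a minimal Python sketch, not part of the patch set: the helper names `bracket_words` and `hex_field` are hypothetical, the log string is an abbreviated copy of the dmesg line above, and 4 KiB pages (a shift of 12) are assumed. It does not interpret individual register fields; it only splits the dump into words and checks that page and offset recombine into SystemAddress.

```python
import re

# Abbreviated copy of the dmesg line from step 6 (unused fields omitted).
LOG = (
    "EDAC MC29: 0 CE memory read error on CPU_SrcID#1_HBMC#9_Chan#1 "
    "(channel:1 page:0xd87ff42 offset:0x480 grain:32 syndrome:0x0 - "
    "err_code:0x0000:0x009f SystemAddress:0xd87ff42480 "
    "retry_rd_err_log[08928208 00000000 0000000000010000 00500081 "
    "80007ffe 000000d87ff42480] "
    "correrrcnt[0001 0000 0001 0000 0000 0000 0000 0000])"
)

PAGE_SHIFT = 12  # assumption: 4 KiB pages


def bracket_words(name, line):
    """Return the hex words inside `name[...]` as a list of integers."""
    m = re.search(re.escape(name) + r"\[([0-9a-fA-F ]+)\]", line)
    if m is None:
        raise ValueError(f"{name}[...] not found")
    return [int(word, 16) for word in m.group(1).split()]


def hex_field(name, line):
    """Return the value of a `name:0x...` field as an integer."""
    m = re.search(re.escape(name) + r":(0x[0-9a-fA-F]+)", line)
    if m is None:
        raise ValueError(f"{name} field not found")
    return int(m.group(1), 16)


retry = bracket_words("retry_rd_err_log", LOG)
counts = bracket_words("correrrcnt", LOG)
page = hex_field("page", LOG)
offset = hex_field("offset", LOG)
sysaddr = hex_field("SystemAddress", LOG)

# page and offset should recombine into the reported system address.
rebuilt = (page << PAGE_SHIFT) | offset

print("retry_rd_err_log words:", [hex(w) for w in retry])
print("correrrcnt words:", counts)
print("address consistent:", rebuilt == sysaddr)
```

In this sample the dump contains six retry_rd_err_log words, and the last word equals the reported SystemAddress. The injected paddr from step 5 (d87ff42490) falls in the same page, 0xd87ff42.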
 
Link:https://gitee.com/openeuler/kernel/pulls/171

 
Reviewed-by: Chen Wei <chenwei@xfusion.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Jun Tian <jun.j.tian@intel.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
parents 425c0a7b 6cc54473