anolis: net/smc: Resolve the race between SMC-R link access and clear
anolis inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I79GVV
CVE: NA

Reference: https://gitee.com/anolis/cloud-kernel/commit/22c9ca2d0f66980961a49698a2a223be4913e24a

--------------------------------

ANBZ: #264

We encountered some crashes caused by a race between SMC-R link access
and the link clear triggered by link group termination in abnormal
cases, such as a port error.

Here are some of the panic stacks we hit:

1) Race between smc_llc_flow_initiate() and smcr_link_clear()

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 Workqueue: smc_hs_wq smc_listen_work [smc]
 RIP: 0010:smc_llc_flow_initiate+0x44/0x190 [smc]
 Call Trace:
  <TASK>
  ? __smc_buf_create+0x75a/0x950 [smc]
  smcr_lgr_reg_rmbs+0x2a/0xbf [smc]
  smc_listen_work+0xf72/0x1230 [smc]
  ? process_one_work+0x25c/0x600
  process_one_work+0x25c/0x600
  worker_thread+0x4f/0x3a0
  ? process_one_work+0x600/0x600
  kthread+0x15d/0x1a0
  ? set_kthread_struct+0x40/0x40
  ret_from_fork+0x1f/0x30
  </TASK>

smc_listen_work()                     __smc_lgr_terminate()
---------------------------------------------------------------
                                    | smc_lgr_free()
                                    |  |- smcr_link_clear()
                                    |      |- memset(lnk, 0)
smc_listen_rdma_reg()               |
 |- smcr_lgr_reg_rmbs()             |
     |- smc_llc_flow_initiate()     |
         |- access lnk->lgr (panic) |

2) Race between smc_wr_tx_dismiss_slots() and smcr_link_clear()

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 RIP: 0010:_find_first_bit+0x8/0x50
 Call Trace:
  <TASK>
  smc_wr_tx_dismiss_slots+0x34/0xc0 [smc]
  ? smc_cdc_tx_filter+0x10/0x10 [smc]
  smc_conn_free+0xd8/0x100 [smc]
  __smc_release+0xf1/0x140 [smc]
  smc_release+0x89/0x1b0 [smc]
  __sock_release+0x37/0xb0
  sock_close+0x14/0x20
  __fput+0xa9/0x260
  task_work_run+0x6b/0xb0
  do_exit+0x3ef/0xd40
  do_group_exit+0x47/0xb0
  __x64_sys_exit_group+0x14/0x20
  do_syscall_64+0x34/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xae
  </TASK>

smc_conn_free()                           __smc_lgr_terminate()
----------------------------------------------------------------
                                        | smc_lgr_free()
                                        |  |- smcr_link_clear()
                                        |      |- smc_wr_free_link_mem()
                                        |          |- lnk->wr_tx_mask = NULL;
smc_wr_tx_dismiss_slots()               |
 |- for_each_set_bit(link->wr_tx_mask)  |
     |- (panic)                         |

These crashes are caused by clearing SMC-R link resources while
something is still accessing them. This patch fixes the problem by
introducing a reference count for SMC-R links and ensuring that the
sensitive resources of a link are not cleared until its reference
count drops to zero.

The operations on the SMC-R link reference count can be summarized as
follows:

object          [hold or initialized as 1]       [put]
--------------------------------------------------------------------
links           smcr_link_init()                 smcr_link_clear()
connections     smcr_lgr_conn_assign_link()      smc_conn_free()

In this way, an SMC-R link is cleared only after all the SMC
connections on top of it have been freed, which avoids unsafe
references to SMC-R links.

Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Acked-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: Gengbiao Shen <shengengbiao@sangfor.com.cn>
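
The hold/put scheme described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the code of the patch: the demo_* names and struct demo_link are hypothetical, C11 atomics stand in for whatever counter type the kernel code uses, and a single heap-allocated mask stands in for the link's sensitive resources.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an SMC-R link; not the kernel's struct smc_link. */
struct demo_link {
	atomic_int refcnt;          /* 1 for the link itself, +1 per connection */
	unsigned long *wr_tx_mask;  /* models the link's sensitive resources */
};

/* Take a reference on the link. */
static void demo_link_hold(struct demo_link *lnk)
{
	atomic_fetch_add(&lnk->refcnt, 1);
}

/* Actually release the resources; only reached when the count hits zero. */
static void demo_link_do_clear(struct demo_link *lnk)
{
	free(lnk->wr_tx_mask);
	lnk->wr_tx_mask = NULL;
}

/* Drop a reference; the last put performs the real clear. */
static void demo_link_put(struct demo_link *lnk)
{
	/* atomic_fetch_sub returns the value held before the subtraction */
	if (atomic_fetch_sub(&lnk->refcnt, 1) == 1)
		demo_link_do_clear(lnk);
}

/* Models smcr_link_init(): the count starts at 1 for the link itself. */
static int demo_link_init(struct demo_link *lnk)
{
	lnk->wr_tx_mask = calloc(1, sizeof(*lnk->wr_tx_mask));
	if (!lnk->wr_tx_mask)
		return -1;
	atomic_init(&lnk->refcnt, 1);
	return 0;
}

/* Models smcr_link_clear(): drops only the initial reference. */
static void demo_link_clear(struct demo_link *lnk)
{
	demo_link_put(lnk);
}

/* Models smcr_lgr_conn_assign_link(): a connection pins the link. */
static void demo_conn_assign_link(struct demo_link *lnk)
{
	demo_link_hold(lnk);
}

/* Models smc_conn_free(): the link may still be used, then is unpinned. */
static void demo_conn_free(struct demo_link *lnk)
{
	/* Safe: this connection's reference keeps the mask allocated. */
	assert(lnk->wr_tx_mask != NULL);
	demo_link_put(lnk);
}
```

In this model, calling demo_link_clear() while a connection is still assigned only drops the initial reference. The mask is freed later, when the last demo_conn_free() runs, which mirrors the ordering the commit message relies on.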