xfs: update the last_sync_lsn with ctx start lsn
hulk inclusion
category: bugfix
bugzilla: 189076, https://gitee.com/openeuler/kernel/issues/I76JSK
CVE: NA

--------------------------------

While performing the IO fault injection test, I caught the following data
corruption report:

 XFS (dm-6): Internal error ltbno + ltlen > bno at line 1976 of file fs/xfs/libxfs/xfs_alloc.c. Caller xfs_free_ag_extent+0x40b/0x930 [xfs]
 CPU: 7 PID: 184267 Comm: kworker/7:1 Kdump: loaded Tainted: G O 5.10.0-136.12.0.86.h1179.eulerosv2r12.x86_64 #1
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-20211121_093514-szxrtosci10000 04/01/2014
 Workqueue: xfs-inodegc/dm-6 xfs_inodegc_worker [xfs]
 Call Trace:
  dump_stack+0x57/0x6e
  xfs_corruption_error+0x81/0x90 [xfs]
  xfs_free_ag_extent+0x43c/0x930 [xfs]
  xfs_free_agfl_block+0x3b/0xd0 [xfs]
  xfs_agfl_free_finish_item+0x14c/0x160 [xfs]
  xfs_defer_finish_one+0xd5/0x220 [xfs]
  xfs_defer_finish_noroll+0xb5/0x210 [xfs]
  xfs_defer_finish+0x11/0x70 [xfs]
  xfs_itruncate_extents_flags+0xc1/0x240 [xfs]
  xfs_inactive_truncate+0xab/0xe0 [xfs]
  xfs_inactive+0x154/0x170 [xfs]
  xfs_inodegc_inactivate+0x16/0x50 [xfs]
  xfs_inodegc_worker+0xa0/0x110 [xfs]
  process_one_work+0x1b5/0x350
  worker_thread+0x49/0x310
  kthread+0xfe/0x140
  ret_from_fork+0x22/0x30
 XFS (dm-6): Corruption detected. Unmount and run xfs_repair

After analyzing the disk image, I found that the problem was caused by
transactions that were not replayed.

The problem arises because the iclog buffer IO completion updates
l_last_sync_lsn with its own LSN. Transactions can be large enough to span
many iclogs, but only the commit iclog updates l_last_sync_lsn. Because
l_icloglock is released between the l_last_sync_lsn update and the insertion
of the items into the AIL, if the AIL is empty in that window, a new iclog
takes l_last_sync_lsn as its tail LSN. If the new iclog is written to disk and
a shutdown occurs, the current iclog will not be replayed on the next mount.
 xlog_state_done_syncing
   xlog_state_do_callback
     spin_lock(&log->l_icloglock);
     xlog_state_do_iclog_callbacks
     xlog_state_iodone_process_iclog
       xlog_state_set_callback
         last_sync_lsn = iclog->ic_header.h_lsn
     spin_unlock(&log->l_icloglock);
             ====> AIL is empty and a new iclog gets its tail lsn here
     xlog_cil_process_committed
       xlog_cil_committed
         xfs_trans_committed_bulk(ctx->start_lsn)
           xfs_log_item_batch_insert(commit_lsn)
     xlog_state_clean_iclog(log, iclog)
     spin_lock(&log->l_icloglock);
     spin_unlock(&log->l_icloglock);

The fix is simple: update l_last_sync_lsn with the start LSN of the iclog's
first ctx when the commit iclog buffer IO completes. Even if the above
happens, the iclog will still be replayed.

Signed-off-by: Long Li <leo.lilong@huawei.com>
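
The race window and the effect of the fix can be modelled with a small
userspace sketch. This is not kernel code: struct toy_log, toy_lsn_t and all
toy_* functions are hypothetical names invented for illustration. It also
assumes a simplified recovery rule, namely that a checkpoint is replayed only
if the on-disk tail LSN is not past the checkpoint's start LSN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy types; not the kernel's xfs_lsn_t or struct xlog. */
typedef int64_t toy_lsn_t;

struct toy_log {
	toy_lsn_t last_sync_lsn;	/* stands in for l_last_sync_lsn */
	bool ail_empty;			/* true while no items are in the AIL */
	toy_lsn_t ail_min_lsn;		/* lowest AIL LSN, valid if !ail_empty */
};

/* Tail LSN a new iclog would record: lowest AIL LSN, else last_sync_lsn. */
static toy_lsn_t toy_tail_lsn(const struct toy_log *log)
{
	assert(log != NULL);
	return log->ail_empty ? log->last_sync_lsn : log->ail_min_lsn;
}

/*
 * Model the commit iclog IO completion. Returns the tail LSN that a new
 * iclog samples inside the window where the lock is dropped and the AIL
 * is still empty.
 */
static toy_lsn_t toy_commit_iclog_done(struct toy_log *log,
				       toy_lsn_t ctx_start_lsn,
				       toy_lsn_t commit_iclog_lsn,
				       bool use_ctx_start_lsn)
{
	toy_lsn_t sampled_tail;

	assert(log != NULL);

	/* Old behaviour: commit iclog's own LSN. Fixed: ctx start LSN. */
	log->last_sync_lsn = use_ctx_start_lsn ? ctx_start_lsn
					       : commit_iclog_lsn;

	/* Race window: items are not in the AIL yet, a new iclog samples. */
	sampled_tail = toy_tail_lsn(log);

	/* Afterwards the items are inserted into the AIL at ctx start LSN. */
	log->ail_empty = false;
	log->ail_min_lsn = ctx_start_lsn;

	return sampled_tail;
}

/* Simplified assumption: replay covers only what starts at or after tail. */
static bool toy_checkpoint_replayed(toy_lsn_t tail_lsn, toy_lsn_t ctx_start_lsn)
{
	return tail_lsn <= ctx_start_lsn;
}
```

With a checkpoint that starts at LSN 10 and whose commit iclog has LSN 14,
the old behaviour lets the new iclog sample a tail of 14, so the checkpoint
falls before the tail. With the fixed behaviour the sampled tail is 10 and
the checkpoint is still covered.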