Commit a3a80cd0 authored by Zhang Yi, committed by Zhihao Cheng

xfs: reserve blocks for truncating large realtime inode

mainline inclusion
from mainline-v6.11-rc1
commit d048945150b798147b324f05f7e8c857772b0d3f
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9DN5Z
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d048945150b798147b324f05f7e8c857772b0d3f



--------------------------------

When truncating a big realtime file down to an unaligned size,
xfs_truncate_page() only zeroes out the tail of the EOF block.
__xfs_bunmapi() should then split the tail written extent and convert
the part beyond the EOF block to unwritten, but it currently cannot do
so because xfs_setattr_size() reserves zero blocks for the transaction.
Since commit 943bc0882ceb ("iomap: don't increase i_size if it's not a
write operation"), this can expose stale data.

If we truncate a file that contains a large enough written extent:

     |<    rtext    >|<    rtext    >|
  ...WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
        ^ (new EOF)      ^ old EOF

Since we only zero out the tail of the EOF block, and
xfs_itruncate_extents()->..->__xfs_bunmapi() unmaps only the whole
aligned extents, the file ends up in this state:

     |<    rtext    >|
  ...WWWzWWWWWWWWWWWWW
        ^ new EOF

Then, if we do an extending write like this, the blocks in the previous
tail extent become stale:

     |<    rtext    >|
  ...WWWzSSSSSSSSSSSSS..........WWWWWWWWWWWWWWWWW
        ^ old EOF               ^ append start  ^ new EOF

Fix this by reserving XFS_DIOSTRAT_SPACE_RES blocks for big realtime
inodes.
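
The block arithmetic behind the diagrams above can be sketched as
follows. This is a hypothetical userspace model, not kernel code: the
names trunc_layout and model_truncate are made up for illustration, and
it assumes the unmap start is simply rounded up to the next rtextsize
boundary, as the diagrams suggest.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of an unaligned truncate on a big realtime file.
 * All block numbers are file offsets in filesystem blocks.
 */
struct trunc_layout {
	uint64_t stale_start;    /* first whole block beyond the new EOF */
	uint64_t stale_count;    /* blocks beyond EOF that stay mapped */
	uint64_t first_unmapped; /* first block removed by the aligned unmap */
};

static struct trunc_layout
model_truncate(uint64_t new_size, uint32_t block_size, uint32_t rtextsize)
{
	struct trunc_layout l;
	uint64_t first_beyond;

	assert(block_size > 0 && rtextsize > 0);

	/* First block that lies entirely beyond the new EOF. */
	first_beyond = (new_size + block_size - 1) / block_size;

	/* Assumption: only whole rtextsize-aligned extents are unmapped. */
	l.first_unmapped = (first_beyond + rtextsize - 1) / rtextsize * rtextsize;

	/*
	 * Blocks in between remain mapped as written. Without a block
	 * reservation they are not converted to unwritten, so they keep
	 * their old contents (the "S" blocks in the diagram).
	 */
	l.stale_start = first_beyond;
	l.stale_count = l.first_unmapped - first_beyond;
	return l;
}
```

For example, with 4096-byte blocks and an rtextsize of 16 blocks,
truncating to 3 * 4096 + 100 bytes leaves blocks 4 to 15 mapped beyond
EOF in this model, while an rtextsize of 1 leaves none.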

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20240618142112.1315279-2-yi.zhang@huaweicloud.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Conflicts:
	fs/xfs/xfs_iops.c
[ 3fed24fffc76dd ("xfs: Replace xfs_isilocked with xfs_assert_ilocked")
  is not applied. ]
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Zhihao Cheng <chengzhihao@huaweicloud.com>
parent 756b7269
+14 −1
@@ -17,6 +17,8 @@
 #include "xfs_da_btree.h"
 #include "xfs_attr.h"
 #include "xfs_trans.h"
+#include "xfs_trans_space.h"
+#include "xfs_bmap_btree.h"
 #include "xfs_trace.h"
 #include "xfs_icache.h"
 #include "xfs_symlink.h"
@@ -794,6 +796,7 @@ xfs_setattr_size(
 	struct xfs_trans	*tp;
 	int			error;
 	uint			lock_flags = 0;
+	uint			resblks = 0;
 	bool			did_zeroing = false;

 	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
@@ -901,7 +904,17 @@ xfs_setattr_size(
 			return error;
 	}

-	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);
+	/*
+	 * For realtime inode with more than one block rtextsize, we need the
+	 * block reservation for bmap btree block allocations/splits that can
+	 * happen since it could split the tail written extent and convert the
+	 * right beyond EOF one to unwritten.
+	 */
+	if (xfs_inode_has_bigrtalloc(ip))
+		resblks = XFS_DIOSTRAT_SPACE_RES(mp, 0);
+
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, resblks,
+				0, 0, &tp);
 	if (error)
 		return error;