Commit a7ba1594 authored by Zizhi Wo

xfs: Fix file creation failure

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9TDTA


CVE: NA

--------------------------------

In a test that combines file system expansion with concurrent file creation
and writing, file creation occasionally fails.

The detailed test scheme is as follows:
1. If the remaining space is less than 128 MB, grow the file system by 1 GB:
   --xfs_growfs /$DEV -D $bc -m 100
2. 32 processes each create a file every 0.5 s and write a random amount of
   data, between 4 KB and 4 MB, to it:
   --filesize=$((RANDOM % 1024 + 1))
   --dd if=/dev/zero oflag=direct of=$filename bs=4K count=$filesize
When file creation fails, hundreds of megabytes of free space still remain.
The overall analysis is as follows:

	Direct write				Create file
xfs_file_write_iter
 ...
 xfs_direct_write_iomap_begin
  xfs_iomap_write_direct
   ...
   xfs_alloc_ag_vextent_near
    xfs_alloc_cur_finish
     xfs_alloc_fixup_trees
      xfs_btree_delete
       xfs_btree_delrec
	xfs_allocbt_update_lastrec
	// Longest = 0 because numrec == 0.
	 agf->agf_longest = len = 0
					   xfs_create
					    ...
					     xfs_dialloc
					      ...
					      xfs_alloc_fix_freelist
					       xfs_alloc_space_available
					-> as longest=0, it will return
					false, no space for inode alloc.

The root cause is that the extent allocation path holds the AGF lock while
it updates the free space btrees, whereas inode creation first performs a
quick space check without taking the AGF lock. If this lockless check fails,
the AG is rejected immediately; if it passes, the AGF lock is taken and the
check is repeated. This is the intended "check, lock, check again" design.
If the lockless check fails for every AG, an error is returned. The problem
is therefore most likely to occur when none of the preceding AGs has enough
space left, and the last AG has just deleted the last record from its
by-size (CNT) btree but has not yet inserted the new one, so agf_longest is
transiently 0.

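Fix this issue by adding a bc_free_longest field to the per-AG part of the
btree cursor (struct xfs_btree_cur_ag, accessed as cur->bc_ag). This field
records, in advance, the longest free extent length that is about to be
inserted. It is set in xfs_alloc_fixup_trees() and xfs_free_ag_extent(), and
xfs_allocbt_update_lastrec() uses it instead of 0 when the by-size btree
momentarily has no records.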

Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
parent c0b6b443
+14 −0
@@ -586,6 +586,13 @@ xfs_alloc_fixup_trees(
 		nfbno2 = rbno + rlen;
 		nflen2 = (fbno + flen) - nfbno2;
 	}
+
+	/*
+	 * Record the potential maximum free length in advance.
+	 */
+	if (nfbno1 != NULLAGBLOCK || nfbno2 != NULLAGBLOCK)
+		cnt_cur->bc_ag.bc_free_longest = XFS_EXTLEN_MAX(nflen1, nflen2);
+
 	/*
 	 * Delete the entry from the by-size btree.
 	 */
@@ -2019,6 +2026,13 @@ xfs_free_ag_extent(
 	 * Now allocate and initialize a cursor for the by-size tree.
 	 */
 	cnt_cur = xfs_allocbt_init_cursor(mp, tp, agbp, pag, XFS_BTNUM_CNT);
+	/*
+	 * Record the potential maximum free length in advance.
+	 */
+	if (haveleft)
+		cnt_cur->bc_ag.bc_free_longest = ltlen;
+	if (haveright)
+		cnt_cur->bc_ag.bc_free_longest = gtlen;
 	/*
 	 * Have both left and right contiguous neighbors.
 	 * Merge all three into a single free block.
+8 −1
@@ -146,7 +146,14 @@ xfs_allocbt_update_lastrec(
 			rrp = XFS_ALLOC_REC_ADDR(cur->bc_mp, block, numrecs);
 			len = rrp->ar_blockcount;
 		} else {
-			len = 0;
+			/*
+			 * Update in advance to prevent file creation failure
+			 * for concurrent processes even though there is no
+			 * numrec currently.
+			 * And there's no need to worry as the value that no
+			 * less than bc_free_longest will be inserted later.
+			 */
+			len = cpu_to_be32(cur->bc_ag.bc_free_longest);
 		}
 
 		break;
+1 −0
@@ -218,6 +218,7 @@ union xfs_btree_irec {
 /* Per-AG btree information. */
 struct xfs_btree_cur_ag {
 	struct xfs_perag		*pag;
+	xfs_extlen_t			bc_free_longest; /* the potential longest free extent */
 	union {
 		struct xfs_buf		*agbp;
 		struct xbtree_afakeroot	*afake;	/* for staging cursor */