Commit d42bd17c authored by Darrick J. Wong

Merge tag 'large-folio-writes' of git://git.infradead.org/users/willy/pagecache into iomap-6.6-merge

Create large folios in iomap buffered write path

Commit ebb7fb15 limited the length of ioend chains to 4096 entries
to improve worst-case latency.  Unfortunately, this had the effect of
limiting the performance of:

fio -name write-bandwidth -rw=write -bs=1024Ki -size=32Gi -runtime=30 \
        -iodepth 1 -ioengine sync -zero_buffers=1 -direct=0 -end_fsync=1 \
        -numjobs=4 -directory=/mnt/test

https://lore.kernel.org/linux-xfs/20230508172406.1CF3.409509F4@e16-tech.com/

The problem ends up being lock contention on the i_pages spinlock as we
clear the writeback bit on each folio (and propagate that up through
the tree).  By using larger folios, we decrease the number of folios
to be processed by a factor of 256 for this benchmark, eliminating the
lock contention.

Creating large folios in the buffered write path is also the right
thing to do.  It's a project that has been on the back burner for years;
it just hasn't been important enough to do before now.

* tag 'large-folio-writes' of git://git.infradead.org/users/willy/pagecache:
  iomap: Copy larger chunks from userspace
  iomap: Create large folios in the buffered write path
  filemap: Allow __filemap_get_folio to allocate large folios
  filemap: Add fgf_t typedef
  iomap: Remove unnecessary test from iomap_release_folio()
  doc: Correct the description of ->release_folio
  iomap: Remove large folio handling in iomap_invalidate_folio()
  iov_iter: Add copy_folio_from_iter_atomic()
  iov_iter: Handle compound highmem pages in copy_page_from_iter_atomic()
  iov_iter: Map the page later in copy_page_from_iter_atomic()

[djwong: yay amortizations!]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
parents 6eaae198 5d8edfb9
+11 −4
@@ -374,10 +374,17 @@ invalidate_lock before invalidating page cache in truncate / hole punch
path (and thus calling into ->invalidate_folio) to block races between page
cache invalidation and page cache filling functions (fault, read, ...).

-->release_folio() is called when the kernel is about to try to drop the
-buffers from the folio in preparation for freeing it.  It returns false to
-indicate that the buffers are (or may be) freeable.  If ->release_folio is
-NULL, the kernel assumes that the fs has no private interest in the buffers.
+->release_folio() is called when the MM wants to make a change to the
+folio that would invalidate the filesystem's private data.  For example,
+it may be about to be removed from the address_space or split.  The folio
+is locked and not under writeback.  It may be dirty.  The gfp parameter
+is not usually used for allocation, but rather to indicate what the
+filesystem may do to attempt to free the private data.  The filesystem may
+return false to indicate that the folio's private data cannot be freed.
+If it returns true, it should have already removed the private data from
+the folio.  If a filesystem does not provide a ->release_folio method,
+the pagecache will assume that private data is buffer_heads and call
+try_to_free_buffers().

->free_folio() is called when the kernel has dropped the folio
from the page cache.
+3 −3
@@ -876,9 +876,9 @@ static int prepare_uptodate_page(struct inode *inode,
	return 0;
}

-static unsigned int get_prepare_fgp_flags(bool nowait)
+static fgf_t get_prepare_fgp_flags(bool nowait)
{
-	unsigned int fgp_flags = FGP_LOCK | FGP_ACCESSED | FGP_CREAT;
+	fgf_t fgp_flags = FGP_LOCK | FGP_ACCESSED | FGP_CREAT;

	if (nowait)
		fgp_flags |= FGP_NOWAIT;
@@ -910,7 +910,7 @@ static noinline int prepare_pages(struct inode *inode, struct page **pages,
	int i;
	unsigned long index = pos >> PAGE_SHIFT;
	gfp_t mask = get_prepare_gfp_flags(inode, nowait);
-	unsigned int fgp_flags = get_prepare_fgp_flags(nowait);
+	fgf_t fgp_flags = get_prepare_fgp_flags(nowait);
	int err = 0;
	int faili;

+1 −1
@@ -1045,7 +1045,7 @@ static int prepare_compress_overwrite(struct compress_ctx *cc,
	struct address_space *mapping = cc->inode->i_mapping;
	struct page *page;
	sector_t last_block_in_bio;
-	unsigned fgp_flag = FGP_LOCK | FGP_WRITE | FGP_CREAT;
+	fgf_t fgp_flag = FGP_LOCK | FGP_WRITE | FGP_CREAT;
	pgoff_t start_idx = start_idx_of_cluster(cc);
	int i, ret;

+1 −1
@@ -2736,7 +2736,7 @@ static inline struct page *f2fs_grab_cache_page(struct address_space *mapping,

static inline struct page *f2fs_pagecache_get_page(
				struct address_space *mapping, pgoff_t index,
-				int fgp_flags, gfp_t gfp_mask)
+				fgf_t fgp_flags, gfp_t gfp_mask)
{
	if (time_to_inject(F2FS_M_SB(mapping), FAULT_PAGE_GET))
		return NULL;
+1 −1
@@ -971,7 +971,7 @@ gfs2_iomap_get_folio(struct iomap_iter *iter, loff_t pos, unsigned len)
	if (status)
		return ERR_PTR(status);

-	folio = iomap_get_folio(iter, pos);
+	folio = iomap_get_folio(iter, pos, len);
	if (IS_ERR(folio))
		gfs2_trans_end(sdp);
	return folio;