Commit de16588a authored by Linus Torvalds's avatar Linus Torvalds
Browse files
Pull misc vfs updates from Christian Brauner:
 "This contains the usual miscellaneous features, cleanups, and fixes
  for vfs and individual filesystems.

  Features:

   - Block mode changes on symlinks and rectify our broken semantics

   - Report file modifications via fsnotify() for splice

   - Allow specifying an explicit timeout for the "rootwait" kernel
     command line option. This allows to timeout and reboot instead of
     always waiting indefinitely for the root device to show up

   - Use synchronous fput for the close system call

  Cleanups:

   - Get rid of open-coded lockdep workarounds for async io submitters
     and replace it all with a single consolidated helper

   - Simplify epoll allocation helper

   - Convert simple_write_begin and simple_write_end to use a folio

   - Convert page_cache_pipe_buf_confirm() to use a folio

   - Simplify __range_close to avoid pointless locking

   - Disable per-cpu buffer head cache for isolated cpus

   - Port ecryptfs to kmap_local_page() api

   - Remove redundant initialization of pointer buf in pipe code

   - Unexport the d_genocide() function which is only used within core
     vfs

   - Replace printk(KERN_ERR) and WARN_ON() with WARN()

  Fixes:

   - Fix various kernel-doc issues

   - Fix refcount underflow for eventfds when used as EFD_SEMAPHORE

   - Fix a mainly theoretical issue in devpts

   - Check the return value of __getblk() in reiserfs

   - Fix a racy assert in i_readcount_dec

   - Fix integer conversion issues in various functions

   - Fix LSM security context handling during automounts that prevented
     NFS superblock sharing"

* tag 'v6.6-vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (39 commits)
  cachefiles: use kiocb_{start,end}_write() helpers
  ovl: use kiocb_{start,end}_write() helpers
  aio: use kiocb_{start,end}_write() helpers
  io_uring: use kiocb_{start,end}_write() helpers
  fs: create kiocb_{start,end}_write() helpers
  fs: add kerneldoc to file_{start,end}_write() helpers
  io_uring: rename kiocb_end_write() local helper
  splice: Convert page_cache_pipe_buf_confirm() to use a folio
  libfs: Convert simple_write_begin and simple_write_end to use a folio
  fs/dcache: Replace printk and WARN_ON by WARN
  fs/pipe: remove redundant initialization of pointer buf
  fs: Fix kernel-doc warnings
  devpts: Fix kernel-doc warnings
  doc: idmappings: fix an error and rephrase a paragraph
  init: Add support for rootwait timeout parameter
  vfs: fix up the assert in i_readcount_dec
  fs: Fix one kernel-doc comment
  docs: filesystems: idmappings: clarify from where idmappings are taken
  fs/buffer.c: disable per-CPU buffer_head cache for isolated CPUs
  vfs, security: Fix automount superblock LSM init problem, preventing NFS sb sharing
  ...
parents ecd7db20 e6fa4c72
Loading
Loading
Loading
Loading
+4 −0
Original line number Diff line number Diff line
@@ -5522,6 +5522,10 @@
			Useful for devices that are detected asynchronously
			(e.g. USB and MMC devices).

	rootwait=	[KNL] Maximum time (in seconds) to wait for root device
			to show up before attempting to mount the root
			filesystem.

	rproc_mem=nn[KMG][@address]
			[KNL,ARM,CMA] Remoteproc physical memory block.
			Memory area to be used by remote processor image,
+11 −3
Original line number Diff line number Diff line
@@ -146,9 +146,10 @@ For the rest of this document we will prefix all userspace ids with ``u`` and
all kernel ids with ``k``. Ranges of idmappings will be prefixed with ``r``. So
an idmapping will be written as ``u0:k10000:r10000``.

For example, the id ``u1000`` is an id in the upper idmapset or "userspace
idmapset" starting with ``u1000``. And it is mapped to ``k11000`` which is a
kernel id in the lower idmapset or "kernel idmapset" starting with ``k10000``.
For example, within this idmapping, the id ``u1000`` is an id in the upper
idmapset or "userspace idmapset" starting with ``u0``. And it is mapped to
``k11000`` which is a kernel id in the lower idmapset or "kernel idmapset"
starting with ``k10000``.

A kernel id is always created by an idmapping. Such idmappings are associated
with user namespaces. Since we mainly care about how idmappings work we're not
@@ -373,6 +374,13 @@ kernel maps the caller's userspace id down into a kernel id according to the
caller's idmapping and then maps that kernel id up according to the
filesystem's idmapping.

From the implementation point it's worth mentioning how idmappings are represented.
All idmappings are taken from the corresponding user namespace.

    - caller's idmapping (usually taken from ``current_user_ns()``)
    - filesystem's idmapping (``sb->s_user_ns``)
    - mount's idmapping (``mnt_idmap(vfsmnt)``)

Let's see some examples with caller/filesystem idmapping but without mount
idmappings. This will exhibit some problems we can hit. After that we will
revisit/reconsider these examples, this time using mount idmappings, to see how
+3 −17
Original line number Diff line number Diff line
@@ -1447,13 +1447,8 @@ static void aio_complete_rw(struct kiocb *kiocb, long res)
	if (kiocb->ki_flags & IOCB_WRITE) {
		struct inode *inode = file_inode(kiocb->ki_filp);

		/*
		 * Tell lockdep we inherited freeze protection from submission
		 * thread.
		 */
		if (S_ISREG(inode->i_mode))
			__sb_writers_acquired(inode->i_sb, SB_FREEZE_WRITE);
		file_end_write(kiocb->ki_filp);
			kiocb_end_write(kiocb);
	}

	iocb->ki_res.res = res;
@@ -1581,17 +1576,8 @@ static int aio_write(struct kiocb *req, const struct iocb *iocb,
		return ret;
	ret = rw_verify_area(WRITE, file, &req->ki_pos, iov_iter_count(&iter));
	if (!ret) {
		/*
		 * Open-code file_start_write here to grab freeze protection,
		 * which will be released by another thread in
		 * aio_complete_rw().  Fool lockdep by telling it the lock got
		 * released so that it doesn't complain about the held lock when
		 * we return to userspace.
		 */
		if (S_ISREG(file_inode(file)->i_mode)) {
			sb_start_write(file_inode(file)->i_sb);
			__sb_writers_release(file_inode(file)->i_sb, SB_FREEZE_WRITE);
		}
		if (S_ISREG(file_inode(file)->i_mode))
			kiocb_start_write(req);
		req->ki_flags |= IOCB_WRITE;
		aio_rw_done(req, call_write_iter(file, req, &iter));
	}
+18 −2
Original line number Diff line number Diff line
@@ -394,9 +394,25 @@ int notify_change(struct mnt_idmap *idmap, struct dentry *dentry,
		return error;

	if ((ia_valid & ATTR_MODE)) {
		umode_t amode = attr->ia_mode;
		/*
		 * Don't allow changing the mode of symlinks:
		 *
		 * (1) The vfs doesn't take the mode of symlinks into account
		 *     during permission checking.
		 * (2) This has never worked correctly. Most major filesystems
		 *     did return EOPNOTSUPP due to interactions with POSIX ACLs
		 *     but did still updated the mode of the symlink.
		 *     This inconsistency led system call wrapper providers such
		 *     as libc to block changing the mode of symlinks with
		 *     EOPNOTSUPP already.
		 * (3) To even do this in the first place one would have to use
		 *     specific file descriptors and quite some effort.
		 */
		if (S_ISLNK(inode->i_mode))
			return -EOPNOTSUPP;

		/* Flag setting protected by i_mutex */
		if (is_sxid(amode))
		if (is_sxid(attr->ia_mode))
			inode->i_flags &= ~S_NOSEC;
	}

+6 −1
Original line number Diff line number Diff line
@@ -49,6 +49,7 @@
#include <trace/events/block.h>
#include <linux/fscrypt.h>
#include <linux/fsverity.h>
#include <linux/sched/isolation.h>

#include "internal.h"

@@ -1352,7 +1353,7 @@ static void bh_lru_install(struct buffer_head *bh)
	 * failing page migration.
	 * Skip putting upcoming bh into bh_lru until migration is done.
	 */
	if (lru_cache_disabled()) {
	if (lru_cache_disabled() || cpu_is_isolated(smp_processor_id())) {
		bh_lru_unlock();
		return;
	}
@@ -1382,6 +1383,10 @@ lookup_bh_lru(struct block_device *bdev, sector_t block, unsigned size)

	check_irqs_on();
	bh_lru_lock();
	if (cpu_is_isolated(smp_processor_id())) {
		bh_lru_unlock();
		return NULL;
	}
	for (i = 0; i < BH_LRU_SIZE; i++) {
		struct buffer_head *bh = __this_cpu_read(bh_lrus.bhs[i]);

Loading