  1. Jun 22, 2021
    • fuse: fix illegal access to inode with reused nodeid · 15db1683
      Amir Goldstein authored
      
      
      Server responds to LOOKUP and other ops (READDIRPLUS/CREATE/MKNOD/...)
      with outarg containing nodeid and generation.
      
      If a fuse inode is found in inode cache with the same nodeid but different
      generation, the existing fuse inode should be unhashed and marked "bad" and
      a new inode with the new generation should be hashed instead.
      
      This can happen, for example, with passthrough fuse filesystem that returns
      the real filesystem ino/generation on lookup and where real inode numbers
      can get recycled due to real files being unlinked not via the fuse
      passthrough filesystem.
      
      With current code, this situation will not be detected and an old fuse
      dentry that used to point to an older generation real inode, can be used to
      access a completely new inode, which should be accessed only via the new
      dentry.
      
      Note that because the FORGET message carries the nodeid w/o generation, the
      server should wait to get FORGET counts for the nlookup counts of the old
      and reused inodes combined, before it can free the resources associated with
      that nodeid.
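
      A minimal sketch of the server-side accounting described in the note
      above, assuming a hypothetical per-nodeid record. The names
      `struct node_slot`, `slot_on_lookup_reply()` and `slot_on_forget()` are
      illustrative and not part of any FUSE library. The lookup count is kept
      per nodeid and is not reset when the generation changes, because FORGET
      does not say which generation it refers to.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical server-side record, keyed by nodeid only. */
struct node_slot {
    uint64_t nodeid;     /* value sent to the kernel in lookup-style replies */
    uint64_t generation; /* generation of the inode currently using nodeid */
    uint64_t nlookup;    /* outstanding lookups, old and new generation combined */
};

/* Call whenever a reply carrying (nodeid, generation) is sent to the kernel. */
static void slot_on_lookup_reply(struct node_slot *s, uint64_t generation)
{
    s->generation = generation; /* the newest generation owns the nodeid */
    s->nlookup += 1;            /* never reset on reuse: FORGET has no generation */
}

/* Call for FORGET(nodeid, count). Returns true once resources may be freed. */
static bool slot_on_forget(struct node_slot *s, uint64_t count)
{
    s->nlookup = (count >= s->nlookup) ? 0 : s->nlookup - count;
    return s->nlookup == 0;
}
```

      With two lookups answered for generation 1 and one for generation 2,
      the slot only becomes free after FORGET counts totalling three arrive.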
      
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: allow fallocate(FALLOC_FL_ZERO_RANGE) · 6b1bdb56
      Richard W.M. Jones authored
      The current fuse module filters out fallocate(FALLOC_FL_ZERO_RANGE)
      returning -EOPNOTSUPP.  libnbd's nbdfuse would like to translate
      FALLOC_FL_ZERO_RANGE requests into the NBD command
      NBD_CMD_WRITE_ZEROES which allows NBD servers that support it to do
      zeroing efficiently.
      
      This commit treats this flag exactly like FALLOC_FL_PUNCH_HOLE.
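
      As a rough userspace model of the mode filter this message refers to
      (not the kernel source itself), the check below accepts
      FALLOC_FL_ZERO_RANGE alongside the flags that were already allowed.
      The function name `check_fallocate_mode()` is hypothetical, and the
      assumption that FALLOC_FL_KEEP_SIZE and FALLOC_FL_PUNCH_HOLE were the
      previously accepted flags is inferred from the message.

```c
#include <errno.h>
#include <linux/falloc.h>

/* Hypothetical model: return 0 if the mode is passed on, -EOPNOTSUPP if not. */
static int check_fallocate_mode(int mode)
{
    /* ZERO_RANGE now sits next to PUNCH_HOLE in the accepted set. */
    const int supported = FALLOC_FL_KEEP_SIZE |
                          FALLOC_FL_PUNCH_HOLE |
                          FALLOC_FL_ZERO_RANGE;

    if (mode & ~supported)
        return -EOPNOTSUPP;
    return 0;
}
```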
      
      A way to test this, requiring fuse >= 3, nbdkit >= 1.8 and the latest
      nbdfuse from https://gitlab.com/nbdkit/libnbd/-/tree/master/fuse is to
      create a file containing some data and "mirror" it to a fuse file:
      
        $ dd if=/dev/urandom of=disk.img bs=1M count=1
        $ nbdkit file disk.img
        $ touch mirror.img
        $ nbdfuse mirror.img nbd://localhost &
      
      (mirror.img -> nbdfuse -> NBD over loopback -> nbdkit -> disk.img)
      
      You can then run commands such as:
      
        $ fallocate -z -o 1024 -l 1024 mirror.img
      
      and check that the content of the original file ("disk.img") stays
      synchronized.  To show NBD commands, export LIBNBD_DEBUG=1 before
      running nbdfuse.  To clean up:
      
        $ fusermount3 -u mirror.img
        $ killall nbdkit
      
      Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: Make fuse_fill_super_submount() static · 1b539917
      Greg Kurz authored
      
      
      This function used to be called from fuse_dentry_automount(). This code
      was moved to fuse_get_tree_submount() in the same file since then. It
      is unlikely there will ever be another user. No need to be extern in
      this case.
      
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: Switch to fc_mount() for submounts · 29e0e4df
      Greg Kurz authored
      
      
      fc_mount() already handles the vfs_get_tree(), sb->s_umount
      unlocking and vfs_create_mount() sequence. Using it greatly
      simplifies fuse_dentry_automount().
      
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: Call vfs_get_tree() for submounts · 266eb3f2
      Greg Kurz authored
      
      
      We recently fixed an infinite loop by setting the SB_BORN flag on
      submounts along with the write barrier needed by super_cache_count().
      This is the job of vfs_get_tree() and FUSE shouldn't have to care
      about the barrier at all.
      
      Split out some code from fuse_dentry_automount() to the dedicated
      fuse_get_tree_submount() handler for submounts and call vfs_get_tree().
      
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: add dedicated filesystem context ops for submounts · fe0a7bd8
      Greg Kurz authored
      
      
      The creation of a submount is open-coded in fuse_dentry_automount().
      This brings a lot of complexity and we recently had to fix bugs
      because we weren't setting SB_BORN or because we were unlocking
      sb->s_umount before sb was fully configured. Most of these could
      have been avoided by using the mount API instead of open-coding.
      
      Basically, this means coming up with a proper ->get_tree()
      implementation for submounts and calling vfs_get_tree(), or better
      fc_mount().
      
      The creation of the superblock for submounts is quite different from
      the root mount. In particular, it requires neither allocating a FUSE
      filesystem context nor parsing parameters.
      
      Introduce dedicated context ops for submounts to make this clear.
      This is just a placeholder for now, fuse_get_tree_submount() will
      be populated in a subsequent patch.
      
      Only visible change is that we stop allocating/freeing a useless FUSE
      filesystem context with submounts.
      
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • virtiofs: propagate sync() to file server · 2d82ab25
      Greg Kurz authored
      
      
      Even if POSIX doesn't mandate it, Linux users legitimately expect sync() to
      flush all data and metadata to physical storage when it is located on the
      same system.  This isn't happening with virtiofs though: sync() inside the
      guest returns right away even though data still needs to be flushed from
      the host page cache.
      
      This is easily demonstrated by doing the following in the guest:
      
      $ dd if=/dev/zero of=/mnt/foo bs=1M count=5K ; strace -T -e sync sync
      5120+0 records in
      5120+0 records out
      5368709120 bytes (5.4 GB, 5.0 GiB) copied, 5.22224 s, 1.0 GB/s
      sync()                                  = 0 <0.024068>
      
      and start the following in the host when the 'dd' command completes
      in the guest:
      
      $ strace -T -e fsync /usr/bin/sync virtiofs/foo
      fsync(3)                                = 0 <10.371640>
      
      There are no good reasons not to honor the expected behavior of sync()
      actually: it gives an unrealistic impression that virtiofs is super fast
      and that data has safely landed on HW, which isn't the case obviously.
      
      Implement a ->sync_fs() superblock operation that sends a new FUSE_SYNCFS
      request type for this purpose.  Provision a 64-bit placeholder for possible
      future extensions.  Since the file server cannot handle the wait == 0 case,
      we skip it to avoid a gratuitous roundtrip.  Note that this is
      per-superblock: a FUSE_SYNCFS is sent for the root mount and for each
      submount.
      
      Like with FUSE_FSYNC and FUSE_FSYNCDIR, lack of support for FUSE_SYNCFS in
      the file server is treated as permanent success.  This ensures
      compatibility with older file servers: the client will get the current
      behavior of sync() not being propagated to the file server.
      
      Note that such an operation allows the file server to DoS sync().  Since a
      typical FUSE file server is an untrusted piece of software running in
      userspace, this is disabled by default.  Only enable it with virtiofs for
      now since virtiofsd is supposedly trusted by the guest kernel.
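
      The decision logic in the last three paragraphs can be sketched as a
      small userspace model. This is not the kernel implementation: the
      struct `conn_model`, its fields, `model_sync_fs()` and the stub
      `server_without_syncfs()` are hypothetical names chosen for
      illustration. The model skips the request when wait == 0 or when the
      feature is not enabled, and turns -ENOSYS into a permanent success so
      that later calls do not contact the server again.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-connection state. */
struct conn_model {
    bool sync_enabled;  /* off by default; the message enables it for virtiofs only */
    bool no_syncfs;     /* set once the server has answered -ENOSYS */
    int  requests_sent; /* counts round trips, for illustration */
};

/* Stand-in for sending FUSE_SYNCFS and waiting for the reply. */
typedef int (*send_fn)(void);

static int model_sync_fs(struct conn_model *c, int wait, send_fn send)
{
    int err;

    if (!wait)                          /* server cannot handle wait == 0 */
        return 0;
    if (!c->sync_enabled || c->no_syncfs)
        return 0;                       /* keep the old behaviour of sync() */

    c->requests_sent++;
    err = send();
    if (err == -ENOSYS) {               /* older server: permanent success */
        c->no_syncfs = true;
        err = 0;
    }
    return err;
}

/* Example server stub that does not implement the request. */
static int server_without_syncfs(void)
{
    return -ENOSYS;
}
```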
      
      Reported-by: Robert Krawitz <rlk@redhat.com>
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: reject internal errno · 49221cf8
      Miklos Szeredi authored
      
      
      Don't allow userspace to report errors that could be kernel-internal.
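
      A sketch of what such a check could look like, written as a userspace
      model. It assumes that kernel-internal error codes start at 512 (the
      value of ERESTARTSYS in the kernel's internal errno header). Whether
      the patch uses exactly this bound is an assumption, and
      `reply_error_is_acceptable()` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed start of the kernel-internal errno range (ERESTARTSYS == 512). */
#define MODEL_FIRST_INTERNAL_ERRNO 512

/*
 * A reply's error field is acceptable when it is zero or a negated
 * errno that userspace is allowed to see.
 */
static bool reply_error_is_acceptable(int32_t error)
{
    return error <= 0 && error > -MODEL_FIRST_INTERNAL_ERRNO;
}
```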
      
      Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
      Fixes: 334f485d ("[PATCH] FUSE - device functions")
      Cc: <stable@vger.kernel.org> # v2.6.14
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: check connected before queueing on fpq->io · 80ef0867
      Miklos Szeredi authored
      
      
      A request could end up on the fpq->io list after fuse_abort_conn() has
      reset fpq->connected and aborted requests on that list:
      
      Thread-1                          Thread-2
      ========                          ========
      ->fuse_simple_request()           ->shutdown
        ->__fuse_request_send()
          ->queue_request()             ->fuse_abort_conn()
      ->fuse_dev_do_read()                ->acquire(fpq->lock)
        ->wait_for(fpq->lock)             ->set err to all req's in fpq->io
                                          ->release(fpq->lock)
        ->acquire(fpq->lock)
        ->add req to fpq->io
      
      After the userspace copy is done the request will be ended, but
      req->out.h.error will remain uninitialized.  Also the copy might block
      despite being already aborted.
      
      Fix both issues by not allowing the request to be queued on the fpq->io
      list after fuse_abort_conn() has processed this list.
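
      The shape of the fix can be illustrated with a small userspace model
      using a pthread mutex in place of fpq->lock. It is a sketch under
      stated assumptions rather than the kernel code: `struct pq_model`,
      `pq_queue_io()` and `pq_abort()` are hypothetical names, and the
      -ECONNABORTED return value is chosen for illustration only. The point
      is that the connected flag is tested under the same lock that the
      abort path holds while it drains the list.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of a processing queue. */
struct pq_model {
    pthread_mutex_t lock; /* plays the role of fpq->lock */
    bool connected;       /* cleared by the abort path */
    int io_len;           /* number of requests on the io list */
};

/* Reader side: only queue the request if the connection is still up. */
static int pq_queue_io(struct pq_model *pq)
{
    int err = 0;

    pthread_mutex_lock(&pq->lock);
    if (!pq->connected)
        err = -ECONNABORTED; /* abort already processed the list */
    else
        pq->io_len++;
    pthread_mutex_unlock(&pq->lock);
    return err;
}

/* Abort side: mark disconnected and end everything on the io list. */
static void pq_abort(struct pq_model *pq)
{
    pthread_mutex_lock(&pq->lock);
    pq->connected = false;
    pq->io_len = 0;
    pthread_mutex_unlock(&pq->lock);
}
```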
      
      Reported-by: Pradeep P V K <pragalla@codeaurora.org>
      Fixes: fd22d62e ("fuse: no fc->lock for iqueue parts")
      Cc: <stable@vger.kernel.org> # v4.2
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  2. Jun 19, 2021
    • fuse: ignore PG_workingset after stealing · b89ecd60
      Miklos Szeredi authored
      
      
      Fix the "fuse: trying to steal weird page" warning.
      
      Description from Johannes Weiner:
      
        "Think of it as similar to PG_active. It's just another usage/heat
         indicator of file and anon pages on the reclaim LRU that, unlike
         PG_active, persists across deactivation and even reclaim (we store it in
         the page cache / swapper cache tree until the page refaults).
      
         So if fuse accepts pages that can legally have PG_active set,
         PG_workingset is fine too."
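
      The reasoning in the quote can be modelled as a flag-mask check: a
      stolen page is accepted when every flag it carries is in a tolerated
      set, and the fix amounts to adding the workingset flag to that set.
      The bit values and names below are hypothetical and do not match the
      kernel's page flag layout. MODEL_PG_DIRTY stands in for any flag
      outside the tolerated set.

```c
#include <stdbool.h>

/* Hypothetical flag bits, for illustration only. */
enum model_page_flag {
    MODEL_PG_LOCKED     = 1u << 0,
    MODEL_PG_UPTODATE   = 1u << 1,
    MODEL_PG_LRU        = 1u << 2,
    MODEL_PG_ACTIVE     = 1u << 3,
    MODEL_PG_WORKINGSET = 1u << 4,
    MODEL_PG_DIRTY      = 1u << 5
};

/* Returns true when no flag outside the tolerated set is present. */
static bool page_flags_ok_after_steal(unsigned int flags)
{
    const unsigned int tolerated = MODEL_PG_LOCKED |
                                   MODEL_PG_UPTODATE |
                                   MODEL_PG_LRU |
                                   MODEL_PG_ACTIVE |
                                   MODEL_PG_WORKINGSET; /* added by the fix */

    return (flags & ~tolerated) == 0;
}
```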
      
      Reported-by: Thomas Lindroth <thomas.lindroth@gmail.com>
      Fixes: 1899ad18 ("mm: workingset: tell cache transitions from workingset thrashing")
      Cc: <stable@vger.kernel.org> # v4.20
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  3. Jun 09, 2021
    • fuse: Fix infinite loop in sget_fc() · e4a9ccdd
      Greg Kurz authored
      We don't set the SB_BORN flag on submounts. This is wrong as these
      superblocks are then considered as partially constructed or dying
      in the rest of the code and can break some assumptions.
      
      One such case is when you have a virtiofs filesystem with submounts
      and you try to mount it again: virtio_fs_get_tree() tries to obtain
      a superblock with sget_fc(). The logic in sget_fc() is to loop until
      it has either found an existing matching superblock with SB_BORN set
      or to create a brand new one. It is assumed that a superblock without
      SB_BORN is transient and the loop is restarted. Forgetting to set
      SB_BORN on submounts hence causes sget_fc() to retry forever.
      
      Setting SB_BORN requires special care, i.e. a write barrier for
      super_cache_count() which can check SB_BORN without taking any lock.
      We should call vfs_get_tree() to deal with that, but this requires
      having a proper ->get_tree() implementation for submounts, which
      is a bigger piece of work. Go for a simple bug fix in the meantime.
      
      Fixes: bf109c64 ("fuse: implement crossmounts")
      Cc: stable@vger.kernel.org # v5.10+
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: Fix crash if superblock of submount gets killed early · e3a43f2a
      Greg Kurz authored
      
      
      As soon as fuse_dentry_automount() does up_write(&sb->s_umount), the
      superblock can theoretically be killed. If this happens before the
      submount was added to the &fc->mounts list, fuse_mount_remove() later
      crashes in list_del_init() because it assumes the submount to be
      already there.
      
      Add the submount before dropping sb->s_umount to fix the inconsistency.
      It is okay to nest fc->killsb under sb->s_umount, we already do this
      on the ->kill_sb() path.
      
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Fixes: bf109c64 ("fuse: implement crossmounts")
      Cc: stable@vger.kernel.org # v5.10+
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: Fix crash in fuse_dentry_automount() error path · d92d88f0
      Greg Kurz authored
      If fuse_fill_super_submount() returns an error, the error path
      triggers a crash:
      
      [   26.206673] BUG: kernel NULL pointer dereference, address: 0000000000000000
      [...]
      [   26.226362] RIP: 0010:__list_del_entry_valid+0x25/0x90
      [...]
      [   26.247938] Call Trace:
      [   26.248300]  fuse_mount_remove+0x2c/0x70 [fuse]
      [   26.248892]  virtio_kill_sb+0x22/0x160 [virtiofs]
      [   26.249487]  deactivate_locked_super+0x36/0xa0
      [   26.250077]  fuse_dentry_automount+0x178/0x1a0 [fuse]
      
      The crash happens because fuse_mount_remove() assumes that the FUSE
      mount was already added to the list under the FUSE connection, but this
      is only done after fuse_fill_super_submount() has returned success.
      
      This means that until fuse_fill_super_submount() has returned success,
      the FUSE mount isn't actually owned by the superblock. We should thus
      reclaim ownership by clearing sb->s_fs_info, which will skip the call
      to fuse_mount_remove(), and perform rollback, like virtio_fs_get_tree()
      already does for the root sb.
      
      Fixes: bf109c64 ("fuse: implement crossmounts")
      Cc: stable@vger.kernel.org # v5.10+
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  4. Jun 07, 2021
    • Linux 5.13-rc5 · 614124be
      Linus Torvalds authored
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 90d56a3d
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Five small and fairly minor fixes, all in drivers"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: scsi_devinfo: Add blacklist entry for HPE OPEN-V
        scsi: ufs: ufs-mediatek: Fix HCI version in some platforms
        scsi: qedf: Do not put host in qedf_vport_create() unconditionally
        scsi: lpfc: Fix failure to transmit ABTS on FC link
        scsi: target: core: Fix warning on realtime kernels
    • Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 20e41d9b
      Linus Torvalds authored
      Pull ext4 fixes from Ted Ts'o:
       "Miscellaneous ext4 bug fixes"
      
      * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: Only advertise encrypted_casefold when encryption and unicode are enabled
        ext4: fix no-key deletion for encrypt+casefold
        ext4: fix memory leak in ext4_fill_super
        ext4: fix fast commit alignment issues
        ext4: fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed
        ext4: fix accessing uninit percpu counter variable with fast_commit
        ext4: fix memory leak in ext4_mb_init_backend on error path.
    • Merge tag 'arm-soc-fixes-v5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · decad3e1
      Linus Torvalds authored
      Pull ARM SoC fixes from Olof Johansson:
       "A set of fixes that have been coming in over the last few weeks, the
        usual mix of fixes:
      
         - DT fixups for TI K3
      
         - SATA drive detection fix for TI DRA7
      
         - Power management fixes and a few build warning removals for OMAP
      
         - OP-TEE fix to use standard API for UUID exporting
      
         - DT fixes for a handful of i.MX boards
      
        And a few other smaller items"
      
      * tag 'arm-soc-fixes-v5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (29 commits)
        arm64: meson: select COMMON_CLK
        soc: amlogic: meson-clk-measure: remove redundant dev_err call in meson_msr_probe()
        ARM: OMAP1: ams-delta: remove unused function ams_delta_camera_power
        bus: ti-sysc: Fix flakey idling of uarts and stop using swsup_sidle_act
        ARM: dts: imx: emcon-avari: Fix nxp,pca8574 #gpio-cells
        ARM: dts: imx7d-pico: Fix the 'tuning-step' property
        ARM: dts: imx7d-meerkat96: Fix the 'tuning-step' property
        arm64: dts: freescale: sl28: var1: fix RGMII clock and voltage
        arm64: dts: freescale: sl28: var4: fix RGMII clock and voltage
        ARM: imx: pm-imx27: Include "common.h"
        arm64: dts: zii-ultra: fix 12V_MAIN voltage
        arm64: dts: zii-ultra: remove second GEN_3V3 regulator instance
        arm64: dts: ls1028a: fix memory node
        bus: ti-sysc: Fix am335x resume hang for usb otg module
        ARM: OMAP2+: Fix build warning when mmc_omap is not built
        ARM: OMAP1: isp1301-omap: Add missing gpiod_add_lookup_table function
        ARM: OMAP1: Fix use of possibly uninitialized irq variable
        optee: use export_uuid() to copy client UUID
        arm64: dts: ti: k3*: Introduce reg definition for interrupt routers
        arm64: dts: ti: k3-am65|j721e|am64: Map the dma / navigator subsystem via explicit ranges
        ...
    • Merge tag 'powerpc-5.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · bd7b12aa
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "Fix our KVM reverse map real-mode handling since we enabled huge
        vmalloc (in some configurations).
      
        Revert a recent change to our IOMMU code which broke some devices.
      
        Fix KVM handling of FSCR on P7/P8, which could have possibly let a
        guest crash its Qemu.
      
        Fix kprobes validation of prefixed instructions across page boundary.
      
        Thanks to Alexey Kardashevskiy, Christophe Leroy, Fabiano Rosas,
        Frederic Barrat, Naveen N. Rao, and Nicholas Piggin"
      
      * tag 'powerpc-5.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        Revert "powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE() to save TCEs"
        KVM: PPC: Book3S HV: Save host FSCR in the P7/8 path
        powerpc: Fix reverse map real-mode address lookup with huge vmalloc
        powerpc/kprobes: Fix validation of prefixed instructions across page boundary
    • Merge tag 'x86_urgent_for_v5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 773ac53b
      Linus Torvalds authored
      Pull x86 fixes from Borislav Petkov:
       "A bunch of x86/urgent stuff accumulated for the last two weeks so
        lemme unload it to you.
      
        It should be all totally risk-free, of course. :-)
      
         - Fix out-of-spec hardware (1st gen Hygon) which does not implement
           MSR_AMD64_SEV even though the spec clearly states so, and check
           CPUID bits first.
      
         - Send only one signal to a task when it is a SEGV_PKUERR si_code
           type.
      
         - Do away with all the wankery of reserving X amount of memory in the
           first megabyte to prevent BIOS corrupting it and simply and
           unconditionally reserve the whole first megabyte.
      
         - Make alternatives NOP optimization work at an arbitrary position
           within the patched sequence because the compiler can put
           single-byte NOPs for alignment anywhere in the sequence (32-bit
           retpoline), vs our previous assumption that the NOPs are only
           appended.
      
         - Force-disable ENQCMD[S] instructions support and remove
           update_pasid() because of insufficient protection against FPU state
           modification in an interrupt context, among other xstate horrors
           which are being addressed at the moment. This one limits the
           fallout until proper enablement.
      
         - Use cpu_feature_enabled() in the idxd driver so that it can be
           build-time disabled through the defines in disabled-features.h.
      
         - Fix LVT thermal setup for SMI delivery mode by making sure the APIC
           LVT value is read before APIC initialization so that softlockups
           during boot do not happen at least on one machine.
      
         - Mark all legacy interrupts as legacy vectors when the IO-APIC is
           disabled and when all legacy interrupts are routed through the PIC"
      
      * tag 'x86_urgent_for_v5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/sev: Check SME/SEV support in CPUID first
        x86/fault: Don't send SIGSEGV twice on SEGV_PKUERR
        x86/setup: Always reserve the first 1M of RAM
        x86/alternative: Optimize single-byte NOPs at an arbitrary position
        x86/cpufeatures: Force disable X86_FEATURE_ENQCMD and remove update_pasid()
        dmaengine: idxd: Use cpu_feature_enabled()
        x86/thermal: Fix LVT thermal setup for SMI delivery mode
        x86/apic: Mark _all_ legacy interrupts when IO/APIC is missing
  5. Jun 06, 2021
  6. Jun 05, 2021