Skip to content
  1. Feb 25, 2024
  2. Feb 14, 2024
  3. Feb 11, 2024
    • Kent Overstreet's avatar
    • Su Yue's avatar
      bcachefs: fix kmemleak in __bch2_read_super error handling path · 7dcfb87a
      Su Yue authored
      
      
      During xfstest tests, there are some kmemleak reports e.g. generic/051 with
      if USE_KMEMLEAK=yes:
      
      ====================================================================
      EXPERIMENTAL kmemleak reported some memory leaks!  Due to the way kmemleak
      works, the leak might be from an earlier test, or something totally unrelated.
      unreferenced object 0xffff9ef905aaf778 (size 8):
        comm "mount.bcachefs", pid 169844, jiffies 4295281209 (age 87.040s)
        hex dump (first 8 bytes):
          a5 cc cc cc cc cc cc cc                          ........
        backtrace:
          [<ffffffff87fd9a43>] __kmem_cache_alloc_node+0x1f3/0x2c0
          [<ffffffff87f49b66>] kmalloc_trace+0x26/0xb0
          [<ffffffffc0a3fefe>] __bch2_read_super+0xfe/0x4e0 [bcachefs]
          [<ffffffffc0a3ad22>] bch2_fs_open+0x262/0x1710 [bcachefs]
          [<ffffffffc09c9e24>] bch2_mount+0x4c4/0x640 [bcachefs]
          [<ffffffff88080c90>] legacy_get_tree+0x30/0x60
          [<ffffffff8802c748>] vfs_get_tree+0x28/0xf0
          [<ffffffff88061fe5>] path_mount+0x475/0xb60
          [<ffffffff880627e5>] __x64_sys_mount+0x105/0x140
          [<ffffffff88932642>] do_syscall_64+0x42/0xf0
          [<ffffffff88a000e6>] entry_SYSCALL_64_after_hwframe+0x6e/0x76
      unreferenced object 0xffff9ef96cdc4fc0 (size 32):
        comm "mount.bcachefs", pid 169844, jiffies 4295281209 (age 87.040s)
        hex dump (first 32 bytes):
          2f 64 65 76 2f 6d 61 70 70 65 72 2f 74 65 73 74  /dev/mapper/test
          2d 31 00 cc cc cc cc cc cc cc cc cc cc cc cc cc  -1..............
        backtrace:
          [<ffffffff87fd9a43>] __kmem_cache_alloc_node+0x1f3/0x2c0
          [<ffffffff87f4a081>] __kmalloc_node_track_caller+0x51/0x150
          [<ffffffff87f3adc2>] kstrdup+0x32/0x60
          [<ffffffffc0a3ff1a>] __bch2_read_super+0x11a/0x4e0 [bcachefs]
          [<ffffffffc0a3ad22>] bch2_fs_open+0x262/0x1710 [bcachefs]
          [<ffffffffc09c9e24>] bch2_mount+0x4c4/0x640 [bcachefs]
          [<ffffffff88080c90>] legacy_get_tree+0x30/0x60
          [<ffffffff8802c748>] vfs_get_tree+0x28/0xf0
          [<ffffffff88061fe5>] path_mount+0x475/0xb60
          [<ffffffff880627e5>] __x64_sys_mount+0x105/0x140
          [<ffffffff88932642>] do_syscall_64+0x42/0xf0
          [<ffffffff88a000e6>] entry_SYSCALL_64_after_hwframe+0x6e/0x76
      ====================================================================
      
      The leak happens if bdev_open_by_path() failed to open a block device
      then it goes label 'out' directly without call of bch2_free_super().
      
      Fix it by going to label 'err' instead of 'out' if bdev_open_by_path()
      fails.
      
      Signed-off-by: default avatarSu Yue <glass.su@suse.com>
      Signed-off-by: default avatarKent Overstreet <kent.overstreet@linux.dev>
      7dcfb87a
    • Kent Overstreet's avatar
      bcachefs: Fix missing bch2_err_class() calls · 1a1c93e7
      Kent Overstreet authored
      
      
      We aren't supposed to be leaking our private error codes outside of
      fs/bcachefs/.
      
      Fixes:
      Signed-off-by: default avatarKent Overstreet <kent.overstreet@linux.dev>
      1a1c93e7
  4. Feb 05, 2024
  5. Jan 29, 2024
  6. Jan 26, 2024
    • Kent Overstreet's avatar
      bcachefs: __lookup_dirent() works in snapshot, not subvol · d2fda304
      Kent Overstreet authored
      
      
      Add a new helper, bch2_hash_lookup_in_snapshot(), for when we're not
      operating in a subvolume and already have a snapshot ID, and then use it
      in lookup_lostfound() -> __lookup_dirent().
      
      This is a bugfix - lookup_lostfound() doesn't take a subvolume ID, we
      were passing a nonsense subvolume ID before, and don't have one to pass
      since we may be operating in an interior snapshot node that doesn't have
      a subvolume ID.
      
      Signed-off-by: default avatarKent Overstreet <kent.overstreet@linux.dev>
      d2fda304
  7. Jan 25, 2024
  8. Jan 23, 2024
  9. Jan 22, 2024
    • Linus Torvalds's avatar
      Linux 6.8-rc1 · 6613476e
      Linus Torvalds authored
      v6.8-rc1
      6613476e
    • Linus Torvalds's avatar
      Merge tag 'bcachefs-2024-01-21' of https://evilpiepirate.org/git/bcachefs · 35a4474b
      Linus Torvalds authored
      Pull more bcachefs updates from Kent Overstreet:
       "Some fixes, Some refactoring, some minor features:
      
         - Assorted prep work for disk space accounting rewrite
      
         - BTREE_TRIGGER_ATOMIC: after combining our trigger callbacks, this
           makes our trigger context more explicit
      
         - A few fixes to avoid excessive transaction restarts on
           multithreaded workloads: fstests (in addition to ktest tests) are
           now checking slowpath counters, and that's shaking out a few bugs
      
         - Assorted tracepoint improvements
      
         - Starting to break up bcachefs_format.h and move on disk types so
           they're with the code they belong to; this will make room to start
           documenting the on disk format better.
      
         - A few minor fixes"
      
      * tag 'bcachefs-2024-01-21' of https://evilpiepirate.org/git/bcachefs: (46 commits)
        bcachefs: Improve inode_to_text()
        bcachefs: logged_ops_format.h
        bcachefs: reflink_format.h
        bcachefs; extents_format.h
        bcachefs: ec_format.h
        bcachefs: subvolume_format.h
        bcachefs: snapshot_format.h
        bcachefs: alloc_background_format.h
        bcachefs: xattr_format.h
        bcachefs: dirent_format.h
        bcachefs: inode_format.h
        bcachefs; quota_format.h
        bcachefs: sb-counters_format.h
        bcachefs: counters.c -> sb-counters.c
        bcachefs: comment bch_subvolume
        bcachefs: bch_snapshot::btime
        bcachefs: add missing __GFP_NOWARN
        bcachefs: opts->compression can now also be applied in the background
        bcachefs: Prep work for variable size btree node buffers
        bcachefs: grab s_umount only if snapshotting
        ...
      35a4474b
    • Linus Torvalds's avatar
      Merge tag 'timers-core-2024-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4fbbed78
      Linus Torvalds authored
      Pull timer updates from Thomas Gleixner:
       "Updates for time and clocksources:
      
         - A fix for the idle and iowait time accounting vs CPU hotplug.
      
           The time is reset on CPU hotplug which makes the accumulated
           systemwide time jump backwards.
      
         - Assorted fixes and improvements for clocksource/event drivers"
      
      * tag 'timers-core-2024-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        tick-sched: Fix idle and iowait sleeptime accounting vs CPU hotplug
        clocksource/drivers/ep93xx: Fix error handling during probe
        clocksource/drivers/cadence-ttc: Fix some kernel-doc warnings
        clocksource/drivers/timer-ti-dm: Fix make W=n kerneldoc warnings
        clocksource/timer-riscv: Add riscv_clock_shutdown callback
        dt-bindings: timer: Add StarFive JH8100 clint
        dt-bindings: timer: thead,c900-aclint-mtimer: separate mtime and mtimecmp regs
      4fbbed78
    • Linus Torvalds's avatar
      Merge tag 'powerpc-6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 7b297a5c
      Linus Torvalds authored
      Pull powerpc fixes from Aneesh Kumar:
      
       - Increase default stack size to 32KB for Book3S
      
      Thanks to Michael Ellerman.
      
      * tag 'powerpc-6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/64s: Increase default stack size to 32KB
      7b297a5c
    • Kent Overstreet's avatar
      bcachefs: Improve inode_to_text() · 249f441f
      Kent Overstreet authored
      
      
      Add line breaks - inode_to_text() is now much easier to read.
      
      Signed-off-by: default avatarKent Overstreet <kent.overstreet@linux.dev>
      249f441f
    • Kent Overstreet's avatar
      bcachefs: logged_ops_format.h · d826cc57
      Kent Overstreet authored
      
      
      Signed-off-by: default avatarKent Overstreet <kent.overstreet@linux.dev>
      d826cc57
    • Kent Overstreet's avatar
      bcachefs: reflink_format.h · 8d52ba60
      Kent Overstreet authored
      
      
      Signed-off-by: default avatarKent Overstreet <kent.overstreet@linux.dev>
      8d52ba60
    • Kent Overstreet's avatar
      bcachefs; extents_format.h · b2fa1b63
      Kent Overstreet authored
      
      
      Signed-off-by: default avatarKent Overstreet <kent.overstreet@linux.dev>
      b2fa1b63
    • Kent Overstreet's avatar
      bcachefs: ec_format.h · 0560eb9a
      Kent Overstreet authored
      
      
      Signed-off-by: default avatarKent Overstreet <kent.overstreet@linux.dev>
      0560eb9a
    • Kent Overstreet's avatar
      bcachefs: subvolume_format.h · c6c4ff65
      Kent Overstreet authored
      
      
      Signed-off-by: default avatarKent Overstreet <kent.overstreet@linux.dev>
      c6c4ff65
    • Kent Overstreet's avatar
      bcachefs: snapshot_format.h · 8fed323b
      Kent Overstreet authored
      
      
      Signed-off-by: default avatarKent Overstreet <kent.overstreet@linux.dev>
      8fed323b
    • Kent Overstreet's avatar
      d455179f
    • Kent Overstreet's avatar
      bcachefs: xattr_format.h · 72e08010
      Kent Overstreet authored
      
      
      Signed-off-by: default avatarKent Overstreet <kent.overstreet@linux.dev>
      72e08010
    • Kent Overstreet's avatar
      bcachefs: dirent_format.h · 7ffc4daa
      Kent Overstreet authored
      
      
      Signed-off-by: default avatarKent Overstreet <kent.overstreet@linux.dev>
      7ffc4daa
    • Kent Overstreet's avatar
      bcachefs: inode_format.h · b36425da
      Kent Overstreet authored
      
      
      Signed-off-by: default avatarKent Overstreet <kent.overstreet@linux.dev>
      b36425da
    • Kent Overstreet's avatar
      bcachefs; quota_format.h · 82de6207
      Kent Overstreet authored
      
      
      Signed-off-by: default avatarKent Overstreet <kent.overstreet@linux.dev>
      82de6207
    • Kent Overstreet's avatar
      bcachefs: sb-counters_format.h · 43314801
      Kent Overstreet authored
      
      
      bcachefs_format.h has gotten too big; let's do some organizing.
      
      Signed-off-by: default avatarKent Overstreet <kent.overstreet@linux.dev>
      43314801
    • Kent Overstreet's avatar
      3a58dfbc
    • Kent Overstreet's avatar
      12207f49
    • Kent Overstreet's avatar
      bcachefs: bch_snapshot::btime · d32088f2
      Kent Overstreet authored
      
      
      Add a field to bch_snapshot for creation time; this will be important
      when we start exposing the snapshot tree to userspace.
      
      Signed-off-by: default avatarKent Overstreet <kent.overstreet@linux.dev>
      d32088f2
    • Kent Overstreet's avatar
      7be0208f
    • Kent Overstreet's avatar
      bcachefs: opts->compression can now also be applied in the background · d7e77f53
      Kent Overstreet authored
      
      
      The "apply this compression method in the background" paths now use the
      compression option if background_compression is not set; this means that
      setting or changing the compression option will cause existing data to
      be compressed accordingly in the background.
      
      Signed-off-by: default avatarKent Overstreet <kent.overstreet@linux.dev>
      d7e77f53
    • Kent Overstreet's avatar
      bcachefs: Prep work for variable size btree node buffers · ec4edd7b
      Kent Overstreet authored
      
      
      bcachefs btree nodes are big - typically 256k - and btree roots are
      pinned in memory. As we're now up to 18 btrees, we now have significant
      memory overhead in mostly empty btree roots.
      
      And in the future we're going to start enforcing that certain btree node
      boundaries exist, to solve lock contention issues - analagous to XFS's
      AGIs.
      
      Thus, we need to start allocating smaller btree node buffers when we
      can. This patch changes code that refers to the filesystem constant
      c->opts.btree_node_size to refer to the btree node buffer size -
      btree_buf_bytes() - where appropriate.
      
      Signed-off-by: default avatarKent Overstreet <kent.overstreet@linux.dev>
      ec4edd7b
    • Su Yue's avatar
      bcachefs: grab s_umount only if snapshotting · 2acc59dd
      Su Yue authored
      
      
      When I was testing mongodb over bcachefs with compression,
      there is a lockdep warning when snapshotting mongodb data volume.
      
      $ cat test.sh
      prog=bcachefs
      
      $prog subvolume create /mnt/data
      $prog subvolume create /mnt/data/snapshots
      
      while true;do
          $prog subvolume snapshot /mnt/data /mnt/data/snapshots/$(date +%s)
          sleep 1s
      done
      
      $ cat /etc/mongodb.conf
      systemLog:
        destination: file
        logAppend: true
        path: /mnt/data/mongod.log
      
      storage:
        dbPath: /mnt/data/
      
      lockdep reports:
      [ 3437.452330] ======================================================
      [ 3437.452750] WARNING: possible circular locking dependency detected
      [ 3437.453168] 6.7.0-rc7-custom+ #85 Tainted: G            E
      [ 3437.453562] ------------------------------------------------------
      [ 3437.453981] bcachefs/35533 is trying to acquire lock:
      [ 3437.454325] ffffa0a02b2b1418 (sb_writers#10){.+.+}-{0:0}, at: filename_create+0x62/0x190
      [ 3437.454875]
                     but task is already holding lock:
      [ 3437.455268] ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
      [ 3437.456009]
                     which lock already depends on the new lock.
      
      [ 3437.456553]
                     the existing dependency chain (in reverse order) is:
      [ 3437.457054]
                     -> #3 (&type->s_umount_key#48){.+.+}-{3:3}:
      [ 3437.457507]        down_read+0x3e/0x170
      [ 3437.457772]        bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
      [ 3437.458206]        __x64_sys_ioctl+0x93/0xd0
      [ 3437.458498]        do_syscall_64+0x42/0xf0
      [ 3437.458779]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
      [ 3437.459155]
                     -> #2 (&c->snapshot_create_lock){++++}-{3:3}:
      [ 3437.459615]        down_read+0x3e/0x170
      [ 3437.459878]        bch2_truncate+0x82/0x110 [bcachefs]
      [ 3437.460276]        bchfs_truncate+0x254/0x3c0 [bcachefs]
      [ 3437.460686]        notify_change+0x1f1/0x4a0
      [ 3437.461283]        do_truncate+0x7f/0xd0
      [ 3437.461555]        path_openat+0xa57/0xce0
      [ 3437.461836]        do_filp_open+0xb4/0x160
      [ 3437.462116]        do_sys_openat2+0x91/0xc0
      [ 3437.462402]        __x64_sys_openat+0x53/0xa0
      [ 3437.462701]        do_syscall_64+0x42/0xf0
      [ 3437.462982]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
      [ 3437.463359]
                     -> #1 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}:
      [ 3437.463843]        down_write+0x3b/0xc0
      [ 3437.464223]        bch2_write_iter+0x5b/0xcc0 [bcachefs]
      [ 3437.464493]        vfs_write+0x21b/0x4c0
      [ 3437.464653]        ksys_write+0x69/0xf0
      [ 3437.464839]        do_syscall_64+0x42/0xf0
      [ 3437.465009]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
      [ 3437.465231]
                     -> #0 (sb_writers#10){.+.+}-{0:0}:
      [ 3437.465471]        __lock_acquire+0x1455/0x21b0
      [ 3437.465656]        lock_acquire+0xc6/0x2b0
      [ 3437.465822]        mnt_want_write+0x46/0x1a0
      [ 3437.465996]        filename_create+0x62/0x190
      [ 3437.466175]        user_path_create+0x2d/0x50
      [ 3437.466352]        bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs]
      [ 3437.466617]        __x64_sys_ioctl+0x93/0xd0
      [ 3437.466791]        do_syscall_64+0x42/0xf0
      [ 3437.466957]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
      [ 3437.467180]
                     other info that might help us debug this:
      
      [ 3437.469670] 2 locks held by bcachefs/35533:
                     other info that might help us debug this:
      
      [ 3437.467507] Chain exists of:
                       sb_writers#10 --> &c->snapshot_create_lock --> &type->s_umount_key#48
      
      [ 3437.467979]  Possible unsafe locking scenario:
      
      [ 3437.468223]        CPU0                    CPU1
      [ 3437.468405]        ----                    ----
      [ 3437.468585]   rlock(&type->s_umount_key#48);
      [ 3437.468758]                                lock(&c->snapshot_create_lock);
      [ 3437.469030]                                lock(&type->s_umount_key#48);
      [ 3437.469291]   rlock(sb_writers#10);
      [ 3437.469434]
                      *** DEADLOCK ***
      
      [ 3437.469670] 2 locks held by bcachefs/35533:
      [ 3437.469838]  #0: ffffa0a02ce00a88 (&c->snapshot_create_lock){++++}-{3:3}, at: bch2_fs_file_ioctl+0x1e3/0xc90 [bcachefs]
      [ 3437.470294]  #1: ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
      [ 3437.470744]
                     stack backtrace:
      [ 3437.470922] CPU: 7 PID: 35533 Comm: bcachefs Kdump: loaded Tainted: G            E      6.7.0-rc7-custom+ #85
      [ 3437.471313] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
      [ 3437.471694] Call Trace:
      [ 3437.471795]  <TASK>
      [ 3437.471884]  dump_stack_lvl+0x57/0x90
      [ 3437.472035]  check_noncircular+0x132/0x150
      [ 3437.472202]  __lock_acquire+0x1455/0x21b0
      [ 3437.472369]  lock_acquire+0xc6/0x2b0
      [ 3437.472518]  ? filename_create+0x62/0x190
      [ 3437.472683]  ? lock_is_held_type+0x97/0x110
      [ 3437.472856]  mnt_want_write+0x46/0x1a0
      [ 3437.473025]  ? filename_create+0x62/0x190
      [ 3437.473204]  filename_create+0x62/0x190
      [ 3437.473380]  user_path_create+0x2d/0x50
      [ 3437.473555]  bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs]
      [ 3437.473819]  ? lock_acquire+0xc6/0x2b0
      [ 3437.474002]  ? __fget_files+0x2a/0x190
      [ 3437.474195]  ? __fget_files+0xbc/0x190
      [ 3437.474380]  ? lock_release+0xc5/0x270
      [ 3437.474567]  ? __x64_sys_ioctl+0x93/0xd0
      [ 3437.474764]  ? __pfx_bch2_fs_file_ioctl+0x10/0x10 [bcachefs]
      [ 3437.475090]  __x64_sys_ioctl+0x93/0xd0
      [ 3437.475277]  do_syscall_64+0x42/0xf0
      [ 3437.475454]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
      [ 3437.475691] RIP: 0033:0x7f2743c313af
      ======================================================
      
      In __bch2_ioctl_subvolume_create(), we grab s_umount unconditionally
      and unlock it at the end of the function. There is a comment
      "why do we need this lock?" about the lock coming from
      commit 42d23732 ("bcachefs: Snapshot creation, deletion")
      The reason is that __bch2_ioctl_subvolume_create() calls
      sync_inodes_sb() which enforce locked s_umount to writeback all dirty
      nodes before doing snapshot works.
      
      Fix it by read locking s_umount for snapshotting only and unlocking
      s_umount after sync_inodes_sb().
      
      Signed-off-by: default avatarSu Yue <glass.su@suse.com>
      Signed-off-by: default avatarKent Overstreet <kent.overstreet@linux.dev>
      2acc59dd