Commits · 54fb58aec47a49a80534d2ccabd319334d6b0ef8 · Mirrors / github.com / raspberrypi_linux

Oct 20, 2023

Merge tag 'slab-fixes-for-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab · 54fb58ae

Linus Torvalds authored Oct 19, 2023

Pull slab fix from Vlastimil Babka:

 - stable fix to prevent kernel warnings with KASAN_HW_TAGS on arm64
   due to improperly resolved kmalloc alignment restrictions (Catalin
   Marinas)

* tag 'slab-fixes-for-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  mm: slab: Do not create kmalloc caches smaller than arch_slab_minalign()

54fb58ae

Merge tag 'seccomp-v6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 189b7562

Linus Torvalds authored Oct 19, 2023

Pull seccomp fix from Kees Cook:

 - Fix seccomp_unotify perf benchmark for 32-bit (Jiri Slaby)

* tag 'seccomp-v6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  perf/benchmark: fix seccomp_unotify benchmark for 32-bit

189b7562

Merge tag 'v6.6-rc7.vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · ea1cc20c

Linus Torvalds authored Oct 19, 2023

Pull vfs fix from Christian Brauner:
 "An openat() call from io_uring triggering an audit call can apparently
  cause the refcount of struct filename to be incremented from multiple
  threads concurrently during async execution, triggering a refcount
  underflow and hitting a BUG_ON(). That bug has been lurking around
  since at least v5.16 apparently.

  Switch to an atomic counter to fix that. The underflow check is
  downgraded from a BUG_ON() to a WARN_ON_ONCE() but we could easily
  remove that check altogether tbh"

* tag 'v6.6-rc7.vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  audit,io_uring: io_uring openat triggers audit reference count underflow

ea1cc20c

Merge tag 'ntfs3_for_6.6' of https://github.com/Paragon-Software-Group/linux-ntfs3 · f69d00d1

Linus Torvalds authored Oct 19, 2023

Pull ntfs3 fixes from Konstantin Komarov:

 - memory leak

 - some logic errors, NULL dereferences

 - some code was refactored

 - more sanity checks

* tag 'ntfs3_for_6.6' of https://github.com/Paragon-Software-Group/linux-ntfs3:
  fs/ntfs3: Avoid possible memory leak
  fs/ntfs3: Fix directory element type detection
  fs/ntfs3: Fix possible null-pointer dereference in hdr_find_e()
  fs/ntfs3: Fix OOB read in ntfs_init_from_boot
  fs/ntfs3: fix panic about slab-out-of-bounds caused by ntfs_list_ea()
  fs/ntfs3: Fix NULL pointer dereference on error in attr_allocate_frame()
  fs/ntfs3: Fix possible NULL-ptr-deref in ni_readpage_cmpr()
  fs/ntfs3: Do not allow to change label if volume is read-only
  fs/ntfs3: Add more info into /proc/fs/ntfs3/<dev>/volinfo
  fs/ntfs3: Refactoring and comments
  fs/ntfs3: Fix alternative boot searching
  fs/ntfs3: Allow repeated call to ntfs3_put_sbi
  fs/ntfs3: Use inode_set_ctime_to_ts instead of inode_set_ctime
  fs/ntfs3: Fix shift-out-of-bounds in ntfs_fill_super
  fs/ntfs3: fix deadlock in mark_as_free_ex
  fs/ntfs3: Add more attributes checks in mi_enum_attr()
  fs/ntfs3: Use kvmalloc instead of kmalloc(... __GFP_NOWARN)
  fs/ntfs3: Write immediately updated ntfs state
  fs/ntfs3: Add ckeck in ni_update_parent()

f69d00d1

Oct 19, 2023

Merge tag 'for-6.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 7cf4bea7

Linus Torvalds authored Oct 19, 2023

Pull btrfs fix from David Sterba:
 "Fix a bug in chunk size decision that could lead to suboptimal
  placement and filling patterns"

* tag 'for-6.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix stripe length calculation for non-zoned data chunk allocation

7cf4bea7

perf/benchmark: fix seccomp_unotify benchmark for 32-bit · 31c65705

Jiri Slaby (SUSE) authored Oct 17, 2023



Commit 7d5cb68a (perf/benchmark: add a new benchmark for
seccom_unotify) added a reference to __NR_seccomp into perf. This is
fine as it added also a definition of __NR_seccomp for 64-bit. But it
failed to do so for 32-bit as instead of ifndef, ifdef was used.

Fix this typo (so fix the build of perf on 32-bit).

Fixes: 7d5cb68a (perf/benchmark: add a new benchmark for seccom_unotify)
Cc: Andrei Vagin <avagin@google.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org>
Link: https://lore.kernel.org/r/20231017083019.31733-1-jirislaby@kernel.org


Signed-off-by: Kees Cook <keescook@chromium.org>

31c65705

Merge tag 'spi-fix-v6-6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi · dd72f9c7

Linus Torvalds authored Oct 18, 2023

Pull spi fix from Mark Brown:
 "A fix for the npcm-fiu driver in cases where there are no dummy bytes
  during reads"

* tag 'spi-fix-v6-6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: npcm-fiu: Fix UMA reads when dummy.nbytes == 0

dd72f9c7

Merge tag 'regmap-fix-v6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap · e1e80380

Linus Torvalds authored Oct 18, 2023

Pull regmap fix from Mark Brown:
 "A straightforward fix from Johan for a long standing bug in cases
  where we both have regmaps without devices and something is using
  dev_get_regmap()"

* tag 'regmap-fix-v6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: fix NULL deref on lookup

e1e80380

Oct 18, 2023

Merge tag 'fbdev-for-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev · 06dc10ea

Linus Torvalds authored Oct 17, 2023

Pull fbdev fixes and cleanups from Helge Deller:
 "Various minor fixes, cleanups and annotations for atyfb, sa1100fb,
  omapfb, uvesafb and mmp"

* tag 'fbdev-for-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
  fbdev: core: syscopyarea: fix sloppy typing
  fbdev: core: cfbcopyarea: fix sloppy typing
  fbdev: uvesafb: Call cn_del_callback() at the end of uvesafb_exit()
  fbdev: uvesafb: Remove uvesafb_exec() prototype from include/video/uvesafb.h
  fbdev: sa1100fb: mark sa1100fb_init() static
  fbdev: omapfb: fix some error codes
  fbdev: atyfb: only use ioremap_uc() on i386 and ia64
  fbdev: mmp: Annotate struct mmp_path with __counted_by
  fbdev: mmp: Annotate struct mmphw_ctrl with __counted_by

06dc10ea

Oct 17, 2023

Merge tag 'probes-fixes-v6.6-rc6' of... · 213f8915

Linus Torvalds authored Oct 16, 2023

Merge tag 'probes-fixes-v6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull probes fixes from Masami Hiramatsu:

 - Fix fprobe document to add a new ret_ip parameter for callback
   functions. This has been introduced in v6.5 but the document was not
   updated.

 - Fix fprobe to check the number of active retprobes is not zero. This
   number is passed from parameter or calculated by the parameter and it
   can be zero which is not acceptable. But current code only check it
   is not minus.

* tag 'probes-fixes-v6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  fprobe: Fix to ensure the number of active retprobes is not zero
  Documentation: probes: Add a new ret_ip callback parameter

213f8915

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 86d6a628

Linus Torvalds authored Oct 16, 2023

Pull kvm fixes from Paolo Bonzini:
 "ARM:

   - Fix the handling of the phycal timer offset when FEAT_ECV and
     CNTPOFF_EL2 are implemented

   - Restore the functionnality of Permission Indirection that was
     broken by the Fine Grained Trapping rework

   - Cleanup some PMU event sharing code

  MIPS:

   - Fix W=1 build

  s390:

   - One small fix for gisa to avoid stalls

  x86:

   - Truncate writes to PMU counters to the counter's width to avoid
     spurious overflows when emulating counter events in software

   - Set the LVTPC entry mask bit when handling a PMI (to match
     Intel-defined architectural behavior)

   - Treat KVM_REQ_PMI as a wake event instead of queueing host IRQ work
     to kick the guest out of emulated halt

   - Fix for loading XSAVE state from an old kernel into a new one

   - Fixes for AMD AVIC

  selftests:

   - Play nice with %llx when formatting guest printf and assert
     statements

   - Clean up stale test metadata

   - Zero-initialize structures in memslot perf test to workaround a
     suspected 'may be used uninitialized' false positives from GCC"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits)
  KVM: arm64: timers: Correctly handle TGE flip with CNTPOFF_EL2
  KVM: arm64: POR{E0}_EL1 do not need trap handlers
  KVM: arm64: Add nPIR{E0}_EL1 to HFG traps
  KVM: MIPS: fix -Wunused-but-set-variable warning
  KVM: arm64: pmu: Drop redundant check for non-NULL kvm_pmu_events
  KVM: SVM: Fix build error when using -Werror=unused-but-set-variable
  x86: KVM: SVM: refresh AVIC inhibition in svm_leave_nested()
  x86: KVM: SVM: add support for Invalid IPI Vector interception
  x86: KVM: SVM: always update the x2avic msr interception
  KVM: selftests: Force load all supported XSAVE state in state test
  KVM: selftests: Load XSAVE state into untouched vCPU during state test
  KVM: selftests: Touch relevant XSAVE state in guest for state test
  KVM: x86: Constrain guest-supported xfeatures only at KVM_GET_XSAVE{2}
  x86/fpu: Allow caller to constrain xfeatures when copying to uabi buffer
  KVM: selftests: Zero-initialize entire test_result in memslot perf test
  KVM: selftests: Remove obsolete and incorrect test case metadata
  KVM: selftests: Treat %llx like %lx when formatting guest printf
  KVM: x86/pmu: Synthesize at most one PMI per VM-exit
  KVM: x86: Mask LVTPC when handling a PMI
  KVM: x86/pmu: Truncate counter value to allowed width on write
  ...

86d6a628

fprobe: Fix to ensure the number of active retprobes is not zero · 700b2b43

Masami Hiramatsu (Google) authored Oct 17, 2023

The number of active retprobes can be zero but it is not acceptable,
so return EINVAL error if detected.

Link: https://lore.kernel.org/all/169750018550.186853.11198884812017796410.stgit@devnote2/

Reported-by: wuqiang.matt <wuqiang.matt@bytedance.com>
Closes: https://lore.kernel.org/all/20231016222103.cb9f426edc60220eabd8aa6a@kernel.org/

Fixes: 5b0ab789 ("fprobe: Add exit_handler support")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

700b2b43

Documentation: probes: Add a new ret_ip callback parameter · 2a86ac30

Masami Hiramatsu (Google) authored Sep 24, 2023

Add a new ret_ip callback parameter description.

Link: https://lore.kernel.org/all/169556257133.146934.13560704846459957726.stgit@devnote2/

Fixes: cb16330d ("fprobe: Pass return address to the handlers")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Florent Revest <revest@chromium.org>

2a86ac30

fbdev: core: syscopyarea: fix sloppy typing · e8e4a470

Sergey Shtylyov authored Oct 13, 2023



In sys_copyarea(), the local variable bits_per_line is needlessly typed as
*unsigned long* -- which is a 32-bit type on the 32-bit arches and a 64-bit
type on the 64-bit arches; that variable's value is derived from the __u32
typed fb_fix_screeninfo::line_length field (multiplied by 8u) and a 32-bit
*unsigned int* type should still be enough to store the # of bits per line.

Found by Linux Verification Center (linuxtesting.org) with the Svace static
analysis tool.

Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Helge Deller <deller@gmx.de>

e8e4a470

fbdev: core: cfbcopyarea: fix sloppy typing · 7f33df94

Sergey Shtylyov authored Oct 13, 2023



In cfb_copyarea(), the local variable bits_per_line is needlessly typed as
*unsigned long* -- which is a 32-bit type on the 32-bit arches and a 64-bit
type on the 64-bit arches; that variable's value is derived from the __u32
typed fb_fix_screeninfo::line_length field (multiplied by 8u) and a 32-bit
*unsigned int* type should still be enough to store the # of bits per line.

Found by Linux Verification Center (linuxtesting.org) with the Svace static
analysis tool.

Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Helge Deller <deller@gmx.de>

7f33df94

fbdev: uvesafb: Call cn_del_callback() at the end of uvesafb_exit() · 1022e7e2

Jorge Maidana authored Oct 06, 2023



Delete the v86d netlink only after all the VBE tasks have been
completed.

Fixes initial state restore on module unload:
uvesafb: VBE state restore call failed (eax=0x4f04, err=-19)

Signed-off-by: Jorge Maidana <jorgem.linux@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>

1022e7e2

fbdev: uvesafb: Remove uvesafb_exec() prototype from include/video/uvesafb.h · 0c37bffa

Jorge Maidana authored Oct 06, 2023



uvesafb_exec() is a static function defined and called only in
drivers/video/fbdev/uvesafb.c, remove the prototype from
include/video/uvesafb.h.

Fixes the warning:
./include/video/uvesafb.h:112:12: warning: 'uvesafb_exec' declared 'static' but never defined [-Wunused-function]
when including '<video/uvesafb.h>' in an external program.

Signed-off-by: Jorge Maidana <jorgem.linux@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>

0c37bffa

fbdev: sa1100fb: mark sa1100fb_init() static · e638d371

Arnd Bergmann authored Oct 16, 2023



This is a global function that is only referenced as an initcall. This causes
a warning:

drivers/video/fbdev/sa1100fb.c:1218:12: error: no previous prototype for 'sa1100fb_init' [-Werror=missing-prototypes]

Make it static instead.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Helge Deller <deller@gmx.de>

e638d371

fbdev: omapfb: fix some error codes · dc608db7

Dan Carpenter authored Oct 16, 2023



Return negative -ENXIO instead of positive ENXIO.

Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Helge Deller <deller@gmx.de>

dc608db7

Oct 16, 2023

Linux 6.6-rc6 · 58720809
Linus Torvalds authored Oct 15, 2023

58720809

Revert "x86/smp: Put CPUs into INIT on shutdown if possible" · fbe1bf1e

Linus Torvalds authored Oct 15, 2023

This reverts commit 45e34c8a, and the
two subsequent fixes to it:

  3f874c9b ("x86/smp: Don't send INIT to non-present and non-booted CPUs")
  b1472a60 ("x86/smp: Don't send INIT to boot CPU")

because it seems to result in hung machines at shutdown.  Particularly
some Dell machines, but Thomas says

 "The rest seems to be Lenovo and Sony with Alderlake/Raptorlake CPUs -
  at least that's what I could figure out from the various bug reports.

  I don't know which CPUs the DELL machines have, so I can't say it's a
  pattern.

  I agree with the revert for now"

Ashok Raj chimes in:

 "There was a report (probably this same one), and it turns out it was a
  bug in the BIOS SMI handler.

  The client BIOS's were waiting for the lowest APICID to be the SMI
  rendevous master. If this is MeteorLake, the BSP wasn't the one with
  the lowest APIC and it triped here.

  The BIOS change is also being pushed to others for assimilation :)

  Server BIOS's had this correctly for a while now"

and it does look likely to be some bad interaction between SMI and the
non-BSP cores having put into INIT (and thus unresponsive until reset).

Link: https://bbs.archlinux.org/viewtopic.php?pid=2124429
Link: https://www.reddit.com/r/openSUSE/comments/16qq99b/tumbleweed_shutdown_did_not_finish_completely/
Link: https://forum.artixlinux.org/index.php/topic,5997.0.html
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2241279


Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fbe1bf1e

virtio_net: fix the missing of the dma cpu sync · 5720c43d

Xuan Zhuo authored Sep 27, 2023

Commit 295525e2 ("virtio_net: merge dma operations when filling
mergeable buffers") unmaps the buffer with DMA_ATTR_SKIP_CPU_SYNC when
the dma->ref is zero. We do that with DMA_ATTR_SKIP_CPU_SYNC, because we
do not want to do the sync for the entire page_frag. But that misses the
sync for the current area.

This patch does cpu sync regardless of whether the ref is zero or not.

Fixes: 295525e2 ("virtio_net: merge dma operations when filling mergeable buffers")
Reported-by: Michael Roth <michael.roth@amd.com>
Closes: http://lore.kernel.org/all/20230926130451.axgodaa6tvwqs3ut@amd.com

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

5720c43d

btrfs: fix stripe length calculation for non-zoned data chunk allocation · 8a540e99

Zygo Blaxell authored Oct 07, 2023

Commit f6fca391 "btrfs: store chunk size in space-info struct"
broke data chunk allocations on non-zoned multi-device filesystems when
using default chunk_size.  Commit 5da431b7 "btrfs: fix the max chunk
size and stripe length calculation" partially fixed that, and this patch
completes the fix for that case.

After commit f6fca391 and 5da431b7, the sequence of events for
a data chunk allocation on a non-zoned filesystem is:

        1.  btrfs_create_chunk calls init_alloc_chunk_ctl, which copies
        space_info->chunk_size (default 10 GiB) to ctl->max_stripe_len
        unmodified.  Before f6fca391, ctl->max_stripe_len value was
        1 GiB for non-zoned data chunks and not configurable.

        2.  btrfs_create_chunk calls gather_device_info which consumes
        and produces more fields of chunk_ctl.

        3.  gather_device_info multiplies ctl->max_stripe_len by
        ctl->dev_stripes (which is 1 in all cases except dup)
        and calls find_free_dev_extent with that number as num_bytes.

        4.  find_free_dev_extent locates the first dev_extent hole on
        a device which is at least as large as num_bytes.  With default
        max_chunk_size from f6fca391, it finds the first hole which is
        longer than 10 GiB, or the largest hole if that hole is shorter
        than 10 GiB.  This is different from the pre-f6fca391
        behavior, where num_bytes is 1 GiB, and find_free_dev_extent
        may choose a different hole.

        5.  gather_device_info repeats step 4 with all devices to find
        the first or largest dev_extent hole that can be allocated on
        each device.

        6.  gather_device_info sorts the device list by the hole size
        on each device, using total unallocated space on each device to
        break ties, then returns to btrfs_create_chunk with the list.

        7.  btrfs_create_chunk calls decide_stripe_size_regular.

        8.  decide_stripe_size_regular finds the largest stripe_len that
        fits across the first nr_devs device dev_extent holes that were
        found by gather_device_info (and satisfies other constraints
        on stripe_len that are not relevant here).

        9.  decide_stripe_size_regular caps the length of the stripe it
        computed at 1 GiB.  This cap appeared in 5da431b7 to correct
        one of the other regressions introduced in f6fca391.

        10.  btrfs_create_chunk creates a new chunk with the above
        computed size and number of devices.

At step 4, gather_device_info() has found a location where stripe up to
10 GiB in length could be allocated on several devices, and selected
which devices should have a dev_extent allocated on them, but at step
9, only 1 GiB of the space that was found on each device can be used.
This mismatch causes new suboptimal chunk allocation cases that did not
occur in pre-f6fca391 kernels.

Consider a filesystem using raid1 profile with 3 devices.  After some
balances, device 1 has 10x 1 GiB unallocated space, while devices 2
and 3 have 1x 10 GiB unallocated space, i.e. the same total amount of
space, but distributed across different numbers of dev_extent holes.
For visualization, let's ignore all the chunks that were allocated before
this point, and focus on the remaining holes:

        Device 1:  [_] [_] [_] [_] [_] [_] [_] [_] [_] [_] (10x 1 GiB unallocated)
        Device 2:  [__________] (10 GiB contig unallocated)
        Device 3:  [__________] (10 GiB contig unallocated)

Before f6fca391, the allocator would fill these optimally by
allocating chunks with dev_extents on devices 1 and 2 ([12]), 1 and 3
([13]), or 2 and 3 ([23]):

        [after 0 chunk allocations]
        Device 1:  [_] [_] [_] [_] [_] [_] [_] [_] [_] [_] (10 GiB)
        Device 2:  [__________] (10 GiB)
        Device 3:  [__________] (10 GiB)

        [after 1 chunk allocation]
        Device 1:  [12] [_] [_] [_] [_] [_] [_] [_] [_] [_]
        Device 2:  [12] [_________] (9 GiB)
        Device 3:  [__________] (10 GiB)

        [after 2 chunk allocations]
        Device 1:  [12] [13] [_] [_] [_] [_] [_] [_] [_] [_] (8 GiB)
        Device 2:  [12] [_________] (9 GiB)
        Device 3:  [13] [_________] (9 GiB)

        [after 3 chunk allocations]
        Device 1:  [12] [13] [12] [_] [_] [_] [_] [_] [_] [_] (7 GiB)
        Device 2:  [12] [12] [________] (8 GiB)
        Device 3:  [13] [_________] (9 GiB)

        [...]

        [after 12 chunk allocations]
        Device 1:  [12] [13] [12] [13] [12] [13] [12] [13] [_] [_] (2 GiB)
        Device 2:  [12] [12] [23] [23] [12] [12] [23] [23] [__] (2 GiB)
        Device 3:  [13] [13] [23] [23] [13] [23] [13] [23] [__] (2 GiB)

        [after 13 chunk allocations]
        Device 1:  [12] [13] [12] [13] [12] [13] [12] [13] [12] [_] (1 GiB)
        Device 2:  [12] [12] [23] [23] [12] [12] [23] [23] [12] [_] (1 GiB)
        Device 3:  [13] [13] [23] [23] [13] [23] [13] [23] [__] (2 GiB)

        [after 14 chunk allocations]
        Device 1:  [12] [13] [12] [13] [12] [13] [12] [13] [12] [13] (full)
        Device 2:  [12] [12] [23] [23] [12] [12] [23] [23] [12] [_] (1 GiB)
        Device 3:  [13] [13] [23] [23] [13] [23] [13] [23] [13] [_] (1 GiB)

        [after 15 chunk allocations]
        Device 1:  [12] [13] [12] [13] [12] [13] [12] [13] [12] [13] (full)
        Device 2:  [12] [12] [23] [23] [12] [12] [23] [23] [12] [23] (full)
        Device 3:  [13] [13] [23] [23] [13] [23] [13] [23] [13] [23] (full)

This allocates all of the space with no waste.  The sorting function used
by gather_device_info considers free space holes above 1 GiB in length
to be equal to 1 GiB, so once find_free_dev_extent locates a sufficiently
long hole on each device, all the holes appear equal in the sort, and the
comparison falls back to sorting devices by total free space.  This keeps
usable space on each device equal so they can all be filled completely.

After f6fca391, the allocator prefers the devices with larger holes
over the devices with more free space, so it makes bad allocation choices:

        [after 1 chunk allocation]
        Device 1:  [_] [_] [_] [_] [_] [_] [_] [_] [_] [_] (10 GiB)
        Device 2:  [23] [_________] (9 GiB)
        Device 3:  [23] [_________] (9 GiB)

        [after 2 chunk allocations]
        Device 1:  [_] [_] [_] [_] [_] [_] [_] [_] [_] [_] (10 GiB)
        Device 2:  [23] [23] [________] (8 GiB)
        Device 3:  [23] [23] [________] (8 GiB)

        [after 3 chunk allocations]
        Device 1:  [_] [_] [_] [_] [_] [_] [_] [_] [_] [_] (10 GiB)
        Device 2:  [23] [23] [23] [_______] (7 GiB)
        Device 3:  [23] [23] [23] [_______] (7 GiB)

        [...]

        [after 9 chunk allocations]
        Device 1:  [_] [_] [_] [_] [_] [_] [_] [_] [_] [_] (10 GiB)
        Device 2:  [23] [23] [23] [23] [23] [23] [23] [23] [23] [_] (1 GiB)
        Device 3:  [23] [23] [23] [23] [23] [23] [23] [23] [23] [_] (1 GiB)

        [after 10 chunk allocations]
        Device 1:  [12] [_] [_] [_] [_] [_] [_] [_] [_] [_] (9 GiB)
        Device 2:  [23] [23] [23] [23] [23] [23] [23] [23] [12] (full)
        Device 3:  [23] [23] [23] [23] [23] [23] [23] [23] [_] (1 GiB)

        [after 11 chunk allocations]
        Device 1:  [12] [13] [_] [_] [_] [_] [_] [_] [_] [_] (8 GiB)
        Device 2:  [23] [23] [23] [23] [23] [23] [23] [23] [12] (full)
        Device 3:  [23] [23] [23] [23] [23] [23] [23] [23] [13] (full)

No further allocations are possible, with 8 GiB wasted (4 GiB of data
space).  The sort in gather_device_info now considers free space in
holes longer than 1 GiB to be distinct, so it will prefer devices 2 and
3 over device 1 until all but 1 GiB is allocated on devices 2 and 3.
At that point, with only 1 GiB unallocated on every device, the largest
hole length on each device is equal at 1 GiB, so the sort finally moves
to ordering the devices with the most free space, but by this time it
is too late to make use of the free space on device 1.

Note that it's possible to contrive a case where the pre-f6fca391
allocator fails the same way, but these cases generally have extensive
dev_extent fragmentation as a precondition (e.g. many holes of 768M
in length on one device, and few holes 1 GiB in length on the others).
With the regression in f6fca391, bad chunk allocation can occur even
under optimal conditions, when all dev_extent holes are exact multiples
of stripe_len in length, as in the example above.

Also note that post-f6fca391 kernels do treat dev_extent holes
larger than 10 GiB as equal, so the bad behavior won't show up on a
freshly formatted filesystem; however, as the filesystem ages and fills
up, and holes ranging from 1 GiB to 10 GiB in size appear, the problem
can show up as a failure to balance after adding or removing devices,
or an unexpected shortfall in available space due to unequal allocation.

To fix the regression and make data chunk allocation work
again, set ctl->max_stripe_len back to the original SZ_1G, or
space_info->chunk_size if that's smaller (the latter can happen if the
user set space_info->chunk_size to less than 1 GiB via sysfs, or it's
a 32 MiB system chunk with a hardcoded chunk_size and stripe_len).

While researching the background of the earlier commits, I found that an
identical fix was already proposed at:

  https://lore.kernel.org/linux-btrfs/de83ac46-a4a3-88d3-85ce-255b7abc5249@gmx.com/

The previous review missed one detail:  ctl->max_stripe_len is used
before decide_stripe_size_regular() is called, when it is too late for
the changes in that function to have any effect.  ctl->max_stripe_len is
not used directly by decide_stripe_size_regular(), but the parameter
does heavily influence the per-device free space data presented to
the function.

Fixes: f6fca391 ("btrfs: store chunk size in space-info struct")
CC: stable@vger.kernel.org # 6.1+
Link: https://lore.kernel.org/linux-btrfs/20231007051421.19657-1-ce3g8jdj@umail.furryterror.org/


Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Signed-off-by: David Sterba <dsterba@suse.com>

8a540e99

Merge tag 'usb-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 11d3f726

Linus Torvalds authored Oct 15, 2023

Pull USB / Thunderbolt fixes from Greg KH:
 "Here are some USB and Thunderbolt driver fixes for 6.6-rc6 to resolve
  a number of small reported issues. Included in here are:

   - thunderbolt driver fixes

   - xhci driver fixes

   - cdns3 driver fixes

   - musb driver fixes

   - a number of typec driver fixes

   - a few other small driver fixes

  All of these have been in linux-next with no reported issues"

* tag 'usb-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (22 commits)
  usb: typec: ucsi: Use GET_CAPABILITY attributes data to set power supply scope
  usb: typec: ucsi: Fix missing link removal
  usb: typec: altmodes/displayport: Signal hpd low when exiting mode
  xhci: Preserve RsvdP bits in ERSTBA register correctly
  xhci: Clear EHB bit only at end of interrupt handler
  xhci: track port suspend state correctly in unsuccessful resume cases
  usb: xhci: xhci-ring: Use sysdev for mapping bounce buffer
  usb: typec: ucsi: Clear EVENT_PENDING bit if ucsi_send_command fails
  usb: misc: onboard_hub: add support for Microchip USB2412 USB 2.0 hub
  usb: gadget: udc-xilinx: replace memcpy with memcpy_toio
  usb: cdns3: Modify the return value of cdns_set_active () to void when CONFIG_PM_SLEEP is disabled
  usb: dwc3: Soft reset phy on probe for host
  usb: hub: Guard against accesses to uninitialized BOS descriptors
  usb: typec: qcom: Update the logic of regulator enable and disable
  usb: gadget: ncm: Handle decoding of multiple NTB's in unwrap call
  usb: musb: Get the musb_qh poniter after musb_giveback
  usb: musb: Modify the "HWVers" register address
  usb: cdnsp: Fixes issue with dequeuing not queued requests
  thunderbolt: Restart XDomain discovery handshake after failure
  thunderbolt: Correct TMU mode initialization from hardware
  ...

11d3f726

Merge tag 'tty-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · 41226a36

Linus Torvalds authored Oct 15, 2023

Pull tty/serial driver fixes from Greg KH:
 "Here are some small tty/serial driver fixes for 6.6-rc6 that resolve
  some reported issues. Included in here are:

   - serial core pm runtime fix for issue reported by many

   - 8250_omap driver fix

   - rs485 spinlock fix for reported problem

   - ams-delta bugfix for previous tty api changes in -rc1 that missed
     this driver that never seems to get built in any test systems

  All of these have been in linux-next for over a week with no reported
  problems"

* tag 'tty-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  ASoC: ti: ams-delta: Fix cx81801_receive() argument types
  serial: core: Fix checks for tx runtime PM state
  serial: 8250_omap: Fix errors with no_console_suspend
  serial: Reduce spinlocked portion of uart_rs485_config()

41226a36

Merge tag 'char-misc-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · a477e3a7

Linus Torvalds authored Oct 15, 2023

Pull char/misc driver fixes from Greg KH:
 "Here is a small set of char/misc and other smaller driver subsystem
  fixes for 6.6-rc6. Included in here are:

   - lots of iio driver fixes

   - binder memory leak fix

   - mcb driver fixes

   - counter driver fixes

   - firmware loader documentation fix

   - documentation update for embargoed hardware issues

  All of these have been in linux-next for over a week with no reported
  issues"

* tag 'char-misc-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (22 commits)
  iio: pressure: ms5611: ms5611_prom_is_valid false negative bug
  dt-bindings: iio: adc: adi,ad7292: Fix additionalProperties on channel nodes
  iio: adc: ad7192: Correct reference voltage
  iio: light: vcnl4000: Don't power on/off chip in config
  iio: addac: Kconfig: update ad74413r selections
  iio: pressure: dps310: Adjust Timeout Settings
  iio: imu: bno055: Fix missing Kconfig dependencies
  iio: adc: imx8qxp: Fix address for command buffer registers
  iio: cros_ec: fix an use-after-free in cros_ec_sensors_push_data()
  iio: irsd200: fix -Warray-bounds bug in irsd200_trigger_handler
  dt-bindings: iio: rohm,bu27010: add missing vdd-supply to example
  binder: fix memory leaks of spam and pending work
  firmware_loader: Update contact emails for ABI docs
  Documentation: embargoed-hardware-issues.rst: Clarify prenotifaction
  mcb: remove is_added flag from mcb_device struct
  coresight: tmc-etr: Disable warnings for allocation failures
  coresight: Fix run time warnings while reusing ETR buffer
  iio: admv1013: add mixer_vgate corner cases
  iio: pressure: bmp280: Fix NULL pointer exception
  iio: dac: ad3552r: Correct device IDs
  ...

a477e3a7

Oct 15, 2023

Merge tag 'ovl-fixes-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs · 19fd4a91

Linus Torvalds authored Oct 15, 2023

Pull overlayfs fixes from Amir Goldstein:

 - Various fixes for regressions due to conversion to new mount
   api in v6.5

 - Disable a new mount option syntax (append lowerdir) that was
   added in v6.5 because we plan to add a different lowerdir
   append syntax in v6.7

* tag 'ovl-fixes-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
  ovl: temporarily disable appending lowedirs
  ovl: fix regression in showing lowerdir mount option
  ovl: fix regression in parsing of mount options with escaped comma
  fs: factor out vfs_parse_monolithic_sep() helper

19fd4a91

Merge tag 'powerpc-6.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · f8bf101b

Linus Torvalds authored Oct 15, 2023

Pull powerpc fixes from Michael Ellerman:

 - Fix softlockup/crash when using hcall tracing

 - Fix pte_access_permitted() for PAGE_NONE on 8xx

 - Fix inverted pte_young() test in __ptep_test_and_clear_young()
   on 64-bit BookE

 - Fix unhandled math emulation exception on 85xx

 - Fix kernel crash on syscall return on 476

Thanks to Athira Rajeev, Christophe Leroy, Eddie James, and Naveen N
Rao.

* tag 'powerpc-6.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/47x: Fix 47x syscall return crash
  powerpc/85xx: Fix math emulation exception
  powerpc/64e: Fix wrong test in __ptep_test_and_clear_young()
  powerpc/8xx: Fix pte_access_permitted() for PAGE_NONE
  powerpc/pseries: Remove unused r0 in the hcall tracing code
  powerpc/pseries: Fix STK_PARAM access in the hcall tracing code

f8bf101b

Merge tag 'smp-urgent-2023-10-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ddf20855

Linus Torvalds authored Oct 15, 2023

Pull CPU hotplug fix from Ingo Molnar:
 "Fix a Longsoon build warning by harmonizing the
  arch_[un]register_cpu() prototypes between architectures"

* tag 'smp-urgent-2023-10-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  cpu-hotplug: Provide prototypes for arch CPU registration

ddf20855

Merge tag 'kvm-x86-selftests-6.6-fixes' of https://github.com/kvm-x86/linux into HEAD · 2b3f2325

Paolo Bonzini authored Oct 15, 2023

KVM selftests fixes for 6.6:

 - Play nice with %llx when formatting guest printf and assert statements.

 - Clean up stale test metadata.

 - Zero-initialize structures in memslot perf test to workaround a suspected
   "may be used uninitialized" false positives from GCC.

2b3f2325

Merge tag 'kvm-x86-pmu-6.6-fixes' of https://github.com/kvm-x86/linux into HEAD · 88e4cd89

Paolo Bonzini authored Oct 15, 2023

KVM x86/pmu fixes for 6.6:

 - Truncate writes to PMU counters to the counter's width to avoid spurious
   overflows when emulating counter events in software.

 - Set the LVTPC entry mask bit when handling a PMI (to match Intel-defined
   architectural behavior).

 - Treat KVM_REQ_PMI as a wake event instead of queueing host IRQ work to
   kick the guest out of emulated halt.

88e4cd89

Merge tag 'kvmarm-fixes-6.6-2' of... · 24422df3

Paolo Bonzini authored Oct 15, 2023

Merge tag 'kvmarm-fixes-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.6, take #2

- Fix the handling of the phycal timer offset when FEAT_ECV
  and CNTPOFF_EL2 are implemented.

- Restore the functionnality of Permission Indirection that
  was broken by the Fine Grained Trapping rework

- Cleanup some PMU event sharing code

24422df3

Merge tag '6.6-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd · 9a3dad63

Linus Torvalds authored Oct 14, 2023

Pull smb server fixes from Steve French:

 - Fix for possible double free in RPC read

 - Add additional check to clarify smb2_open path and quiet Coverity

 - Fix incorrect error rsp in a compounding path

 - Fix to properly fail open of file with pending delete on close

* tag '6.6-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: fix potential double free on smb2_read_pipe() error path
  ksmbd: fix Null pointer dereferences in ksmbd_update_fstate()
  ksmbd: fix wrong error response status by using set_smb2_rsp_status()
  ksmbd: not allow to open file if delelete on close bit is set

9a3dad63

Merge tag '6.6-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 · bf2069d1

Linus Torvalds authored Oct 14, 2023

Pull smb client fixes from Steve French:

 - fix caching race with open_cached_dir and laundromat cleanup of
   cached dirs (addresses a problem spotted with xfstest run with
   directory leases enabled)

 - reduce excessive resource usage of laundromat threads

* tag '6.6-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client: prevent new fids from being removed by laundromat
  smb: client: make laundromat a delayed worker

bf2069d1

Merge tag 'x86-urgent-2023-10-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · dc9b2e68

Linus Torvalds authored Oct 14, 2023

Pull x86 fixes from Ingo Molnar:
 "Fix a false-positive KASAN warning, fix an AMD erratum on Zen4 CPUs,
  and fix kernel-doc build warnings"

* tag 'x86-urgent-2023-10-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/alternatives: Disable KASAN in apply_alternatives()
  x86/cpu: Fix AMD erratum #1485 on Zen4-based CPUs
  x86/resctrl: Fix kernel-doc warnings

dc9b2e68

Merge tag 'sched-urgent-2023-10-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 42578c7b

Linus Torvalds authored Oct 14, 2023

Pull scheduler fixes from Ingo Molnar:
 "Two EEVDF fixes"

* tag 'sched-urgent-2023-10-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/eevdf: Fix pick_eevdf()
  sched/eevdf: Fix min_deadline heap integrity

42578c7b

Merge tag 'perf-urgent-2023-10-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 23931d93

Linus Torvalds authored Oct 14, 2023

Pull x86 perf event fix from Ingo Molnar:
 "Fix an LBR sampling bug"

* tag 'perf-urgent-2023-10-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/lbr: Filter vsyscall addresses

23931d93

ovl: temporarily disable appending lowedirs · beae836e

Amir Goldstein authored Oct 14, 2023



Kernel v6.5 converted overlayfs to new mount api.
As an added bonus, it also added a feature to allow appending lowerdirs
using lowerdir=:/lower2,lowerdir=::/data3 syntax.

This new syntax has raised some concerns regarding escaping of colons.
We decided to try and disable this syntax, which hasn't been in the wild
for so long and introduce it again in 6.7 using explicit mount options
lowerdir+=/lower2,datadir+=/data3.

Suggested-by: Miklos Szeredi <miklos@szeredi.hu>
Link: https://lore.kernel.org/r/CAJfpegsr3A4YgF2YBevWa6n3=AcP7hNndG6EPMu3ncvV-AM71A@mail.gmail.com/


Fixes: b36a5780 ("ovl: modify layer parameter parsing")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>

beae836e

Merge tag 'xfs-6.6-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 70f8c6f8

Linus Torvalds authored Oct 14, 2023

Pull xfs fixes from Chandan Babu:

 - Fix calculation of offset of AG's last block and its length

 - Update incore AG block count when shrinking an AG

 - Process free extents to busy list in FIFO order

 - Make XFS report its i_version as the STATX_CHANGE_COOKIE

* tag 'xfs-6.6-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: reinstate the old i_version counter as STATX_CHANGE_COOKIE
  xfs: Remove duplicate include
  xfs: correct calculation for agend and blockcount
  xfs: process free extents to busy list in FIFO order
  xfs: adjust the incore perag block_count when shrinking

70f8c6f8

Oct 14, 2023

ovl: fix regression in showing lowerdir mount option · 32db5107

Amir Goldstein authored Oct 11, 2023



Before commit b36a5780 ("ovl: modify layer parameter parsing"),
spaces and commas in lowerdir mount option value used to be escaped using
seq_show_option().

In current upstream, when lowerdir value has a space, it is not escaped
in /proc/mounts, e.g.:

  none /mnt overlay rw,relatime,lowerdir=l l,upperdir=u,workdir=w 0 0

which results in broken output of the mount utility:

  none on /mnt type overlay (rw,relatime,lowerdir=l)

Store the original lowerdir mount options before unescaping and show
them using the same escaping used for seq_show_option() in addition to
escaping the colon separator character.

Fixes: b36a5780 ("ovl: modify layer parameter parsing")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>

32db5107