  1. Jun 22, 2023
      Linus Torvalds authored
      Merge tag 'regulator-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
      
      Pull regulator fix from Mark Brown:
       "One simple fix for v6.4, some incorrectly specified bitfield masks in
        the PCA9450 driver"
      
      * tag 'regulator-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
        regulator: pca9450: Fix LDO3OUT and LDO4OUT MASK
      6e6fb54d
      Merge tag 'regmap-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap · e075d681
      Linus Torvalds authored
      Pull regmap fix from Mark Brown:
       "One more fix for v6.4
      
        The earlier fix to take account of the register data size when
        limiting raw register writes exposed the fact that the Intel AVMM bus
        was incorrectly specifying too low a limit on the maximum data
        transfer: it is only capable of transmitting one register, but had set a
        transfer size limit that couldn't fit both the value and the
        register address into a single message"
      
      * tag 'regmap-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
        regmap: spi-avmm: Fix regmap_bus max_raw_write
      e075d681
  2. Jun 21, 2023
      Linus Torvalds authored
      Merge tag 'mm-hotfixes-stable-2023-06-20-12-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
      
      Pull hotfixes from Andrew Morton:
       "19 hotfixes.  8 of these are cc:stable.
      
        This includes a wholesale reversion of the post-6.4 series 'make slab
        shrink lockless'. After input from Dave Chinner it has been decided
        that we should go a different way [1]"
      
      Link: https://lkml.kernel.org/r/ZH6K0McWBeCjaf16@dread.disaster.area [1]
      
      * tag 'mm-hotfixes-stable-2023-06-20-12-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
        selftests/mm: fix cross compilation with LLVM
        mailmap: add entries for Ben Dooks
        nilfs2: prevent general protection fault in nilfs_clear_dirty_page()
        Revert "mm: vmscan: make global slab shrink lockless"
        Revert "mm: vmscan: make memcg slab shrink lockless"
        Revert "mm: vmscan: add shrinker_srcu_generation"
        Revert "mm: shrinkers: make count and scan in shrinker debugfs lockless"
        Revert "mm: vmscan: hold write lock to reparent shrinker nr_deferred"
        Revert "mm: vmscan: remove shrinker_rwsem from synchronize_shrinkers()"
        Revert "mm: shrinkers: convert shrinker_rwsem to mutex"
        nilfs2: fix buffer corruption due to concurrent device reads
        scripts/gdb: fix SB_* constants parsing
        scripts: fix the gfp flags header path in gfp-translate
        udmabuf: revert 'Add support for mapping hugepages (v4)'
        mm/khugepaged: fix iteration in collapse_file
        memfd: check for non-NULL file_seals in memfd_create() syscall
        mm/vmalloc: do not output a spurious warning when huge vmalloc() fails
        mm/mprotect: fix do_mprotect_pkey() limit check
        writeback: fix dereferencing NULL mapping->host on writeback_page_template
      8ba90f5c
      Merge tag 'acpi-6.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · e660abd5
      Linus Torvalds authored
      Pull ACPI fix from Rafael Wysocki:
       "Fix a kernel crash during early resume from ACPI S3 that has been
        present since the 5.15 cycle when might_sleep() was added to
        down_timeout(), which in some configurations of the kernel caused an
        implicit preemption point to trigger at a wrong time"
      
      * tag 'acpi-6.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI: sleep: Avoid breaking S3 wakeup due to might_sleep()
      e660abd5
      Merge tag 'thermal-6.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · c74e2ac2
      Linus Torvalds authored
      Pull thermal control fix from Rafael Wysocki:
       "Fix a regression introduced during the 6.3 cycle causing
        intel_soc_dts_iosf to report incorrect temperature values
        due to a coding mistake (Hans de Goede)"
      
      * tag 'thermal-6.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        thermal/intel/intel_soc_dts_iosf: Fix reporting wrong temperatures
      c74e2ac2
      Merge tag 'trace-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 2e30b973
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
      
       - Fix MAINTAINERS file to point to proper mailing list for rtla and rv
      
         The mailing list pointed to linux-trace-devel instead of
         linux-trace-kernel. The former is for the tracing libraries and the
         latter is for anything in the Linux kernel tree. The wrong mailing
         list was used because linux-trace-kernel did not exist when rtla and
         rv were created.
      
       - User events:
      
          - Fix matching of dynamic events to their user events
      
            When a user writes to the dynamic_events file, a lookup of the
            registered dynamic events is made, but there were some cases
            where a match could be incorrectly made.
      
          - Add auto cleanup of user events
      
            Have the user events automatically get removed when the last
            reference (file descriptor) is closed. This was asked for to
            prevent leaks of user events hanging around needing admins to
            clean them up.
      
          - Add persistent logic (but not let user space use it yet)
      
            In some cases, having a persistent user event (one that does not
            get cleaned up automatically) is useful. But there's still debates
            about how to expose this to user space. The infrastructure is
            added, but the API is not.
      
          - Update the selftests
      
            Update the user event selftests to reflect the above changes"
      
      * tag 'trace-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        tracing/user_events: Document auto-cleanup and remove dyn_event refs
        selftests/user_events: Adapt dyn_test to non-persist events
        selftests/user_events: Ensure auto cleanup works as expected
        tracing/user_events: Add auto cleanup and future persist flag
        tracing/user_events: Track refcount consistently via put/get
        tracing/user_events: Store register flags on events
        tracing/user_events: Remove user_ns walk for groups
        selftests/user_events: Add perf self-test for empty arguments events
        selftests/user_events: Clear the events after perf self-test
        selftests/user_events: Add ftrace self-test for empty arguments events
        tracing/user_events: Fix the incorrect trace record for empty arguments events
        tracing: Modify print_fields() for fields output order
        tracing/user_events: Handle matching arguments that is null from dyn_events
        tracing/user_events: Prevent same name but different args event
        tracing/rv/rtla: Update MAINTAINERS file to point to proper mailing list
      2e30b973
      Merge tag 'for-6.4-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 4b0c7a1b
      Linus Torvalds authored
      Pull btrfs fix from David Sterba:
       "One more regression fix for an assertion failure that uncovered a
        nasty problem with stripe calculations. This is caused by a u32
        overflow when there are enough devices. The fstests require 6 devices so
        this hasn't been caught; I was able to hit it with 8.
      
        The fix is minimal and only adds u64 casts, we'll clean that up later.
        I did various additional tests to be sure"
      
      * tag 'for-6.4-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: fix u32 overflows when left shifting stripe_nr
      4b0c7a1b
      regmap: spi-avmm: Fix regmap_bus max_raw_write · c8e79689
      Russ Weight authored
      The max_raw_write member of the regmap_spi_avmm_bus structure is defined
      as:
      	.max_raw_write = SPI_AVMM_VAL_SIZE * MAX_WRITE_CNT
      
      SPI_AVMM_VAL_SIZE == 4 and MAX_WRITE_CNT == 1 so this results in a
      maximum write transfer size of 4 bytes which provides only enough space to
      transfer the address of the target register. It provides no space for the
      value to be transferred. This bug became an issue (divide-by-zero in
      _regmap_raw_write()) after the following was accepted into mainline:
      
      commit 39815141 ("regmap: Account for register length when chunking")
      
      Change max_raw_write to include space (4 additional bytes) for both the
      register address and value:
      
      	.max_raw_write = SPI_AVMM_REG_SIZE + SPI_AVMM_VAL_SIZE * MAX_WRITE_CNT
      
      Fixes: 7f9fb673 ("regmap: add Intel SPI Slave to AVMM Bus Bridge support")
      Reviewed-by: Matthew Gerlach <matthew.gerlach@linux.intel.com>
      Signed-off-by: Russ Weight <russell.h.weight@intel.com>
      Link: https://lore.kernel.org/r/20230620202824.380313-1-russell.h.weight@intel.com
      Signed-off-by: Mark Brown <broonie@kernel.org>
      c8e79689
      Merge tag '6.4-rc6-smb3-server-fixes' of git://git.samba.org/ksmbd · 99ec1ed7
      Linus Torvalds authored
      Pull smb server fixes from Steve French:
       "Four smb3 server fixes, all also for stable:
      
         - fix potential oops in parsing compounded requests
      
         - fix various paths (mkdir, create etc) where mnt_want_write was not
           checked first
      
         - fix slab out of bounds in check_message and write"
      
      * tag '6.4-rc6-smb3-server-fixes' of git://git.samba.org/ksmbd:
        ksmbd: validate session id and tree id in the compound request
        ksmbd: fix out-of-bound read in smb2_write
        ksmbd: add mnt_want_write to ksmbd vfs functions
        ksmbd: validate command payload size
      99ec1ed7
      btrfs: fix u32 overflows when left shifting stripe_nr · a7299a18
      Qu Wenruo authored
      [BUG]
      David reported an ASSERT() get triggered during fio load on 8 devices
      with data/raid6 and metadata/raid1c3:
      
        fio --rw=randrw --randrepeat=1 --size=3000m \
      	  --bsrange=512b-64k --bs_unaligned \
      	  --ioengine=libaio --fsync=1024 \
      	  --name=job0 --name=job1 \
      
      The ASSERT() is from rbio_add_bio() of raid56.c:
      
      	ASSERT(orig_logical >= full_stripe_start &&
      	       orig_logical + orig_len <= full_stripe_start +
      	       rbio->nr_data * BTRFS_STRIPE_LEN);
      
      Which is checking if the target rbio is crossing the full stripe
      boundary.
      
        [100.789] assertion failed: orig_logical >= full_stripe_start && orig_logical + orig_len <= full_stripe_start + rbio->nr_data * BTRFS_STRIPE_LEN, in fs/btrfs/raid56.c:1622
        [100.795] ------------[ cut here ]------------
        [100.796] kernel BUG at fs/btrfs/raid56.c:1622!
        [100.797] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
        [100.798] CPU: 1 PID: 100 Comm: kworker/u8:4 Not tainted 6.4.0-rc6-default+ #124
        [100.799] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014
        [100.802] Workqueue: writeback wb_workfn (flush-btrfs-1)
        [100.803] RIP: 0010:rbio_add_bio+0x204/0x210 [btrfs]
        [100.806] RSP: 0018:ffff888104a8f300 EFLAGS: 00010246
        [100.808] RAX: 00000000000000a1 RBX: ffff8881075907e0 RCX: ffffed1020951e01
        [100.809] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000000000001
        [100.811] RBP: 0000000141d20000 R08: 0000000000000001 R09: ffff888104a8f04f
        [100.813] R10: ffffed1020951e09 R11: 0000000000000003 R12: ffff88810e87f400
        [100.815] R13: 0000000041d20000 R14: 0000000144529000 R15: ffff888101524000
        [100.817] FS:  0000000000000000(0000) GS:ffff88811ac00000(0000) knlGS:0000000000000000
        [100.821] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [100.822] CR2: 000055d54e44c270 CR3: 000000010a9a1006 CR4: 00000000003706a0
        [100.824] Call Trace:
        [100.825]  <TASK>
        [100.825]  ? die+0x32/0x80
        [100.826]  ? do_trap+0x12d/0x160
        [100.827]  ? rbio_add_bio+0x204/0x210 [btrfs]
        [100.827]  ? rbio_add_bio+0x204/0x210 [btrfs]
        [100.829]  ? do_error_trap+0x90/0x130
        [100.830]  ? rbio_add_bio+0x204/0x210 [btrfs]
        [100.831]  ? handle_invalid_op+0x2c/0x30
        [100.833]  ? rbio_add_bio+0x204/0x210 [btrfs]
        [100.835]  ? exc_invalid_op+0x29/0x40
        [100.836]  ? asm_exc_invalid_op+0x16/0x20
        [100.837]  ? rbio_add_bio+0x204/0x210 [btrfs]
        [100.837]  raid56_parity_write+0x64/0x270 [btrfs]
        [100.838]  btrfs_submit_chunk+0x26e/0x800 [btrfs]
        [100.840]  ? btrfs_bio_init+0x80/0x80 [btrfs]
        [100.841]  ? release_pages+0x503/0x6d0
        [100.842]  ? folio_unlock+0x2f/0x60
        [100.844]  ? __folio_put+0x60/0x60
        [100.845]  ? btrfs_do_readpage+0xae0/0xae0 [btrfs]
        [100.847]  btrfs_submit_bio+0x21/0x60 [btrfs]
        [100.847]  submit_one_bio+0x6a/0xb0 [btrfs]
        [100.849]  extent_write_cache_pages+0x395/0x680 [btrfs]
        [100.850]  ? __extent_writepage+0x520/0x520 [btrfs]
        [100.851]  ? mark_usage+0x190/0x190
        [100.852]  extent_writepages+0xdb/0x130 [btrfs]
        [100.853]  ? extent_write_locked_range+0x480/0x480 [btrfs]
        [100.854]  ? mark_usage+0x190/0x190
        [100.854]  ? attach_extent_buffer_page+0x220/0x220 [btrfs]
        [100.855]  ? reacquire_held_locks+0x178/0x280
        [100.856]  ? writeback_sb_inodes+0x245/0x7f0
        [100.857]  do_writepages+0x102/0x2e0
        [100.858]  ? page_writeback_cpu_online+0x10/0x10
        [100.859]  ? __lock_release.isra.0+0x14a/0x4d0
        [100.860]  ? reacquire_held_locks+0x280/0x280
        [100.861]  ? __lock_acquired+0x1e9/0x3d0
        [100.862]  ? do_raw_spin_lock+0x1b0/0x1b0
        [100.863]  __writeback_single_inode+0x94/0x450
        [100.864]  writeback_sb_inodes+0x372/0x7f0
        [100.864]  ? lock_sync+0xd0/0xd0
        [100.865]  ? do_raw_spin_unlock+0x93/0xf0
        [100.866]  ? sync_inode_metadata+0xc0/0xc0
        [100.867]  ? rwsem_optimistic_spin+0x340/0x340
        [100.868]  __writeback_inodes_wb+0x70/0x130
        [100.869]  wb_writeback+0x2d1/0x530
        [100.869]  ? __writeback_inodes_wb+0x130/0x130
        [100.870]  ? lockdep_hardirqs_on_prepare.part.0+0xf1/0x1c0
        [100.870]  wb_do_writeback+0x3eb/0x480
        [100.871]  ? wb_writeback+0x530/0x530
        [100.871]  ? mark_lock_irq+0xcd0/0xcd0
        [100.872]  wb_workfn+0xe0/0x3f0
      
      [CAUSE]
      Commit a97699d1 ("btrfs: replace map_lookup->stripe_len by
      BTRFS_STRIPE_LEN") changes how we calculate the map length, to reduce
      u64 division.
      
      Function btrfs_max_io_len() returns the length to the stripe boundary.
      
      It calculates the full stripe start offset (inside the chunk) by the
      following code:
      
      		*full_stripe_start =
      			rounddown(*stripe_nr, nr_data_stripes(map)) <<
      			BTRFS_STRIPE_LEN_SHIFT;
      
      The calculation itself is fine, but the value returned by rounddown() is
      dependent on both @stripe_nr (which is u32) and nr_data_stripes() (which
      returns int).
      
      Thus the result is also u32, then we do the left shift, which can
      overflow u32.
      
      If such overflow happens, @full_stripe_start will be a value way smaller
      than @offset, causing the later "full_stripe_len - (offset -
      *full_stripe_start)" to underflow, thus making the later length
      calculation have no stripe boundary limit, resulting in a write bio that
      exceeds the stripe boundary.
      
      There are some other locations like this, where a u32 @stripe_nr gets
      left shifted, which can lead to a similar overflow.
      
      [FIX]
      Fix all left shifts of @stripe_nr by adding a type cast to u64 before the
      shift.

      The involved @stripe_nr and similar variables record the stripe number
      inside the chunk, which is small enough to be contained by u32, but their
      byte offset inside the chunk can not fit into u32.

      Thus for those specific left shifts a type cast to u64 is necessary; this
      patch only adds the casts, and the code will be cleaned up in the future
      to keep the fix minimal.
      
      Reported-by: David Sterba <dsterba@suse.com>
      Fixes: a97699d1 ("btrfs: replace map_lookup->stripe_len by BTRFS_STRIPE_LEN")
      Tested-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      a7299a18
  3. Jun 20, 2023
      Linus Torvalds authored
      Merge tag 'hyperv-fixes-signed-20230619' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
      
      Pull hyperv fixes from Wei Liu:
      
       - Fix races in Hyper-V PCI controller (Dexuan Cui)
      
       - Fix handling of hyperv_pcpu_input_arg (Michael Kelley)
      
       - Fix vmbus_wait_for_unload to scan present CPUs (Michael Kelley)
      
       - Call hv_synic_free in the failure path of hv_synic_alloc (Dexuan Cui)
      
       - Add noop for real mode handlers for virtual trust level code (Saurabh
         Sengar)
      
      * tag 'hyperv-fixes-signed-20230619' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
        PCI: hv: Add a per-bus mutex state_lock
        Revert "PCI: hv: Fix a timing issue which causes kdump to fail occasionally"
        PCI: hv: Remove the useless hv_pcichild_state from struct hv_pci_dev
        PCI: hv: Fix a race condition in hv_irq_unmask() that can cause panic
        PCI: hv: Fix a race condition bug in hv_pci_query_relations()
        arm64/hyperv: Use CPUHP_AP_HYPERV_ONLINE state to fix CPU online sequencing
        x86/hyperv: Fix hyperv_pcpu_input_arg handling when CPUs go online/offline
        Drivers: hv: vmbus: Fix vmbus_wait_for_unload() to scan present CPUs
        Drivers: hv: vmbus: Call hv_synic_free() if hv_synic_alloc() fails
        x86/hyperv/vtl: Add noop for realmode pointers
      692b7dc8
      selftests/mm: fix cross compilation with LLVM · 0518dbe9
      Mark Brown authored
      Currently the MM selftests attempt to work out the target architecture by
      using CROSS_COMPILE or otherwise querying the host machine, storing the
      target architecture in a variable called MACHINE rather than the usual
      ARCH, though as far as I can tell (including for x86_64) the value is the
      same as we would use for ARCH.
      
      When cross compiling with LLVM we don't need a CROSS_COMPILE since LLVM
      can support many target architectures in a single build, so this logic
      does not work: CROSS_COMPILE is not set and we end up selecting tests for
      the host rather than the target architecture.  Fix this by using the more
      standard ARCH to describe the architecture, taking it from the
      environment if specified.
      
      Link: https://lkml.kernel.org/r/20230614-kselftest-mm-llvm-v1-1-180523f277d3@kernel.org
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Tom Rix <trix@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      0518dbe9
      mailmap: add entries for Ben Dooks · 823b37e8
      Ben Dooks authored
      I am going to be losing my sifive.com address soon, and I also realised
      my old Simtec address (from >10 years ago) has not been updated either,
      so update .mailmap for both.
      
      Link: https://lkml.kernel.org/r/20230615081820.79485-1-ben.dooks@codethink.co.uk
      Signed-off-by: Ben Dooks <ben.dooks@sifive.com>
      Signed-off-by: Ben Dooks <ben-linux@fluff.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      823b37e8
      nilfs2: prevent general protection fault in nilfs_clear_dirty_page() · 782e53d0
      Ryusuke Konishi authored
      In a syzbot stress test that deliberately causes file system errors on
      nilfs2 with a corrupted disk image, it has been reported that
      nilfs_clear_dirty_page() called from nilfs_clear_dirty_pages() can cause a
      general protection fault.
      
      In nilfs_clear_dirty_pages(), when looking up dirty pages from the page
      cache and calling nilfs_clear_dirty_page() for each dirty page/folio
      retrieved, the back reference from the argument page to "mapping" may have
      been changed to NULL (and possibly others).  It is necessary to check this
      after locking the page/folio.
      
      So, fix this issue by not calling nilfs_clear_dirty_page() on a page/folio
      after locking it in nilfs_clear_dirty_pages() if the back reference
      "mapping" from the page/folio is different from the "mapping" that held
      the page/folio just before.
      
      Link: https://lkml.kernel.org/r/20230612021456.3682-1-konishi.ryusuke@gmail.com
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Reported-by: <syzbot+53369d11851d8f26735c@syzkaller.appspotmail.com>
      Closes: https://lkml.kernel.org/r/000000000000da4f6b05eb9bf593@google.com
      Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      782e53d0
      Revert "mm: vmscan: make global slab shrink lockless" · 71c3ad65
      Qi Zheng authored
      This reverts commit f95bdb70.
      
      Kernel test robot reports -88.8% regression in stress-ng.ramfs.ops_per_sec
      test case [1], which is caused by commit f95bdb70 ("mm: vmscan: make
      global slab shrink lockless").  The root cause is that SRCU has to be
      careful to not frequently check for SRCU read-side critical section exits.
      Therefore, even if no one is currently in the SRCU read-side critical
      section, synchronize_srcu() cannot return quickly.  That's why
      unregister_shrinker() has become slower.
      
      After discussion, we will try to use the refcount+RCU method [2] proposed
      by Dave Chinner to continue to re-implement the lockless slab shrink.  So
      revert the shrinker_srcu related changes first.
      
      [1]. https://lore.kernel.org/lkml/202305230837.db2c233f-yujie.liu@intel.com/
      [2]. https://lore.kernel.org/lkml/ZIJhou1d55d4H1s0@dread.disaster.area/
      
      Link: https://lkml.kernel.org/r/20230609081518.3039120-8-qi.zheng@linux.dev
      Reported-by: kernel test robot <yujie.liu@intel.com>
      Closes: https://lore.kernel.org/oe-lkp/202305230837.db2c233f-yujie.liu@intel.com
      Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Kirill Tkhai <tkhai@ya.ru>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      71c3ad65
      Revert "mm: vmscan: make memcg slab shrink lockless" · 7cee3603
      Qi Zheng authored
      This reverts commit caa05325.
      
      Kernel test robot reports -88.8% regression in stress-ng.ramfs.ops_per_sec
      test case [1], which is caused by commit f95bdb70 ("mm: vmscan: make
      global slab shrink lockless").  The root cause is that SRCU has to be
      careful to not frequently check for SRCU read-side critical section exits.
      Therefore, even if no one is currently in the SRCU read-side critical
      section, synchronize_srcu() cannot return quickly.  That's why
      unregister_shrinker() has become slower.
      
      After discussion, we will try to use the refcount+RCU method [2] proposed
      by Dave Chinner to continue to re-implement the lockless slab shrink.  So
      revert the shrinker_srcu related changes first.
      
      [1]. https://lore.kernel.org/lkml/202305230837.db2c233f-yujie.liu@intel.com/
      [2]. https://lore.kernel.org/lkml/ZIJhou1d55d4H1s0@dread.disaster.area/
      
      Link: https://lkml.kernel.org/r/20230609081518.3039120-7-qi.zheng@linux.dev
      Reported-by: kernel test robot <yujie.liu@intel.com>
      Closes: https://lore.kernel.org/oe-lkp/202305230837.db2c233f-yujie.liu@intel.com
      Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Kirill Tkhai <tkhai@ya.ru>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      7cee3603
      Revert "mm: vmscan: add shrinker_srcu_generation" · d6ecbcd7
      Qi Zheng authored
      This reverts commit 475733dd.
      
      Kernel test robot reports -88.8% regression in stress-ng.ramfs.ops_per_sec
      test case [1], which is caused by commit f95bdb70 ("mm: vmscan: make
      global slab shrink lockless").  The root cause is that SRCU has to be
      careful to not frequently check for SRCU read-side critical section exits.
      Therefore, even if no one is currently in the SRCU read-side critical
      section, synchronize_srcu() cannot return quickly.  That's why
      unregister_shrinker() has become slower.
      
      We will try to use the refcount+RCU method [2] proposed by Dave Chinner to
      continue to re-implement the lockless slab shrink.  So revert the
      shrinker_srcu related changes first.
      
      [1]. https://lore.kernel.org/lkml/202305230837.db2c233f-yujie.liu@intel.com/
      [2]. https://lore.kernel.org/lkml/ZIJhou1d55d4H1s0@dread.disaster.area/
      
      Link: https://lkml.kernel.org/r/20230609081518.3039120-6-qi.zheng@linux.dev
      Reported-by: kernel test robot <yujie.liu@intel.com>
      Closes: https://lore.kernel.org/oe-lkp/202305230837.db2c233f-yujie.liu@intel.com
      Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Kirill Tkhai <tkhai@ya.ru>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      d6ecbcd7
      Revert "mm: shrinkers: make count and scan in shrinker debugfs lockless" · 1a554ecc
      Qi Zheng authored
      This reverts commit 20cd1892.
      
      Kernel test robot reports -88.8% regression in stress-ng.ramfs.ops_per_sec
      test case [1], which is caused by commit f95bdb70 ("mm: vmscan: make
      global slab shrink lockless").  The root cause is that SRCU has to be
      careful to not frequently check for SRCU read-side critical section exits.
      Therefore, even if no one is currently in the SRCU read-side critical
      section, synchronize_srcu() cannot return quickly.  That's why
      unregister_shrinker() has become slower.
      
      We will try to use the refcount+RCU method [2] proposed by Dave Chinner to
      continue to re-implement the lockless slab shrink.  So revert the
      shrinker_srcu related changes first.
      
      [1]. https://lore.kernel.org/lkml/202305230837.db2c233f-yujie.liu@intel.com/
      [2]. https://lore.kernel.org/lkml/ZIJhou1d55d4H1s0@dread.disaster.area/
      
      Link: https://lkml.kernel.org/r/20230609081518.3039120-5-qi.zheng@linux.dev
      Reported-by: kernel test robot <yujie.liu@intel.com>
      Closes: https://lore.kernel.org/oe-lkp/202305230837.db2c233f-yujie.liu@intel.com
      Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Kirill Tkhai <tkhai@ya.ru>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      1a554ecc
      Revert "mm: vmscan: hold write lock to reparent shrinker nr_deferred" · c534f7cc
      Qi Zheng authored
      This reverts commit b3cabea3.
      
      Kernel test robot reports -88.8% regression in stress-ng.ramfs.ops_per_sec
      test case [1], which is caused by commit f95bdb70 ("mm: vmscan: make
      global slab shrink lockless"). The root cause is that SRCU has to be careful
      to not frequently check for SRCU read-side critical section exits. Therefore,
      even if no one is currently in the SRCU read-side critical section,
      synchronize_srcu() cannot return quickly. That's why unregister_shrinker()
      has become slower.
      
      We will try to use the refcount+RCU method [2] proposed by Dave Chinner
      to continue to re-implement the lockless slab shrink. Because there will
      be other readers after reverting the shrinker_srcu related changes, it
      is better to restore holding the read lock to reparent shrinker
      nr_deferred.
      
      [1]. https://lore.kernel.org/lkml/202305230837.db2c233f-yujie.liu@intel.com/
      [2]. https://lore.kernel.org/lkml/ZIJhou1d55d4H1s0@dread.disaster.area/
      
      Link: https://lkml.kernel.org/r/20230609081518.3039120-4-qi.zheng@linux.dev
      Reported-by: kernel test robot <yujie.liu@intel.com>
      Closes: https://lore.kernel.org/oe-lkp/202305230837.db2c233f-yujie.liu@intel.com
      Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Kirill Tkhai <tkhai@ya.ru>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      c534f7cc
      Revert "mm: vmscan: remove shrinker_rwsem from synchronize_shrinkers()" · 07252b0f
      Qi Zheng authored
      This reverts commit 1643db98.
      
      Kernel test robot reports -88.8% regression in stress-ng.ramfs.ops_per_sec
      test case [1], which is caused by commit f95bdb70 ("mm: vmscan: make
      global slab shrink lockless").  The root cause is that SRCU has to be
      careful to not frequently check for SRCU read-side critical section exits.
      Therefore, even if no one is currently in the SRCU read-side critical
      section, synchronize_srcu() cannot return quickly.  That's why
      unregister_shrinker() has become slower.
      
      We will try to use the refcount+RCU method [2] proposed by Dave Chinner to
      continue to re-implement the lockless slab shrink.  So we still need
      shrinker_rwsem in synchronize_shrinkers() after reverting the
      shrinker_srcu related changes.
      
      [1]. https://lore.kernel.org/lkml/202305230837.db2c233f-yujie.liu@intel.com/
      [2]. https://lore.kernel.org/lkml/ZIJhou1d55d4H1s0@dread.disaster.area/
      
      Link: https://lkml.kernel.org/r/20230609081518.3039120-3-qi.zheng@linux.dev
       Reported-by: kernel test robot <yujie.liu@intel.com>
       Closes: https://lore.kernel.org/oe-lkp/202305230837.db2c233f-yujie.liu@intel.com
       Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Kirill Tkhai <tkhai@ya.ru>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Vlastimil Babka <vbabka@suse.cz>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      07252b0f
    • Qi Zheng's avatar
      Revert "mm: shrinkers: convert shrinker_rwsem to mutex" · 47a7c01c
      Qi Zheng authored
      Patch series "revert shrinker_srcu related changes".
      
      
      This patch (of 7):
      
      This reverts commit cf2e309e.
      
       Kernel test robot reports -88.8% regression in stress-ng.ramfs.ops_per_sec
       test case [1], which is caused by commit f95bdb70 ("mm: vmscan: make
       global slab shrink lockless").  The root cause is that SRCU has to be
      careful to not frequently check for SRCU read-side critical section exits.
      Therefore, even if no one is currently in the SRCU read-side critical
      section, synchronize_srcu() cannot return quickly.  That's why
      unregister_shrinker() has become slower.
      
      After discussion, we will try to use the refcount+RCU method [2] proposed
      by Dave Chinner to continue to re-implement the lockless slab shrink.  So
      revert the shrinker_mutex back to shrinker_rwsem first.
      
      [1]. https://lore.kernel.org/lkml/202305230837.db2c233f-yujie.liu@intel.com/
      [2]. https://lore.kernel.org/lkml/ZIJhou1d55d4H1s0@dread.disaster.area/
      
      Link: https://lkml.kernel.org/r/20230609081518.3039120-1-qi.zheng@linux.dev
      Link: https://lkml.kernel.org/r/20230609081518.3039120-2-qi.zheng@linux.dev
       Reported-by: kernel test robot <yujie.liu@intel.com>
       Closes: https://lore.kernel.org/oe-lkp/202305230837.db2c233f-yujie.liu@intel.com
       Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Kirill Tkhai <tkhai@ya.ru>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Yujie Liu <yujie.liu@intel.com>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      47a7c01c
    • Ryusuke Konishi's avatar
      nilfs2: fix buffer corruption due to concurrent device reads · 679bd7eb
      Ryusuke Konishi authored
      
      
      As a result of analysis of a syzbot report, it turned out that in three
      cases where nilfs2 allocates block device buffers directly via sb_getblk,
      concurrent reads to the device can corrupt the allocated buffers.
      
       Nilfs2 uses sb_getblk for segment summary blocks (which make up a log
       header), for the super root block (the trailer), and when moving and
       writing the second super block after fs resize.
      
      In any of these, since the uptodate flag is not set when storing metadata
      to be written in the allocated buffers, the stored metadata will be
      overwritten if a device read of the same block occurs concurrently before
      the write.  This causes metadata corruption and misbehavior in the log
      write itself, causing warnings in nilfs_btree_assign() as reported.
      
      Fix these issues by setting an uptodate flag on the buffer head on the
      first or before modifying each buffer obtained with sb_getblk, and
      clearing the flag on failure.
      
      When setting the uptodate flag, the lock_buffer/unlock_buffer pair is used
      to perform necessary exclusive control, and the buffer is filled to ensure
      that uninitialized bytes are not mixed into the data read from others.  As
      for buffers for segment summary blocks, they are filled incrementally, so
      if the uptodate flag was unset on their allocation, set the flag and zero
      fill the buffer once at that point.
      
      Also, regarding the superblock move routine, the starting point of the
      memset call to zerofill the block is incorrectly specified, which can
      cause a buffer overflow on file systems with block sizes greater than
      4KiB.  In addition, if the superblock is moved within a large block, it is
      necessary to assume the possibility that the data in the superblock will
      be destroyed by zero-filling before copying.  So fix these potential
      issues as well.
      
      Link: https://lkml.kernel.org/r/20230609035732.20426-1-konishi.ryusuke@gmail.com
       Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
       Reported-by: <syzbot+31837fe952932efc8fb9@syzkaller.appspotmail.com>
       Closes: https://lkml.kernel.org/r/00000000000030000a05e981f475@google.com
       Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Cc: <stable@vger.kernel.org>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      679bd7eb
    • Florian Fainelli's avatar
      scripts/gdb: fix SB_* constants parsing · 6a59cb51
      Florian Fainelli authored
      
       After f15afbd3 ("fs: fix undefined behavior in bit shift for
       SB_NOUSER") the constants were changed from plain integers that
       LX_VALUE() can parse to constants using the BIT() macro, which causes
       the following:
      
      Reading symbols from build/linux-custom/vmlinux...done.
      Traceback (most recent call last):
        File "/home/fainelli/work/buildroot/output/arm64/build/linux-custom/vmlinux-gdb.py", line 25, in <module>
          import linux.constants
        File "/home/fainelli/work/buildroot/output/arm64/build/linux-custom/scripts/gdb/linux/constants.py", line 5
          LX_SB_RDONLY = ((((1UL))) << (0))
      
      Use LX_GDBPARSED() which does not suffer from that issue.
      
       Fixes: f15afbd3 ("fs: fix undefined behavior in bit shift for SB_NOUSER")
      Link: https://lkml.kernel.org/r/20230607221337.2781730-1-florian.fainelli@broadcom.com
       Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
       Acked-by: Christian Brauner <brauner@kernel.org>
      Cc: Hao Ge <gehao@kylinos.cn>
      Cc: Jan Kiszka <jan.kiszka@siemens.com>
      Cc: Kieran Bingham <kbingham@kernel.org>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Pankaj Raghav <p.raghav@samsung.com>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      6a59cb51
    • Prathu Baronia's avatar
      scripts: fix the gfp flags header path in gfp-translate · 2049a7d0
      Prathu Baronia authored
       Since the gfp flags have been moved to gfp_types.h, update the path in
       the gfp-translate script.
      
      Link: https://lkml.kernel.org/r/20230608154450.21758-1-prathubaronia2011@gmail.com
       Fixes: cb5a065b ("headers/deps: mm: Split <linux/gfp_types.h> out of <linux/gfp.h>")
       Signed-off-by: Prathu Baronia <prathubaronia2011@gmail.com>
       Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Masahiro Yamada <masahiroy@kernel.org>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Nicolas Schier <nicolas@fjasle.eu>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Yury Norov <yury.norov@gmail.com>
      Cc: <stable@vger.kernel.org>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      2049a7d0
    • Mike Kravetz's avatar
      udmabuf: revert 'Add support for mapping hugepages (v4)' · b7cb3821
      Mike Kravetz authored
      This effectively reverts commit 16c243e9 ("udmabuf: Add support for
      mapping hugepages (v4)").  Recently, Junxiao Chang found a BUG with page
      map counting as described here [1].  This issue pointed out that the
      udmabuf driver was making direct use of subpages of hugetlb pages.  This
      is not a good idea, and no other mm code attempts such use.  In addition
      to the mapcount issue, this also causes issues with hugetlb vmemmap
      optimization and page poisoning.
      
      For now, remove hugetlb support.
      
      If udmabuf wants to be used on hugetlb mappings, it should be changed to
      only use complete hugetlb pages.  This will require different alignment
      and size requirements on the UDMABUF_CREATE API.
      
      [1] https://lore.kernel.org/linux-mm/20230512072036.1027784-1-junxiao.chang@intel.com/
      
      Link: https://lkml.kernel.org/r/20230608204927.88711-1-mike.kravetz@oracle.com
       Fixes: 16c243e9 ("udmabuf: Add support for mapping hugepages (v4)")
       Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
       Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
       Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
       Acked-by: Gerd Hoffmann <kraxel@redhat.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dongwon Kim <dongwon.kim@intel.com>
      Cc: James Houghton <jthoughton@google.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Junxiao Chang <junxiao.chang@intel.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: <stable@vger.kernel.org>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      b7cb3821
    • David Stevens's avatar
      mm/khugepaged: fix iteration in collapse_file · c8a8f3b4
      David Stevens authored
       Remove an unnecessary call to xas_set(index) when iterating over the
       target range in collapse_file.  The extra call to xas_set resets the xas
       cursor to the top of the tree, causing the xas_next call on the next
       iteration to walk the tree to index instead of advancing to index+1.  This
       returns the same page again, which causes collapse_file to fail
       because the page is already locked.
      
      This bug was hidden when CONFIG_DEBUG_VM was set.  When that config was
      used, the xas_load in a subsequent VM_BUG_ON assert would walk xas from
      the top of the tree to index, causing the xas_next call on the next loop
      iteration to advance the cursor as expected.
      
      Link: https://lkml.kernel.org/r/20230607053135.2087354-1-stevensd@google.com
       Fixes: a2e17cc2 ("mm/khugepaged: maintain page cache uptodate flag")
       Signed-off-by: David Stevens <stevensd@chromium.org>
       Reviewed-by: Peter Xu <peterx@redhat.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jiaqi Yan <jiaqiyan@google.com>
      Cc: Kirill A . Shutemov <kirill@shutemov.name>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      c8a8f3b4
    • Roberto Sassu's avatar
      memfd: check for non-NULL file_seals in memfd_create() syscall · 935d44ac
      Roberto Sassu authored
       Ensure that file_seals is non-NULL before using it in the memfd_create()
       syscall.  One situation in which memfd_file_seals_ptr() could return a
       NULL pointer is when CONFIG_SHMEM=n, which would oops the kernel.
      
      Link: https://lkml.kernel.org/r/20230607132427.2867435-1-roberto.sassu@huaweicloud.com
       Fixes: 47b9012e ("shmem: add sealing support to hugetlb-backed memfd")
       Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
       Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: <stable@vger.kernel.org>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      935d44ac
    • Lorenzo Stoakes's avatar
      mm/vmalloc: do not output a spurious warning when huge vmalloc() fails · 95a301ee
      Lorenzo Stoakes authored
      In __vmalloc_area_node() we always warn_alloc() when an allocation
      performed by vm_area_alloc_pages() fails unless it was due to a pending
      fatal signal.
      
       However, huge page allocations instigated either by vmalloc_huge() or
       __vmalloc_node_range() (or a caller that invokes this like kvmalloc() or
       kvmalloc_node()) always fall back to order-0 allocations if the huge page
       allocation fails.
      
       This renders the warning useless and noisy, especially as all callers
       appear to be aware that this may fall back.  This has already resulted in
       at least one bug report from a user who was confused by this (see link).
      
      Therefore, simply update the code to only output this warning for order-0
      pages when no fatal signal is pending.
      
      Link: https://bugzilla.suse.com/show_bug.cgi?id=1211410
      Link: https://lkml.kernel.org/r/20230605201107.83298-1-lstoakes@gmail.com
       Fixes: 80b1d8fd ("mm: vmalloc: correct use of __GFP_NOWARN mask in __vmalloc_area_node()")
       Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
       Acked-by: Vlastimil Babka <vbabka@suse.cz>
       Reviewed-by: Baoquan He <bhe@redhat.com>
       Acked-by: Michal Hocko <mhocko@suse.com>
       Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
       Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: <stable@vger.kernel.org>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      95a301ee
    • Liam R. Howlett's avatar
      mm/mprotect: fix do_mprotect_pkey() limit check · 77795f90
      Liam R. Howlett authored
       do_mprotect_pkey() can still incorrectly return success if there is a gap
       that spans to or beyond the end address passed in.  Update the check to
       ensure that the end address has indeed been seen.
      
      Link: https://lore.kernel.org/all/CABi2SkXjN+5iFoBhxk71t3cmunTk-s=rB4T7qo0UQRh17s49PQ@mail.gmail.com/
      Link: https://lkml.kernel.org/r/20230606182912.586576-1-Liam.Howlett@oracle.com
       Fixes: 82f95134 ("mm/mprotect: fix do_mprotect_pkey() return on error")
       Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
       Reported-by: Jeff Xu <jeffxu@chromium.org>
       Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
       Acked-by: David Hildenbrand <david@redhat.com>
       Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: <stable@vger.kernel.org>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      77795f90
    • Rafael Aquini's avatar
      writeback: fix dereferencing NULL mapping->host on writeback_page_template · 54abe19e
      Rafael Aquini authored
      When commit 19343b5b ("mm/page-writeback: introduce tracepoint for
      wait_on_page_writeback()") repurposed the writeback_dirty_page trace event
      as a template to create its new wait_on_page_writeback trace event, it
      ended up opening a window to NULL pointer dereference crashes due to the
      (infrequent) occurrence of a race where an access to a page in the
      swap-cache happens concurrently with the moment this page is being written
      to disk and the tracepoint is enabled:
      
          BUG: kernel NULL pointer dereference, address: 0000000000000040
          #PF: supervisor read access in kernel mode
          #PF: error_code(0x0000) - not-present page
          PGD 800000010ec0a067 P4D 800000010ec0a067 PUD 102353067 PMD 0
          Oops: 0000 [#1] PREEMPT SMP PTI
          CPU: 1 PID: 1320 Comm: shmem-worker Kdump: loaded Not tainted 6.4.0-rc5+ #13
          Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230301gitf80f052277c8-1.fc37 03/01/2023
          RIP: 0010:trace_event_raw_event_writeback_folio_template+0x76/0xf0
          Code: 4d 85 e4 74 5c 49 8b 3c 24 e8 06 98 ee ff 48 89 c7 e8 9e 8b ee ff ba 20 00 00 00 48 89 ef 48 89 c6 e8 fe d4 1a 00 49 8b 04 24 <48> 8b 40 40 48 89 43 28 49 8b 45 20 48 89 e7 48 89 43 30 e8 a2 4d
          RSP: 0000:ffffaad580b6fb60 EFLAGS: 00010246
          RAX: 0000000000000000 RBX: ffff90e38035c01c RCX: 0000000000000000
          RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff90e38035c044
          RBP: ffff90e38035c024 R08: 0000000000000002 R09: 0000000000000006
          R10: ffff90e38035c02e R11: 0000000000000020 R12: ffff90e380bac000
          R13: ffffe3a7456d9200 R14: 0000000000001b81 R15: ffffe3a7456d9200
          FS:  00007f2e4e8a15c0(0000) GS:ffff90e3fbc80000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: 0000000000000040 CR3: 00000001150c6003 CR4: 0000000000170ee0
          Call Trace:
           <TASK>
           ? __die+0x20/0x70
           ? page_fault_oops+0x76/0x170
           ? kernelmode_fixup_or_oops+0x84/0x110
           ? exc_page_fault+0x65/0x150
           ? asm_exc_page_fault+0x22/0x30
           ? trace_event_raw_event_writeback_folio_template+0x76/0xf0
           folio_wait_writeback+0x6b/0x80
           shmem_swapin_folio+0x24a/0x500
           ? filemap_get_entry+0xe3/0x140
           shmem_get_folio_gfp+0x36e/0x7c0
           ? find_busiest_group+0x43/0x1a0
           shmem_fault+0x76/0x2a0
           ? __update_load_avg_cfs_rq+0x281/0x2f0
           __do_fault+0x33/0x130
           do_read_fault+0x118/0x160
           do_pte_missing+0x1ed/0x2a0
           __handle_mm_fault+0x566/0x630
           handle_mm_fault+0x91/0x210
           do_user_addr_fault+0x22c/0x740
           exc_page_fault+0x65/0x150
           asm_exc_page_fault+0x22/0x30
      
      This problem arises from the fact that the repurposed writeback_dirty_page
      trace event code was written assuming that every pointer to mapping
      (struct address_space) would come from a file-mapped page-cache object,
      thus mapping->host would always be populated, and that was a valid case
       before commit 19343b5b.  The swap-cache address space
       (swapper_spaces), however, doesn't populate its ->host (struct inode)
       pointer, thus leading to crashes in the aforementioned corner case.
      
       Commit 19343b5b ended up breaking the assignment of __entry->name and
      __entry->ino for the wait_on_page_writeback tracepoint -- both dependent
      on mapping->host carrying a pointer to a valid inode.  The assignment of
      __entry->name was fixed by commit 68f23b89 ("memcg: fix a crash in
      wb_workfn when a device disappears"), and this commit fixes the remaining
      case, for __entry->ino.
      
      Link: https://lkml.kernel.org/r/20230606233613.1290819-1-aquini@redhat.com
       Fixes: 19343b5b ("mm/page-writeback: introduce tracepoint for wait_on_page_writeback()")
       Signed-off-by: Rafael Aquini <aquini@redhat.com>
       Reviewed-by: Yafang Shao <laoar.shao@gmail.com>
      Cc: Aristeu Rozanski <aris@redhat.com>
      Cc: <stable@vger.kernel.org>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      54abe19e
    • Linus Torvalds's avatar
      Merge tag 'afs-fixes-20230719' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · dbad9ce9
      Linus Torvalds authored
      Pull AFS writeback fixes from David Howells:
      
       - release the acquired batch before returning if we got >=5 skips
      
       - retry a page we had to wait for rather than skipping over it after
         the wait
      
      * tag 'afs-fixes-20230719' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
        afs: Fix waiting for writeback then skipping folio
        afs: Fix dangling folio ref counts in writeback
      dbad9ce9
  4. Jun 19, 2023
  5. Jun 18, 2023
    • Dexuan Cui's avatar
      PCI: hv: Add a per-bus mutex state_lock · 067d6ec7
      Dexuan Cui authored
      In the case of fast device addition/removal, it's possible that
      hv_eject_device_work() can start to run before create_root_hv_pci_bus()
      starts to run; as a result, the pci_get_domain_bus_and_slot() in
      hv_eject_device_work() can return a 'pdev' of NULL, and
      hv_eject_device_work() can remove the 'hpdev', and immediately send a
      message PCI_EJECTION_COMPLETE to the host, and the host immediately
      unassigns the PCI device from the guest; meanwhile,
      create_root_hv_pci_bus() and the PCI device driver can be probing the
      dead PCI device and reporting timeout errors.
      
      Fix the issue by adding a per-bus mutex 'state_lock' and grabbing the
      mutex before powering on the PCI bus in hv_pci_enter_d0(): when
      hv_eject_device_work() starts to run, it's able to find the 'pdev' and call
      pci_stop_and_remove_bus_device(pdev): if the PCI device driver has
      loaded, the PCI device driver's probe() function is already called in
      create_root_hv_pci_bus() -> pci_bus_add_devices(), and now
      hv_eject_device_work() -> pci_stop_and_remove_bus_device() is able
      to call the PCI device driver's remove() function and remove the device
      reliably; if the PCI device driver hasn't loaded yet, the function call
      hv_eject_device_work() -> pci_stop_and_remove_bus_device() is able to
      remove the PCI device reliably and the PCI device driver's probe()
      function won't be called; if the PCI device driver's probe() is already
      running (e.g., systemd-udev is loading the PCI device driver), it must
      be holding the per-device lock, and after the probe() finishes and releases
      the lock, hv_eject_device_work() -> pci_stop_and_remove_bus_device() is
      able to proceed to remove the device reliably.
      
       Fixes: 4daace0d ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
       Signed-off-by: Dexuan Cui <decui@microsoft.com>
       Reviewed-by: Michael Kelley <mikelley@microsoft.com>
       Acked-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20230615044451.5580-6-decui@microsoft.com
       Signed-off-by: Wei Liu <wei.liu@kernel.org>
      067d6ec7