  1. Jun 21, 2023
    • Merge tag 'mm-hotfixes-stable-2023-06-20-12-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm · 8ba90f5c
      Linus Torvalds authored
      Pull hotfixes from Andrew Morton:
       "19 hotfixes.  8 of these are cc:stable.
      
        This includes a wholesale reversion of the post-6.4 series 'make slab
        shrink lockless'. After input from Dave Chinner it has been decided
        that we should go a different way [1]"
      
      Link: https://lkml.kernel.org/r/ZH6K0McWBeCjaf16@dread.disaster.area [1]
      
      * tag 'mm-hotfixes-stable-2023-06-20-12-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
        selftests/mm: fix cross compilation with LLVM
        mailmap: add entries for Ben Dooks
        nilfs2: prevent general protection fault in nilfs_clear_dirty_page()
        Revert "mm: vmscan: make global slab shrink lockless"
        Revert "mm: vmscan: make memcg slab shrink lockless"
        Revert "mm: vmscan: add shrinker_srcu_generation"
        Revert "mm: shrinkers: make count and scan in shrinker debugfs lockless"
        Revert "mm: vmscan: hold write lock to reparent shrinker nr_deferred"
        Revert "mm: vmscan: remove shrinker_rwsem from synchronize_shrinkers()"
        Revert "mm: shrinkers: convert shrinker_rwsem to mutex"
        nilfs2: fix buffer corruption due to concurrent device reads
        scripts/gdb: fix SB_* constants parsing
        scripts: fix the gfp flags header path in gfp-translate
        udmabuf: revert 'Add support for mapping hugepages (v4)'
        mm/khugepaged: fix iteration in collapse_file
        memfd: check for non-NULL file_seals in memfd_create() syscall
        mm/vmalloc: do not output a spurious warning when huge vmalloc() fails
        mm/mprotect: fix do_mprotect_pkey() limit check
        writeback: fix dereferencing NULL mapping->host on writeback_page_template
    • Merge tag 'acpi-6.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · e660abd5
      Linus Torvalds authored
      Pull ACPI fix from Rafael Wysocki:
       "Fix a kernel crash during early resume from ACPI S3 that has been
        present since the 5.15 cycle when might_sleep() was added to
        down_timeout(), which in some configurations of the kernel caused an
        implicit preemption point to trigger at a wrong time"
      
      * tag 'acpi-6.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI: sleep: Avoid breaking S3 wakeup due to might_sleep()
    • Merge tag 'thermal-6.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · c74e2ac2
      Linus Torvalds authored
      Pull thermal control fix from Rafael Wysocki:
       "Fix a regression introduced during the 6.3 cycle causing
        intel_soc_dts_iosf to report incorrect temperature values
        due to a coding mistake (Hans de Goede)"
      
      * tag 'thermal-6.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        thermal/intel/intel_soc_dts_iosf: Fix reporting wrong temperatures
    • Merge tag 'trace-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 2e30b973
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
      
       - Fix MAINTAINERS file to point to proper mailing list for rtla and rv
      
         The mailing list pointed to linux-trace-devel instead of
         linux-trace-kernel. The former is for the tracing libraries and the
         latter is for anything in the Linux kernel tree. The wrong mailing
         list was used because linux-trace-kernel did not exist when rtla and
         rv were created.
      
       - User events:
      
          - Fix matching of dynamic events to their user events
      
            When a user writes to the dynamic_events file, a lookup of the
            registered dynamic events is made, but in some cases a match
            could be made incorrectly.
      
          - Add auto cleanup of user events
      
            Have the user events automatically get removed when the last
            reference (file descriptor) is closed. This was asked for to
            prevent leaks of user events hanging around needing admins to
            clean them up.
      
          - Add persistent logic (but do not let user space use it yet)

            In some cases, having a persistent user event (one that does not
            get cleaned up automatically) is useful. But there are still debates
            about how to expose this to user space. The infrastructure is
            added, but the API is not.
      
          - Update the selftests
      
            Update the user event selftests to reflect the above changes"
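
The auto-cleanup behaviour described above can be pictured as a reference count with a "persist" escape hatch. The following is only a minimal sketch under stated assumptions: `struct event_model`, `event_get()` and `event_put()` are hypothetical names invented for illustration and are not the kernel's user_events structures or API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a user event; not the kernel's data structure. */
struct event_model {
	int refcnt;      /* open references, e.g. file descriptors */
	bool persist;    /* if true, the event survives the last put */
	bool registered; /* still visible to the tracing system */
};

/* Take one reference on the event. */
void event_get(struct event_model *ev)
{
	ev->refcnt++;
}

/* Drop one reference; a non-persistent event is removed on the last put. */
void event_put(struct event_model *ev)
{
	assert(ev->refcnt > 0);
	if (--ev->refcnt == 0 && !ev->persist)
		ev->registered = false;
}
```

In this model a non-persistent event stays registered while at least one reference is held and disappears when the last one is dropped, while an event with `persist` set remains registered until something removes it explicitly.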
      
      * tag 'trace-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        tracing/user_events: Document auto-cleanup and remove dyn_event refs
        selftests/user_events: Adapt dyn_test to non-persist events
        selftests/user_events: Ensure auto cleanup works as expected
        tracing/user_events: Add auto cleanup and future persist flag
        tracing/user_events: Track refcount consistently via put/get
        tracing/user_events: Store register flags on events
        tracing/user_events: Remove user_ns walk for groups
        selftests/user_events: Add perf self-test for empty arguments events
        selftests/user_events: Clear the events after perf self-test
        selftests/user_events: Add ftrace self-test for empty arguments events
        tracing/user_events: Fix the incorrect trace record for empty arguments events
        tracing: Modify print_fields() for fields output order
        tracing/user_events: Handle matching arguments that is null from dyn_events
        tracing/user_events: Prevent same name but different args event
        tracing/rv/rtla: Update MAINTAINERS file to point to proper mailing list
    • Merge tag 'for-6.4-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 4b0c7a1b
      Linus Torvalds authored
      Pull btrfs fix from David Sterba:
       "One more regression fix for an assertion failure that uncovered a
        nasty problem with stripe calculations. This is caused by a u32
        overflow when there are enough devices. The fstests require 6 so this
        hasn't been caught; I was able to hit it with 8.

        The fix is minimal and only adds u64 casts; we'll clean that up later.
        I did various additional tests to be sure"
      
      * tag 'for-6.4-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: fix u32 overflows when left shifting stripe_nr
    • Merge tag '6.4-rc6-smb3-server-fixes' of git://git.samba.org/ksmbd · 99ec1ed7
      Linus Torvalds authored
      Pull smb server fixes from Steve French:
       "Four smb3 server fixes, all also for stable:
      
         - fix potential oops in parsing compounded requests
      
         - fix various paths (mkdir, create etc) where mnt_want_write was not
           checked first
      
         - fix slab out of bounds in check_message and write"
      
      * tag '6.4-rc6-smb3-server-fixes' of git://git.samba.org/ksmbd:
        ksmbd: validate session id and tree id in the compound request
        ksmbd: fix out-of-bound read in smb2_write
        ksmbd: add mnt_want_write to ksmbd vfs functions
        ksmbd: validate command payload size
    • btrfs: fix u32 overflows when left shifting stripe_nr · a7299a18
      Qu Wenruo authored
      
      
      [BUG]
      David reported that an ASSERT() gets triggered during fio load on 8
      devices with data/raid6 and metadata/raid1c3:
      
        fio --rw=randrw --randrepeat=1 --size=3000m \
      	  --bsrange=512b-64k --bs_unaligned \
      	  --ioengine=libaio --fsync=1024 \
      	  --name=job0 --name=job1 \
      
      The ASSERT() is from rbio_add_bio() of raid56.c:
      
      	ASSERT(orig_logical >= full_stripe_start &&
      	       orig_logical + orig_len <= full_stripe_start +
      	       rbio->nr_data * BTRFS_STRIPE_LEN);
      
      This checks whether the target rbio crosses the full stripe
      boundary.
      
        [100.789] assertion failed: orig_logical >= full_stripe_start && orig_logical + orig_len <= full_stripe_start + rbio->nr_data * BTRFS_STRIPE_LEN, in fs/btrfs/raid56.c:1622
        [100.795] ------------[ cut here ]------------
        [100.796] kernel BUG at fs/btrfs/raid56.c:1622!
        [100.797] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
        [100.798] CPU: 1 PID: 100 Comm: kworker/u8:4 Not tainted 6.4.0-rc6-default+ #124
        [100.799] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014
        [100.802] Workqueue: writeback wb_workfn (flush-btrfs-1)
        [100.803] RIP: 0010:rbio_add_bio+0x204/0x210 [btrfs]
        [100.806] RSP: 0018:ffff888104a8f300 EFLAGS: 00010246
        [100.808] RAX: 00000000000000a1 RBX: ffff8881075907e0 RCX: ffffed1020951e01
        [100.809] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000000000001
        [100.811] RBP: 0000000141d20000 R08: 0000000000000001 R09: ffff888104a8f04f
        [100.813] R10: ffffed1020951e09 R11: 0000000000000003 R12: ffff88810e87f400
        [100.815] R13: 0000000041d20000 R14: 0000000144529000 R15: ffff888101524000
        [100.817] FS:  0000000000000000(0000) GS:ffff88811ac00000(0000) knlGS:0000000000000000
        [100.821] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [100.822] CR2: 000055d54e44c270 CR3: 000000010a9a1006 CR4: 00000000003706a0
        [100.824] Call Trace:
        [100.825]  <TASK>
        [100.825]  ? die+0x32/0x80
        [100.826]  ? do_trap+0x12d/0x160
        [100.827]  ? rbio_add_bio+0x204/0x210 [btrfs]
        [100.827]  ? rbio_add_bio+0x204/0x210 [btrfs]
        [100.829]  ? do_error_trap+0x90/0x130
        [100.830]  ? rbio_add_bio+0x204/0x210 [btrfs]
        [100.831]  ? handle_invalid_op+0x2c/0x30
        [100.833]  ? rbio_add_bio+0x204/0x210 [btrfs]
        [100.835]  ? exc_invalid_op+0x29/0x40
        [100.836]  ? asm_exc_invalid_op+0x16/0x20
        [100.837]  ? rbio_add_bio+0x204/0x210 [btrfs]
        [100.837]  raid56_parity_write+0x64/0x270 [btrfs]
        [100.838]  btrfs_submit_chunk+0x26e/0x800 [btrfs]
        [100.840]  ? btrfs_bio_init+0x80/0x80 [btrfs]
        [100.841]  ? release_pages+0x503/0x6d0
        [100.842]  ? folio_unlock+0x2f/0x60
        [100.844]  ? __folio_put+0x60/0x60
        [100.845]  ? btrfs_do_readpage+0xae0/0xae0 [btrfs]
        [100.847]  btrfs_submit_bio+0x21/0x60 [btrfs]
        [100.847]  submit_one_bio+0x6a/0xb0 [btrfs]
        [100.849]  extent_write_cache_pages+0x395/0x680 [btrfs]
        [100.850]  ? __extent_writepage+0x520/0x520 [btrfs]
        [100.851]  ? mark_usage+0x190/0x190
        [100.852]  extent_writepages+0xdb/0x130 [btrfs]
        [100.853]  ? extent_write_locked_range+0x480/0x480 [btrfs]
        [100.854]  ? mark_usage+0x190/0x190
        [100.854]  ? attach_extent_buffer_page+0x220/0x220 [btrfs]
        [100.855]  ? reacquire_held_locks+0x178/0x280
        [100.856]  ? writeback_sb_inodes+0x245/0x7f0
        [100.857]  do_writepages+0x102/0x2e0
        [100.858]  ? page_writeback_cpu_online+0x10/0x10
        [100.859]  ? __lock_release.isra.0+0x14a/0x4d0
        [100.860]  ? reacquire_held_locks+0x280/0x280
        [100.861]  ? __lock_acquired+0x1e9/0x3d0
        [100.862]  ? do_raw_spin_lock+0x1b0/0x1b0
        [100.863]  __writeback_single_inode+0x94/0x450
        [100.864]  writeback_sb_inodes+0x372/0x7f0
        [100.864]  ? lock_sync+0xd0/0xd0
        [100.865]  ? do_raw_spin_unlock+0x93/0xf0
        [100.866]  ? sync_inode_metadata+0xc0/0xc0
        [100.867]  ? rwsem_optimistic_spin+0x340/0x340
        [100.868]  __writeback_inodes_wb+0x70/0x130
        [100.869]  wb_writeback+0x2d1/0x530
        [100.869]  ? __writeback_inodes_wb+0x130/0x130
        [100.870]  ? lockdep_hardirqs_on_prepare.part.0+0xf1/0x1c0
        [100.870]  wb_do_writeback+0x3eb/0x480
        [100.871]  ? wb_writeback+0x530/0x530
        [100.871]  ? mark_lock_irq+0xcd0/0xcd0
        [100.872]  wb_workfn+0xe0/0x3f0
      
      [CAUSE]
      Commit a97699d1 ("btrfs: replace map_lookup->stripe_len by
      BTRFS_STRIPE_LEN") changes how we calculate the map length, to reduce
      u64 division.
      
      The function btrfs_max_io_len() returns the length to the stripe boundary.
      
      It calculates the full stripe start offset (inside the chunk) by the
      following code:
      
      		*full_stripe_start =
      			rounddown(*stripe_nr, nr_data_stripes(map)) <<
      			BTRFS_STRIPE_LEN_SHIFT;
      
      The calculation itself is fine, but the value returned by rounddown()
      depends on both @stripe_nr (which is u32) and nr_data_stripes() (which
      returns int).

      Thus the result is also u32, and the subsequent left shift can
      overflow u32.

      If such an overflow happens, @full_stripe_start will be far smaller
      than @offset, causing the later "full_stripe_len - (offset -
      *full_stripe_start)" to underflow. The length calculation then has no
      stripe boundary limit, so a write bio can exceed the stripe boundary.

      There are other locations like this, where a u32 @stripe_nr is left
      shifted, which can lead to a similar overflow.
      
      [FIX]
      Fix every left shift of @stripe_nr by casting it to u64 before the
      shift.

      The involved @stripe_nr and similar variables record the stripe
      number inside the chunk, which is small enough to fit in u32,
      but the corresponding offset inside the chunk may not fit in u32.

      Thus for those specific left shifts, a type cast to u64 is necessary.
      To keep the fix minimal, this patch does not change the variables
      themselves; the code will be cleaned up in the future.
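
The overflow and its knock-on underflow can be reproduced in plain userspace C. This is a minimal sketch, not the btrfs code: the function names are hypothetical, `rounddown_u32()` is a stand-in for the kernel's rounddown(), and the shift of 16 assumes a 64 KiB stripe length.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration: 64 KiB stripes, so the shift is 16. */
#define STRIPE_LEN_SHIFT 16

/* Round x down to a multiple of y (stand-in for the kernel's rounddown()). */
uint32_t rounddown_u32(uint32_t x, uint32_t y)
{
	assert(y != 0);
	return x - (x % y);
}

/* Buggy form: the shift is evaluated in 32 bits and wraps before widening. */
uint64_t full_stripe_start_buggy(uint32_t stripe_nr, uint32_t nr_data)
{
	return rounddown_u32(stripe_nr, nr_data) << STRIPE_LEN_SHIFT;
}

/* Fixed form: widen to 64 bits first, then shift. */
uint64_t full_stripe_start_fixed(uint32_t stripe_nr, uint32_t nr_data)
{
	return (uint64_t)rounddown_u32(stripe_nr, nr_data) << STRIPE_LEN_SHIFT;
}

/* Bytes left until the end of the full stripe that starts at full_stripe_start. */
uint64_t len_to_boundary(uint64_t offset, uint64_t full_stripe_start,
			 uint32_t nr_data)
{
	uint64_t full_stripe_len = (uint64_t)nr_data << STRIPE_LEN_SHIFT;

	/* Underflows (wraps to a huge value) if full_stripe_start is too small. */
	return full_stripe_len - (offset - full_stripe_start);
}
```

With stripe number 65542 and 6 data stripes, the buggy form yields 0x20000 while the fixed form yields 0x100020000. Feeding the wrapped start into `len_to_boundary()` returns a value far larger than one full stripe, which is the missing boundary limit described in the cause section.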
      
      Reported-by: David Sterba <dsterba@suse.com>
      Fixes: a97699d1 ("btrfs: replace map_lookup->stripe_len by BTRFS_STRIPE_LEN")
      Tested-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  2. Jun 20, 2023
  3. Jun 19, 2023
  4. Jun 18, 2023
    • PCI: hv: Add a per-bus mutex state_lock · 067d6ec7
      Dexuan Cui authored
      
      
      In the case of fast device addition/removal, it's possible that
      hv_eject_device_work() can start to run before create_root_hv_pci_bus()
      starts to run; as a result, the pci_get_domain_bus_and_slot() in
      hv_eject_device_work() can return a 'pdev' of NULL, and
      hv_eject_device_work() can remove the 'hpdev', and immediately send a
      message PCI_EJECTION_COMPLETE to the host, and the host immediately
      unassigns the PCI device from the guest; meanwhile,
      create_root_hv_pci_bus() and the PCI device driver can be probing the
      dead PCI device and reporting timeout errors.
      
      Fix the issue by adding a per-bus mutex 'state_lock' and grabbing the
      mutex before powering on the PCI bus in hv_pci_enter_d0(). When
      hv_eject_device_work() starts to run, it is now able to find the 'pdev'
      and call pci_stop_and_remove_bus_device(pdev). There are three cases:

       - If the PCI device driver has loaded, its probe() function has
         already been called in create_root_hv_pci_bus() ->
         pci_bus_add_devices(), so hv_eject_device_work() ->
         pci_stop_and_remove_bus_device() is able to call the driver's
         remove() function and remove the device reliably.

       - If the PCI device driver hasn't loaded yet, hv_eject_device_work()
         -> pci_stop_and_remove_bus_device() is able to remove the PCI
         device reliably and the driver's probe() function won't be called.

       - If the PCI device driver's probe() is already running (e.g.,
         systemd-udev is loading the PCI device driver), it must be holding
         the per-device lock; after probe() finishes and releases the lock,
         hv_eject_device_work() -> pci_stop_and_remove_bus_device() is able
         to proceed to remove the device reliably.
      
      Fixes: 4daace0d ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      Reviewed-by: Michael Kelley <mikelley@microsoft.com>
      Acked-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20230615044451.5580-6-decui@microsoft.com
      Signed-off-by: Wei Liu <wei.liu@kernel.org>
    • Revert "PCI: hv: Fix a timing issue which causes kdump to fail occasionally" · a847234e
      Dexuan Cui authored
      
      
      This reverts commit d6af2ed2.
      
      The statement "the hv_pci_bus_exit() call releases structures of all its
      child devices" in commit d6af2ed2 is not true: in the path
      hv_pci_probe() -> hv_pci_enter_d0() -> hv_pci_bus_exit(hdev, true): the
      parameter "keep_devs" is true, so hv_pci_bus_exit() does *not* release the
      child "struct hv_pci_dev *hpdev" that is created earlier in
      pci_devices_present_work() -> new_pcichild_device().
      
      The commit d6af2ed2 was originally made in July 2020 for RHEL 7.7,
      where the old version of hv_pci_bus_exit() was used; when the commit was
      rebased and merged into the upstream, people didn't notice that it's
      not really necessary. The commit itself doesn't cause any issue, but it
      makes hv_pci_probe() more complicated. Revert it to facilitate some
      upcoming changes to hv_pci_probe().
      
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      Reviewed-by: Michael Kelley <mikelley@microsoft.com>
      Acked-by: Wei Hu <weh@microsoft.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20230615044451.5580-5-decui@microsoft.com
      Signed-off-by: Wei Liu <wei.liu@kernel.org>
    • PCI: hv: Remove the useless hv_pcichild_state from struct hv_pci_dev · add9195e
      Dexuan Cui authored
      
      
      The hpdev->state is never really useful. The only use in
      hv_pci_eject_device() and hv_eject_device_work() is not really necessary.
      
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      Reviewed-by: Michael Kelley <mikelley@microsoft.com>
      Acked-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20230615044451.5580-4-decui@microsoft.com
      Signed-off-by: Wei Liu <wei.liu@kernel.org>
    • PCI: hv: Fix a race condition in hv_irq_unmask() that can cause panic · 2738d5ab
      Dexuan Cui authored
      
      
      When the host tries to remove a PCI device, the host first sends a
      PCI_EJECT message to the guest, and the guest is supposed to gracefully
      remove the PCI device and send a PCI_EJECTION_COMPLETE message to the host;
      the host then sends a VMBus message CHANNELMSG_RESCIND_CHANNELOFFER to
      the guest (when the guest receives this message, the device is already
      unassigned from the guest) and the guest can do some final cleanup work;
      if the guest fails to respond to the PCI_EJECT message within one minute,
      the host sends the VMBus message CHANNELMSG_RESCIND_CHANNELOFFER and
      removes the PCI device forcibly.
      
      In the case of fast device addition/removal, it's possible that the PCI
      device driver is still configuring MSI-X interrupts when the guest receives
      the PCI_EJECT message; the channel callback calls hv_pci_eject_device(),
      which sets hpdev->state to hv_pcichild_ejecting, and schedules a work
      hv_eject_device_work(); if the PCI device driver is calling
      pci_alloc_irq_vectors() -> ... -> hv_compose_msi_msg(), we can break the
      while loop in hv_compose_msi_msg() due to the updated hpdev->state, and
      leave data->chip_data with its default value of NULL; later, when the PCI
      device driver calls request_irq() -> ... -> hv_irq_unmask(), the guest
      crashes in hv_arch_irq_unmask() due to data->chip_data being NULL.
      
      Fix the issue by not testing hpdev->state in the while loop: when the
      guest receives PCI_EJECT, the device is still assigned to the guest, and
      the guest has one minute to finish the device removal gracefully. We don't
      really need to (and we should not) test hpdev->state in the loop.
      
      Fixes: de0aa7b2 ("PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()")
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      Reviewed-by: Michael Kelley <mikelley@microsoft.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20230615044451.5580-3-decui@microsoft.com
      Signed-off-by: Wei Liu <wei.liu@kernel.org>
    • PCI: hv: Fix a race condition bug in hv_pci_query_relations() · 440b5e36
      Dexuan Cui authored
      
      
      Since day 1 of the driver, there has been a race between
      hv_pci_query_relations() and survey_child_resources(): during fast
      device hotplug, hv_pci_query_relations() may error out due to
      device-remove and the stack variable 'comp' is no longer valid;
      however, pci_devices_present_work() -> survey_child_resources() ->
      complete() may be running on another CPU and accessing the no-longer-valid
      'comp'. Fix the race by flushing the workqueue before we exit from
      hv_pci_query_relations().
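
The shape of this race, and of the fix, can be illustrated with a userspace analogy. This is only a sketch under stated assumptions: it uses POSIX threads instead of a kernel workqueue, `pthread_join()` plays the role of flushing the workqueue, and `struct completion_model`, `worker()` and `query_relations_model()` are hypothetical names, not driver code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Minimal userspace stand-in for a kernel completion. */
struct completion_model {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

/* Worker: signals the completion it was handed, like complete(). */
static void *worker(void *arg)
{
	struct completion_model *c = arg;

	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
	return NULL;
}

/* Returns 0 on success, -1 on the early error path or if setup fails. */
int query_relations_model(bool fail_early)
{
	struct completion_model comp;	/* lives on this function's stack */
	pthread_t t;
	int ret = 0;
	int rc;

	pthread_mutex_init(&comp.lock, NULL);
	pthread_cond_init(&comp.cond, NULL);
	comp.done = false;

	if (pthread_create(&t, NULL, worker, &comp) != 0) {
		ret = -1;
		goto out;
	}

	if (fail_early) {
		ret = -1;	/* error path: give up without waiting */
	} else {
		pthread_mutex_lock(&comp.lock);
		while (!comp.done)
			pthread_cond_wait(&comp.cond, &comp.lock);
		pthread_mutex_unlock(&comp.lock);
	}

	/* "Flush": the worker must finish before comp goes out of scope. */
	rc = pthread_join(t, NULL);
	assert(rc == 0);
	(void)rc;
out:
	pthread_cond_destroy(&comp.cond);
	pthread_mutex_destroy(&comp.lock);
	return ret;
}
```

Without the join on the error path, the function could return while the worker is still about to touch `comp`, which is the use of a no-longer-valid stack variable that the commit message describes.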
      
      Fixes: 4daace0d ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      Reviewed-by: Michael Kelley <mikelley@microsoft.com>
      Acked-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20230615044451.5580-2-decui@microsoft.com
      Signed-off-by: Wei Liu <wei.liu@kernel.org>