  1. Dec 14, 2023
    • mm/memory_hotplug: add missing mem_hotplug_lock · e0270ffa
      Sumanth Korikkar authored
      commit 001002e7 upstream.
      
      From Documentation/core-api/memory-hotplug.rst:
      When adding/removing/onlining/offlining memory or adding/removing
      heterogeneous/device memory, we should always hold the mem_hotplug_lock
      in write mode to serialise memory hotplug (e.g. access to global/zone
      variables).
      
      mhp_(de)init_memmap_on_memory() functions can change zone stats and
      struct page content, but they are currently called w/o the
      mem_hotplug_lock.
      
      When memory block is being offlined and when kmemleak goes through each
      populated zone, the following theoretical race conditions could occur:
      CPU 0:					     | CPU 1:
      memory_offline()			     |
      -> offline_pages()			     |
      	-> mem_hotplug_begin()		     |
      	   ...				     |
      	-> mem_hotplug_done()		     |
      					     | kmemleak_scan()
      					     | -> get_online_mems()
      					     |    ...
      -> mhp_deinit_memmap_on_memory()	     |
        [not protected by mem_hotplug_begin/done()]|
        Marks memory section as offline,	     |   Retrieves zone_start_pfn
        poisons vmemmap struct pages and updates   |   and struct page members.
        the zone related data			     |
         					     |    ...
         					     | -> put_online_mems()
      
      Fix this by ensuring mem_hotplug_lock is taken before performing
      mhp_init_memmap_on_memory().  Also ensure that
      mhp_deinit_memmap_on_memory() holds the lock.
      
      online/offline_pages() are currently only called from
      memory_block_online/offline(), so it is safe to move the locking there.
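
      The locking move described above can be pictured with a small userspace
      model. This is only a sketch: the function names mirror the ones in the
      commit message, but the signatures, the boolean "lock" and the page
      counter are simplified assumptions, not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for mem_hotplug_lock: only tracks whether it is write-held. */
static bool hotplug_write_held;
static int zone_present_pages = 100;   /* hypothetical zone statistic */

static void mem_hotplug_begin(void)
{
    assert(!hotplug_write_held);
    hotplug_write_held = true;
}

static void mem_hotplug_done(void)
{
    assert(hotplug_write_held);
    hotplug_write_held = false;
}

/* Both helpers change zone state, so both insist that the lock is held. */
static void offline_pages(int nr_pages)
{
    assert(hotplug_write_held);
    zone_present_pages -= nr_pages;
}

static void mhp_deinit_memmap_on_memory(int nr_vmemmap_pages)
{
    assert(hotplug_write_held);
    zone_present_pages -= nr_vmemmap_pages;
}

/* The caller holds the lock across the whole offline sequence. */
static int memory_block_offline(int nr_pages, int nr_vmemmap_pages)
{
    mem_hotplug_begin();
    offline_pages(nr_pages);
    mhp_deinit_memmap_on_memory(nr_vmemmap_pages);
    mem_hotplug_done();
    return zone_present_pages;
}
```

      With the lock taken in the caller, no reader can observe the state
      between the two helper calls, which is the window shown in the race
      diagram.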
      
      Link: https://lkml.kernel.org/r/20231120145354.308999-2-sumanthk@linux.ibm.com
      Fixes: a08a2ae3 ("mm,memory_hotplug: allocate memmap from the added memory range")
      Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
      Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: kernel test robot <lkp@intel.com>
      Cc: <stable@vger.kernel.org>	[5.15+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drivers/base/cpu: crash data showing should depends on KEXEC_CORE · 83dd18e0
      Baoquan He authored
      commit 4e9e2e4c upstream.
      
      After commit 88a6f899 ("crash: memory and CPU hotplug sysfs
      attributes"), on x86_64, if only the kdump-related kernel configs below
      are set, compile errors are triggered.
      
      ----
      CONFIG_CRASH_CORE=y
      CONFIG_KEXEC_CORE=y
      CONFIG_CRASH_DUMP=y
      CONFIG_CRASH_HOTPLUG=y
      ------
      
      ------------------------------------------------------
      drivers/base/cpu.c: In function `crash_hotplug_show':
      drivers/base/cpu.c:309:40: error: implicit declaration of function `crash_hotplug_cpu_support'; did you mean `crash_hotplug_show'? [-Werror=implicit-function-declaration]
        309 |         return sysfs_emit(buf, "%d\n", crash_hotplug_cpu_support());
            |                                        ^~~~~~~~~~~~~~~~~~~~~~~~~
            |                                        crash_hotplug_show
      cc1: some warnings being treated as errors
      ------------------------------------------------------
      
      CONFIG_KEXEC is used to enable the kexec_load interface, so making the
      display of crash_notes/crash_notes_size/crash_hotplug depend on
      CONFIG_KEXEC is incorrect. It should depend on KEXEC_CORE instead.
      
      Fix it now.
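
      A minimal sketch of the guard mismatch follows. The #define stands in
      for Kconfig output, and the exact placement of the guards in
      drivers/base/cpu.c is an assumption made for illustration. Only the two
      function names come from the compiler output above.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for Kconfig: the kexec core is built, the kexec_load() interface is not. */
#define CONFIG_KEXEC_CORE 1
/* CONFIG_KEXEC is deliberately left undefined. */

#ifdef CONFIG_KEXEC_CORE
/* Helper that exists whenever the kexec core is built (simplified stub). */
static int crash_hotplug_cpu_support(void)
{
    return 1;
}
#endif

/*
 * The user is guarded by the same symbol as the provider. Guarding only the
 * helper's declaration with CONFIG_KEXEC instead would leave the call below
 * undeclared in this configuration.
 */
#ifdef CONFIG_KEXEC_CORE
static int crash_hotplug_show(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%d\n", crash_hotplug_cpu_support());
}
#endif
```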
      
      Link: https://lkml.kernel.org/r/20231128055248.659808-1-bhe@redhat.com
      Fixes: 88a6f899 ("crash: memory and CPU hotplug sysfs attributes")
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Tested-by: Ignat Korchagin <ignat@cloudflare.com>	[compile-time only]
      Tested-by: Alexander Gordeev <agordeev@linux.ibm.com>
      Reviewed-by: Eric DeVolder <eric_devolder@yahoo.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • hugetlb: fix null-ptr-deref in hugetlb_vma_lock_write · 512b420a
      Mike Kravetz authored
      commit 187da0f8 upstream.
      
      The routine __vma_private_lock tests for the existence of a reserve map
      associated with a private hugetlb mapping.  A pointer to the reserve map
      is in vma->vm_private_data.  __vma_private_lock was checking the pointer
      for NULL.  However, it is possible that the low bits of the pointer could
      be used as flags.  In such instances, vm_private_data is not NULL and not
      a valid pointer.  This results in the null-ptr-deref reported by syzbot:
      
      general protection fault, probably for non-canonical address 0xdffffc000000001d:
       0000 [#1] PREEMPT SMP KASAN
      KASAN: null-ptr-deref in range [0x00000000000000e8-0x00000000000000ef]
      CPU: 0 PID: 5048 Comm: syz-executor139 Not tainted 6.6.0-rc7-syzkaller-00142-g88
      8cf78c29e2 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 1
      0/09/2023
      RIP: 0010:__lock_acquire+0x109/0x5de0 kernel/locking/lockdep.c:5004
      ...
      Call Trace:
       <TASK>
       lock_acquire kernel/locking/lockdep.c:5753 [inline]
       lock_acquire+0x1ae/0x510 kernel/locking/lockdep.c:5718
       down_write+0x93/0x200 kernel/locking/rwsem.c:1573
       hugetlb_vma_lock_write mm/hugetlb.c:300 [inline]
       hugetlb_vma_lock_write+0xae/0x100 mm/hugetlb.c:291
       __hugetlb_zap_begin+0x1e9/0x2b0 mm/hugetlb.c:5447
       hugetlb_zap_begin include/linux/hugetlb.h:258 [inline]
       unmap_vmas+0x2f4/0x470 mm/memory.c:1733
       exit_mmap+0x1ad/0xa60 mm/mmap.c:3230
       __mmput+0x12a/0x4d0 kernel/fork.c:1349
       mmput+0x62/0x70 kernel/fork.c:1371
       exit_mm kernel/exit.c:567 [inline]
       do_exit+0x9ad/0x2a20 kernel/exit.c:861
       __do_sys_exit kernel/exit.c:991 [inline]
       __se_sys_exit kernel/exit.c:989 [inline]
       __x64_sys_exit+0x42/0x50 kernel/exit.c:989
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Mask off low bit flags before checking for NULL pointer.  In addition, the
      reserve map only 'belongs' to the OWNER (parent in parent/child
      relationships) so also check for the OWNER flag.
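
      The check described above can be sketched in plain C. The flag names,
      bit positions and struct layout here are hypothetical stand-ins chosen
      for the example; they only illustrate masking the low bits before the
      NULL test and requiring the owner flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag layout: the two low bits of a pointer-sized word. */
#define RESV_OWNER    ((uintptr_t)0x1)
#define RESV_UNMAPPED ((uintptr_t)0x2)
#define RESV_MASK     (RESV_OWNER | RESV_UNMAPPED)

struct resv_map {
    int regions;
};

/* Combine a (possibly NULL) map pointer with flag bits in one word. */
static uintptr_t pack_private(struct resv_map *map, uintptr_t flags)
{
    assert(((uintptr_t)map & RESV_MASK) == 0);  /* alignment keeps low bits free */
    return (uintptr_t)map | (flags & RESV_MASK);
}

/* True only for a real map pointer that also carries the owner flag. */
static bool has_private_lock(uintptr_t private_data)
{
    struct resv_map *map = (struct resv_map *)(private_data & ~RESV_MASK);

    return map != NULL && (private_data & RESV_OWNER) != 0;
}
```

      A word that holds only flag bits is non-zero but carries no pointer, so
      a plain NULL test on the raw word would wrongly treat it as a valid
      reserve map.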
      
      Link: https://lkml.kernel.org/r/20231114012033.259600-1-mike.kravetz@oracle.com
      Reported-by: <syzbot+6ada951e7c0f7bc8a71e@syzkaller.appspotmail.com>
      Closes: https://lore.kernel.org/linux-mm/00000000000078d1e00608d7878b@google.com/
      Fixes: bf491692 ("hugetlbfs: extend hugetlb_vma_lock to private VMAs")
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Rik van Riel <riel@surriel.com>
      Cc: Edward Adam Davis <eadavis@qq.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Tom Rix <trix@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • workqueue: Make sure that wq_unbound_cpumask is never empty · b2c562a7
      Tejun Heo authored
      commit 4a6c5607 upstream.
      
      During boot, depending on how the housekeeping and workqueue.unbound_cpus
      masks are set, wq_unbound_cpumask can end up empty. Since 8639eceb
      ("workqueue: Implement non-strict affinity scope for unbound workqueues"),
      this may end up feeding -1 as a CPU number into scheduler leading to oopses.
      
        BUG: unable to handle page fault for address: ffffffff8305e9c0
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        ...
        Call Trace:
         <TASK>
         select_idle_sibling+0x79/0xaf0
         select_task_rq_fair+0x1cb/0x7b0
         try_to_wake_up+0x29c/0x5c0
         wake_up_process+0x19/0x20
         kick_pool+0x5e/0xb0
         __queue_work+0x119/0x430
         queue_work_on+0x29/0x30
        ...
      
      An empty wq_unbound_cpumask is a clear misconfiguration and is already
      disallowed once the system is booted up. Let's warn on and ignore
      unbound_cpumask restrictions which lead to no unbound cpus. While at it,
      also remove the now unnecessary empty check on wq_unbound_cpumask in
      wq_select_unbound_cpu().
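
      The "warn on and ignore" policy can be sketched with CPU sets modelled
      as 64-bit masks. The helper name and the bitmask representation are
      assumptions made for this example; the kernel uses its own cpumask type
      and helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Apply one restriction to the unbound CPU mask (bit n = CPU n).
 * A restriction that would leave no CPU at all is reported and skipped.
 */
static uint64_t restrict_unbound_mask(uint64_t mask, uint64_t restriction,
                                      const char *name)
{
    assert(mask != 0);  /* the starting mask is expected to be non-empty */

    uint64_t narrowed = mask & restriction;
    if (narrowed == 0) {
        fprintf(stderr, "ignoring %s: it would leave no unbound CPUs\n", name);
        return mask;
    }
    return narrowed;
}
```

      Because the result is never empty, later code that picks a CPU from the
      mask does not need its own empty check.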
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-and-Tested-by: Yong He <alexyonghe@tencent.com>
      Link: http://lkml.kernel.org/r/20231120121623.119780-1-alexyonghe@tencent.com
      Fixes: 8639eceb ("workqueue: Implement non-strict affinity scope for unbound workqueues")
      Cc: stable@vger.kernel.org # v6.6+
      Reviewed-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • platform/surface: aggregator: fix recv_buf() return value · 7409c28c
      Francesco Dolcini authored
      commit c8820c92 upstream.
      
      The serdev recv_buf() callback is supposed to return the number of bytes
      consumed, that is, an int between 0 and count.

      Do not return a negative number on error: when
      ssam_controller_receive_buf() returns ESHUTDOWN, just return 0, i.e. no
      bytes consumed. This keeps exactly the same behavior as before.
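
      A sketch of the callback contract, assuming a hypothetical inner
      receive function that either consumes everything or reports
      -ESHUTDOWN. The names and signatures are simplified stand-ins, not the
      driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical inner receive path: bytes consumed, or -ESHUTDOWN. */
static int controller_receive(int shutting_down, size_t count)
{
    if (shutting_down)
        return -ESHUTDOWN;
    return (int)count;
}

/* Callback wrapper: the result must always lie in [0, count]. */
static int recv_buf(int shutting_down, size_t count)
{
    int ret = controller_receive(shutting_down, count);

    if (ret < 0)
        return 0;  /* error: report that nothing was consumed */

    assert((size_t)ret <= count);
    return ret;
}
```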
      
      This fixes a potential WARN in serdev-ttyport.c:ttyport_receive_buf().
      
      Fixes: c167b9c7 ("platform/surface: Add Surface Aggregator subsystem")
      Cc: stable@vger.kernel.org
      Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
      Reviewed-by: Maximilian Luz <luzmaximilian@gmail.com>
      Link: https://lore.kernel.org/r/20231128194935.11350-1-francesco@dolcini.it
      Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • regmap: fix bogus error on regcache_sync success · 78c8fc33
      Matthias Reichl authored
      commit fea88064 upstream.
      
      Since commit 0ec77316 ("regmap: Ensure range selector registers
      are updated after cache sync"), opening pcm512x-based soundcards fails
      with EINVAL and dmesg shows sync cache and pm_runtime_get errors:
      
      [  228.794676] pcm512x 1-004c: Failed to sync cache: -22
      [  228.794740] pcm512x 1-004c: ASoC: error at snd_soc_pcm_component_pm_runtime_get on pcm512x.1-004c: -22
      
      This is caused by the cache check result leaking out into the
      regcache_sync return value.
      
      Fix this by making the check local-only. As the comment above the
      regcache_read call states, a non-zero return value means there's
      nothing to do, so the return value should not be altered.
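
      The leak can be reproduced in a small model. All names below are
      hypothetical and the "cache" is a single flag; the sketch only shows
      how reusing the function-wide return variable for an optional probe
      turns a harmless miss into an error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

static unsigned int selector_written;  /* last value pushed to the "hardware" */

/* Hypothetical cache lookup: 0 on a hit, -EINVAL when nothing is cached. */
static int cache_read(int cached, unsigned int *val)
{
    assert(val != NULL);
    if (!cached)
        return -EINVAL;
    *val = 0x2a;
    return 0;
}

/* Leaky variant: the optional probe overwrites the function's result. */
static int sync_leaky(int cached)
{
    unsigned int val;
    int ret = 0;                      /* main sync work assumed successful */

    ret = cache_read(cached, &val);   /* a miss clobbers ret */
    if (ret == 0)
        selector_written = val;
    return ret;
}

/* Local-only variant: a miss simply means there is nothing to write back. */
static int sync_local(int cached)
{
    unsigned int val;
    int ret = 0;

    if (cache_read(cached, &val) == 0)
        selector_written = val;
    return ret;
}
```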
      
      Fixes: 0ec77316 ("regmap: Ensure range selector registers are updated after cache sync")
      Cc: stable@vger.kernel.org
      Signed-off-by: Matthias Reichl <hias@horus.com>
      Link: https://lore.kernel.org/r/20231203222216.96547-1-hias@horus.com
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • r8169: fix rtl8125b PAUSE frames blasting when suspended · 2e04cfdd
      ChunHao Lin authored
      commit 4b0768b6 upstream.
      
      When the FIFO reaches a near-full state, the device issues a pause frame.
      If pause slot is enabled (set to 1), the device issues the pause frame
      only once. But if pause slot is disabled (set to 0), the device keeps
      sending pause frames until the FIFO reaches a near-empty state.

      When pause slot is disabled and nothing is handling received packets,
      the device FIFO reaches the near-full state and keeps sending pause
      frames. That impacts the entire local area network.
      
      This issue can be reproduced in Chromebox (not Chromebook) in
      developer mode running a test image (and v5.10 kernel):
      1) ping -f $CHROMEBOX (from workstation on same local network)
      2) run "powerd_dbus_suspend" from command line on the $CHROMEBOX
      3) ping $ROUTER (wait until ping fails from workstation)
      
      It takes about 20-30 seconds after step 2 for the local network to
      stop working.

      Fix this issue by enabling pause slot so that a pause frame is sent only
      once when the FIFO reaches the near-full state.
      
      Fixes: f1bce4ad ("r8169: add support for RTL8125")
      Reported-by: Grant Grundler <grundler@chromium.org>
      Tested-by: Grant Grundler <grundler@chromium.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: ChunHao Lin <hau@realtek.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
      Link: https://lore.kernel.org/r/20231129155350.5843-1-hau@realtek.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • packet: Move reference count in packet_sock to atomic_long_t · 865b7157
      Daniel Borkmann authored
      commit db3fadac upstream.
      
      In some potential instances the reference count on struct packet_sock
      could be saturated and cause overflows, which leaves the kernel a bit
      confused. To prevent this, move to a 64-bit atomic reference count on
      64-bit architectures so that this type cannot overflow.
      
      Because we can not handle saturation, using refcount_t is not possible
      in this place. Maybe someday in the future if it changes it could be
      used. Also, instead of using plain atomic64_t, use atomic_long_t instead.
      32-bit machines tend to be memory-limited (i.e. anything that increases
      a reference uses so much memory that you can't actually get to 2**32
      references). 32-bit architectures also tend to have serious problems
      with 64-bit atomics. Hence, atomic_long_t is the more natural solution.
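
      As a userspace analogue, C11 offers atomic_long, which follows the
      width of long (64 bits on common 64-bit Linux ABIs, 32 bits on 32-bit
      ones). The struct and helper names below are hypothetical; this is a
      sketch of a long-sized counter, not the packet socket code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Reference count that is as wide as a machine long. */
struct sock_ref {
    atomic_long refcnt;
};

static void ref_get(struct sock_ref *r)
{
    atomic_fetch_add(&r->refcnt, 1);
}

/* Returns true when the caller dropped the last reference. */
static bool ref_put(struct sock_ref *r)
{
    long old = atomic_fetch_sub(&r->refcnt, 1);

    assert(old > 0);  /* dropping a reference that was never taken is a bug */
    return old == 1;
}
```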
      
      Reported-by: "The UK's National Cyber Security Centre (NCSC)" <security@ncsc.gov.uk>
      Co-developed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: stable@kernel.org
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20231201131021.19999-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • nfp: flower: fix for take a mutex lock in soft irq context and rcu lock · 9a89aad0
      Hui Zhou authored
      commit 0ad722bd upstream.
      
      The neighbour event callback calls nfp_tun_write_neigh(), which takes a
      mutex while running in soft irq context. Use a work queue to process the
      neighbour event instead.

      Also move the nfp_tun_write_neigh() call outside the
      rcu_read_lock()/rcu_read_unlock() section in
      nfp_tunnel_request_route_v4() and nfp_tunnel_request_route_v6().
      
      Fixes: abc21095 ("nfp: flower: tunnel neigh support bond offload")
      CC: stable@vger.kernel.org # 6.2+
      Signed-off-by: Hui Zhou <hui.zhou@corigine.com>
      Signed-off-by: Louis Peens <louis.peens@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • leds: trigger: netdev: fix RTNL handling to prevent potential deadlock · 3c0adff9
      Heiner Kallweit authored
      commit fe2b1226 upstream.
      
      When working on LED support for r8169 I got the following lockdep
      warning. Easiest way to prevent this scenario seems to be to take
      the RTNL lock before the trigger_data lock in set_device_name().
      
      ======================================================
      WARNING: possible circular locking dependency detected
      6.7.0-rc2-next-20231124+ #2 Not tainted
      ------------------------------------------------------
      bash/383 is trying to acquire lock:
      ffff888103aa1c68 (&trigger_data->lock){+.+.}-{3:3}, at: netdev_trig_notify+0xec/0x190 [ledtrig_netdev]
      
      but task is already holding lock:
      ffffffff8cddf808 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x12/0x20
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (rtnl_mutex){+.+.}-{3:3}:
             __mutex_lock+0x9b/0xb50
             mutex_lock_nested+0x16/0x20
             rtnl_lock+0x12/0x20
             set_device_name+0xa9/0x120 [ledtrig_netdev]
             netdev_trig_activate+0x1a1/0x230 [ledtrig_netdev]
             led_trigger_set+0x172/0x2c0
             led_trigger_write+0xf1/0x140
             sysfs_kf_bin_write+0x5d/0x80
             kernfs_fop_write_iter+0x15d/0x210
             vfs_write+0x1f0/0x510
             ksys_write+0x6c/0xf0
             __x64_sys_write+0x14/0x20
             do_syscall_64+0x3f/0xf0
             entry_SYSCALL_64_after_hwframe+0x6c/0x74
      
      -> #0 (&trigger_data->lock){+.+.}-{3:3}:
             __lock_acquire+0x1459/0x25a0
             lock_acquire+0xc8/0x2d0
             __mutex_lock+0x9b/0xb50
             mutex_lock_nested+0x16/0x20
             netdev_trig_notify+0xec/0x190 [ledtrig_netdev]
             call_netdevice_register_net_notifiers+0x5a/0x100
             register_netdevice_notifier+0x85/0x120
             netdev_trig_activate+0x1d4/0x230 [ledtrig_netdev]
             led_trigger_set+0x172/0x2c0
             led_trigger_write+0xf1/0x140
             sysfs_kf_bin_write+0x5d/0x80
             kernfs_fop_write_iter+0x15d/0x210
             vfs_write+0x1f0/0x510
             ksys_write+0x6c/0xf0
             __x64_sys_write+0x14/0x20
             do_syscall_64+0x3f/0xf0
             entry_SYSCALL_64_after_hwframe+0x6c/0x74
      
      other info that might help us debug this:
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(rtnl_mutex);
                                     lock(&trigger_data->lock);
                                     lock(rtnl_mutex);
        lock(&trigger_data->lock);
      
       *** DEADLOCK ***
      
      8 locks held by bash/383:
       #0: ffff888103ff33f0 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x6c/0xf0
       #1: ffff888103aa1e88 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x114/0x210
       #2: ffff8881036f1890 (kn->active#82){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x11d/0x210
       #3: ffff888108e2c358 (&led_cdev->led_access){+.+.}-{3:3}, at: led_trigger_write+0x30/0x140
       #4: ffffffff8cdd9e10 (triggers_list_lock){++++}-{3:3}, at: led_trigger_write+0x75/0x140
       #5: ffff888108e2c270 (&led_cdev->trigger_lock){++++}-{3:3}, at: led_trigger_write+0xe3/0x140
       #6: ffffffff8cdde3d0 (pernet_ops_rwsem){++++}-{3:3}, at: register_netdevice_notifier+0x1c/0x120
       #7: ffffffff8cddf808 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x12/0x20
      
      stack backtrace:
      CPU: 0 PID: 383 Comm: bash Not tainted 6.7.0-rc2-next-20231124+ #2
      Hardware name: Default string Default string/Default string, BIOS ADLN.M6.SODIMM.ZB.CY.015 08/08/2023
      Call Trace:
       <TASK>
       dump_stack_lvl+0x5c/0xd0
       dump_stack+0x10/0x20
       print_circular_bug+0x2dd/0x410
       check_noncircular+0x131/0x150
       __lock_acquire+0x1459/0x25a0
       lock_acquire+0xc8/0x2d0
       ? netdev_trig_notify+0xec/0x190 [ledtrig_netdev]
       __mutex_lock+0x9b/0xb50
       ? netdev_trig_notify+0xec/0x190 [ledtrig_netdev]
       ? __this_cpu_preempt_check+0x13/0x20
       ? netdev_trig_notify+0xec/0x190 [ledtrig_netdev]
       ? __cancel_work_timer+0x11c/0x1b0
       ? __mutex_lock+0x123/0xb50
       mutex_lock_nested+0x16/0x20
       ? mutex_lock_nested+0x16/0x20
       netdev_trig_notify+0xec/0x190 [ledtrig_netdev]
       call_netdevice_register_net_notifiers+0x5a/0x100
       register_netdevice_notifier+0x85/0x120
       netdev_trig_activate+0x1d4/0x230 [ledtrig_netdev]
       led_trigger_set+0x172/0x2c0
       ? preempt_count_add+0x49/0xc0
       led_trigger_write+0xf1/0x140
       sysfs_kf_bin_write+0x5d/0x80
       kernfs_fop_write_iter+0x15d/0x210
       vfs_write+0x1f0/0x510
       ksys_write+0x6c/0xf0
       __x64_sys_write+0x14/0x20
       do_syscall_64+0x3f/0xf0
       entry_SYSCALL_64_after_hwframe+0x6c/0x74
      RIP: 0033:0x7f269055d034
      Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d 35 c3 0d 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 48 89 54 24 18 48
      RSP: 002b:00007ffddb7ef748 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 00007f269055d034
      RDX: 0000000000000007 RSI: 000055bf5f4af3c0 RDI: 0000000000000001
      RBP: 000055bf5f4af3c0 R08: 0000000000000073 R09: 0000000000000001
      R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000007
      R13: 00007f26906325c0 R14: 00007f269062ff20 R15: 0000000000000000
       </TASK>
      
      Fixes: d5e01266 ("leds: trigger: netdev: add additional specific link speed mode")
      Cc: stable@vger.kernel.org
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Acked-by: Lee Jones <lee@kernel.org>
      Link: https://lore.kernel.org/r/fb5c8294-2a10-4bf5-8f10-3d2b77d2757e@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tracing: Fix a possible race when disabling buffered events · 7d976464
      Petr Pavlu authored
      commit c0591b1c upstream.
      
      Function trace_buffered_event_disable() is responsible for freeing pages
      backing buffered events and this process can run concurrently with
      trace_event_buffer_lock_reserve().
      
      The following race is currently possible:
      
      * Function trace_buffered_event_disable() is called on CPU 0. It
        increments trace_buffered_event_cnt on each CPU and waits via
        synchronize_rcu() for each user of trace_buffered_event to complete.
      
      * After synchronize_rcu() is finished, function
        trace_buffered_event_disable() has the exclusive access to
        trace_buffered_event. All counters trace_buffered_event_cnt are at 1
        and all pointers trace_buffered_event are still valid.
      
      * At this point, on a different CPU 1, the execution reaches
        trace_event_buffer_lock_reserve(). The function calls
        preempt_disable_notrace() and only now enters an RCU read-side
        critical section. The function proceeds and reads a still valid
        pointer from trace_buffered_event[CPU1] into the local variable
        "entry". However, it doesn't yet read trace_buffered_event_cnt[CPU1]
        which happens later.
      
      * Function trace_buffered_event_disable() continues. It frees
        trace_buffered_event[CPU1] and decrements
        trace_buffered_event_cnt[CPU1] back to 0.
      
      * Function trace_event_buffer_lock_reserve() continues. It reads and
        increments trace_buffered_event_cnt[CPU1] from 0 to 1. This makes it
        believe that it can use the "entry" that it already obtained but the
        pointer is now invalid and any access results in a use-after-free.
      
      Fix the problem by making a second synchronize_rcu() call after all
      trace_buffered_event values are set to NULL. This waits on all potential
      users in trace_event_buffer_lock_reserve() that still read a previous
      pointer from trace_buffered_event.
      
      Link: https://lore.kernel.org/all/20231127151248.7232-2-petr.pavlu@suse.com/
      Link: https://lkml.kernel.org/r/20231205161736.19663-4-petr.pavlu@suse.com
      
      Cc: stable@vger.kernel.org
      Fixes: 0fc1b09f ("tracing: Use temp buffer when filtering events")
      Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tracing: Fix incomplete locking when disabling buffered events · fc9fa702
      Petr Pavlu authored
      commit 7fed14f7 upstream.
      
      The following warning appears when using buffered events:
      
      [  203.556451] WARNING: CPU: 53 PID: 10220 at kernel/trace/ring_buffer.c:3912 ring_buffer_discard_commit+0x2eb/0x420
      [...]
      [  203.670690] CPU: 53 PID: 10220 Comm: stress-ng-sysin Tainted: G            E      6.7.0-rc2-default #4 56e6d0fcf5581e6e51eaaecbdaec2a2338c80f3a
      [  203.670704] Hardware name: Intel Corp. GROVEPORT/GROVEPORT, BIOS GVPRCRB1.86B.0016.D04.1705030402 05/03/2017
      [  203.670709] RIP: 0010:ring_buffer_discard_commit+0x2eb/0x420
      [  203.735721] Code: 4c 8b 4a 50 48 8b 42 48 49 39 c1 0f 84 b3 00 00 00 49 83 e8 01 75 b1 48 8b 42 10 f0 ff 40 08 0f 0b e9 fc fe ff ff f0 ff 47 08 <0f> 0b e9 77 fd ff ff 48 8b 42 10 f0 ff 40 08 0f 0b e9 f5 fe ff ff
      [  203.735734] RSP: 0018:ffffb4ae4f7b7d80 EFLAGS: 00010202
      [  203.735745] RAX: 0000000000000000 RBX: ffffb4ae4f7b7de0 RCX: ffff8ac10662c000
      [  203.735754] RDX: ffff8ac0c750be00 RSI: ffff8ac10662c000 RDI: ffff8ac0c004d400
      [  203.781832] RBP: ffff8ac0c039cea0 R08: 0000000000000000 R09: 0000000000000000
      [  203.781839] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      [  203.781842] R13: ffff8ac10662c000 R14: ffff8ac0c004d400 R15: ffff8ac10662c008
      [  203.781846] FS:  00007f4cd8a67740(0000) GS:ffff8ad798880000(0000) knlGS:0000000000000000
      [  203.781851] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  203.781855] CR2: 0000559766a74028 CR3: 00000001804c4000 CR4: 00000000001506f0
      [  203.781862] Call Trace:
      [  203.781870]  <TASK>
      [  203.851949]  trace_event_buffer_commit+0x1ea/0x250
      [  203.851967]  trace_event_raw_event_sys_enter+0x83/0xe0
      [  203.851983]  syscall_trace_enter.isra.0+0x182/0x1a0
      [  203.851990]  do_syscall_64+0x3a/0xe0
      [  203.852075]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
      [  203.852090] RIP: 0033:0x7f4cd870fa77
      [  203.982920] Code: 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 b8 89 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e9 43 0e 00 f7 d8 64 89 01 48
      [  203.982932] RSP: 002b:00007fff99717dd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000089
      [  203.982942] RAX: ffffffffffffffda RBX: 0000558ea1d7b6f0 RCX: 00007f4cd870fa77
      [  203.982948] RDX: 0000000000000000 RSI: 00007fff99717de0 RDI: 0000558ea1d7b6f0
      [  203.982957] RBP: 00007fff99717de0 R08: 00007fff997180e0 R09: 00007fff997180e0
      [  203.982962] R10: 00007fff997180e0 R11: 0000000000000246 R12: 00007fff99717f40
      [  204.049239] R13: 00007fff99718590 R14: 0000558e9f2127a8 R15: 00007fff997180b0
      [  204.049256]  </TASK>
      
      For instance, it can be triggered by running these two commands in
      parallel:
      
       $ while true; do
          echo hist:key=id.syscall:val=hitcount > \
            /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger;
        done
       $ stress-ng --sysinfo $(nproc)
      
      The warning indicates that the current ring_buffer_per_cpu is not in the
      committing state. It happens because the active ring_buffer_event
      doesn't actually come from the ring_buffer_per_cpu but is allocated from
      trace_buffered_event.
      
      The bug is in function trace_buffered_event_disable() where the
      following normally happens:
      
      * The code invokes disable_trace_buffered_event() via
        smp_call_function_many() and follows it by synchronize_rcu(). This
        increments the per-CPU variable trace_buffered_event_cnt on each
        target CPU and grants trace_buffered_event_disable() the exclusive
        access to the per-CPU variable trace_buffered_event.
      
      * Maintenance is performed on trace_buffered_event, all per-CPU event
        buffers get freed.
      
      * The code invokes enable_trace_buffered_event() via
        smp_call_function_many(). This decrements trace_buffered_event_cnt and
        releases the access to trace_buffered_event.
      
      A problem is that smp_call_function_many() runs a given function on all
      target CPUs except on the current one. The following can then occur:
      
      * Task X executing trace_buffered_event_disable() runs on CPU 0.
      
      * The control reaches synchronize_rcu() and the task gets rescheduled on
        another CPU 1.
      
      * The RCU synchronization finishes. At this point,
        trace_buffered_event_disable() has the exclusive access to all
        trace_buffered_event variables except trace_buffered_event[CPU0]
        because trace_buffered_event_cnt[CPU0] is never incremented and if the
        buffer is currently unused, remains set to 0.
      
      * A different task Y is scheduled on CPU 0 and hits a trace event. The
        code in trace_event_buffer_lock_reserve() sees that
        trace_buffered_event_cnt[CPU0] is set to 0 and decides to use the
        buffer provided by trace_buffered_event[CPU0].

      * Task X continues its execution in trace_buffered_event_disable(). The
        code incorrectly frees the event buffer pointed to by
        trace_buffered_event[CPU0] and resets the variable to NULL.
      
      * Task Y writes event data to the now freed buffer and later detects the
        created inconsistency.
      
      The issue is observable since commit dea49978 ("tracing: Fix warning
      in trace_buffered_event_disable()") which moved the call of
      trace_buffered_event_disable() in __ftrace_event_enable_disable()
      earlier, prior to invoking call->class->reg(.. TRACE_REG_UNREGISTER ..).
      The underlying problem in trace_buffered_event_disable() is however
      present since the original implementation in commit 0fc1b09f
      ("tracing: Use temp buffer when filtering events").
      
      Fix the problem by replacing the two smp_call_function_many() calls with
      on_each_cpu_mask() which invokes a given callback on all CPUs in the mask,
      including the current one.
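
The difference between the two helpers can be modelled in user space. The sketch below is not kernel code: call_on_others() and call_on_each() are hypothetical stand-ins for smp_call_function_many() and on_each_cpu_mask(), CPUs are plain array indices, and event_cnt[] stands in for the per-CPU trace_buffered_event_cnt.

```c
#define NR_CPUS 4

/* Stand-in for the per-CPU variable trace_buffered_event_cnt. */
static int event_cnt[NR_CPUS];

static void reset_counts(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		event_cnt[cpu] = 0;
}

/* Mirrors the role of disable_trace_buffered_event(): mark the buffer busy. */
static void disable_cb(int cpu)
{
	event_cnt[cpu]++;
}

/* Models smp_call_function_many(): every CPU in the mask except the caller. */
static void call_on_others(unsigned int mask, int this_cpu, void (*fn)(int))
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if ((mask & (1u << cpu)) && cpu != this_cpu)
			fn(cpu);
}

/* Models on_each_cpu_mask(): every CPU in the mask, caller included. */
static void call_on_each(unsigned int mask, void (*fn)(int))
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (mask & (1u << cpu))
			fn(cpu);
}

/* Counts CPUs whose counter is still 0, i.e. whose buffer is unprotected. */
static int unprotected_cpus(void)
{
	int n = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (event_cnt[cpu] == 0)
			n++;
	return n;
}
```

With the first helper called from CPU 0, event_cnt[0] stays at 0, which is the window described above; with the second helper no CPU is left out.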
      
      Link: https://lore.kernel.org/all/20231127151248.7232-2-petr.pavlu@suse.com/
      Link: https://lkml.kernel.org/r/20231205161736.19663-2-petr.pavlu@suse.com
      
      Cc: stable@vger.kernel.org
      Fixes: 0fc1b09f ("tracing: Use temp buffer when filtering events")
      Fixes: dea49978 ("tracing: Fix warning in trace_buffered_event_disable()")
      Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      fc9fa702
    • Steven Rostedt (Google)'s avatar
      tracing: Disable snapshot buffer when stopping instance tracers · 0486a1f9
      Steven Rostedt (Google) authored
      commit b538bf7d upstream.
      
      It used to be that only the top level instance had a snapshot buffer (for
      latency tracers like wakeup and irqsoff). Stopping a tracer in an
      instance would not disable the snapshot buffer. This could have some
      unintended consequences if the irqsoff tracer is enabled.
      
      Consolidate the tracing_start/stop() with tracing_start/stop_tr() so that
      all instances behave the same. The tracing_start/stop() functions will
      just call their respective tracing_start/stop_tr() with the global_array
      passed in.
      
      Link: https://lkml.kernel.org/r/20231205220011.041220035@goodmis.org
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Fixes: 6d9b3fa5 ("tracing: Move tracing_max_latency into trace_array")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0486a1f9
    • Steven Rostedt (Google)'s avatar
      tracing: Stop current tracer when resizing buffer · 12c48e88
      Steven Rostedt (Google) authored
      commit d78ab792 upstream.
      
      When the ring buffer is being resized, it can cause side effects to the
      running tracer. For instance, there's a race with irqsoff tracer that
      swaps individual per cpu buffers between the main buffer and the snapshot
      buffer. The resize operation modifies the main buffer and then the
      snapshot buffer. If a swap happens in between those two operations it will
      break the tracer.
      
      Simply stop the running tracer before resizing the buffers and enable it
      again when finished.
      
      Link: https://lkml.kernel.org/r/20231205220010.748996423@goodmis.org
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Fixes: 3928a8a2 ("ftrace: make work with new ring buffer")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      12c48e88
    • Steven Rostedt (Google)'s avatar
      tracing: Always update snapshot buffer size · 1741e17c
      Steven Rostedt (Google) authored
      commit 7be76461 upstream.
      
      It used to be that only the top level instance had a snapshot buffer (for
      latency tracers like wakeup and irqsoff). The update of the ring buffer
      size would check if the instance was the top level and if so, it would
      also update the snapshot buffer as it needs to be the same as the main
      buffer.
      
      Now that lower level instances also have a snapshot buffer, they too need
      to update their snapshot buffer sizes when the main buffer is changed,
      otherwise the following can be triggered:
      
       # cd /sys/kernel/tracing
       # echo 1500 > buffer_size_kb
       # mkdir instances/foo
       # echo irqsoff > instances/foo/current_tracer
       # echo 1000 > instances/foo/buffer_size_kb
      
      Produces:
      
       WARNING: CPU: 2 PID: 856 at kernel/trace/trace.c:1938 update_max_tr_single.part.0+0x27d/0x320
      
      Which is:
      
      	ret = ring_buffer_swap_cpu(tr->max_buffer.buffer, tr->array_buffer.buffer, cpu);
      
      	if (ret == -EBUSY) {
      		[..]
      	}
      
      	WARN_ON_ONCE(ret && ret != -EAGAIN && ret != -EBUSY);  <== here
      
      That's because ring_buffer_swap_cpu() has:
      
      	int ret = -EINVAL;
      
      	[..]
      
      	/* At least make sure the two buffers are somewhat the same */
      	if (cpu_buffer_a->nr_pages != cpu_buffer_b->nr_pages)
      		goto out;
      
      	[..]
       out:
      	return ret;
       }
      
      Instead, update all instances' snapshot buffer sizes when their main
      buffer size is updated.
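
The failure mode and the fix can be illustrated with a small user-space model. This is only a sketch: swap_cpu_sketch() mirrors just the size check quoted above, and struct cpu_buffer, struct instance and resize_instance() are hypothetical simplifications rather than the kernel's data structures.

```c
#include <errno.h>

/* Hypothetical, reduced per-CPU buffer: only the page count matters here. */
struct cpu_buffer {
	unsigned long nr_pages;
};

/* Hypothetical tracing instance with a main buffer and a snapshot buffer. */
struct instance {
	struct cpu_buffer main_buf;
	struct cpu_buffer snapshot;
};

/* Mirrors only the size check of ring_buffer_swap_cpu() quoted above. */
static int swap_cpu_sketch(struct cpu_buffer *a, struct cpu_buffer *b)
{
	struct cpu_buffer tmp;

	if (a->nr_pages != b->nr_pages)
		return -EINVAL;

	tmp = *a;
	*a = *b;
	*b = tmp;
	return 0;
}

/* The idea of the fix: main and snapshot buffers are always resized together. */
static void resize_instance(struct instance *tr, unsigned long nr_pages)
{
	tr->main_buf.nr_pages = nr_pages;
	tr->snapshot.nr_pages = nr_pages;
}
```

An instance whose main buffer was resized without its snapshot buffer makes the swap return -EINVAL, which is neither -EAGAIN nor -EBUSY and therefore trips the WARN_ON_ONCE() shown earlier.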
      
      Link: https://lkml.kernel.org/r/20231205220010.454662151@goodmis.org
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Fixes: 6d9b3fa5 ("tracing: Move tracing_max_latency into trace_array")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1741e17c
    • Heiko Carstens's avatar
      checkstack: fix printed address · f8f32f91
      Heiko Carstens authored
      commit ee34db3f upstream.
      
      All addresses printed by checkstack have an extra incorrect 0 appended at
      the end.
      
      This was introduced with commit 677f1410 ("scripts/checkstack.pl: don't
      display $dre as different entity"): since then the address is taken from
      the line which contains the function name, instead of the line which
      contains stack consumption. E.g. on s390:
      
      0000000000100a30 <do_one_initcall>:
      ...
        100a44:       e3 f0 ff 70 ff 71       lay     %r15,-144(%r15)
      
      So the used regex which matches spaces and hexadecimal numbers to extract
      an address now matches a different substring. Subsequently replacing spaces
      with 0 appends a zero at the end, instead of replacing leading spaces.
      
      Fix this by using the proper regex, and simplify the code a bit.
      
      Link: https://lkml.kernel.org/r/20231120183719.2188479-2-hca@linux.ibm.com
      Fixes: 677f1410 ("scripts/checkstack.pl: don't display $dre as different entity")
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      Cc: Maninder Singh <maninder1.s@samsung.com>
      Cc: Masahiro Yamada <masahiroy@kernel.org>
      Cc: Vaneet Narang <v.narang@samsung.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f8f32f91
    • Tim Van Patten's avatar
      cgroup_freezer: cgroup_freezing: Check if not frozen · 9ec2d926
      Tim Van Patten authored
      commit cff5f49d upstream.
      
      __thaw_task() was recently updated to warn if the task being thawed was
      part of a freezer cgroup that is still currently freezing:
      
      	void __thaw_task(struct task_struct *p)
      	{
      	...
      		if (WARN_ON_ONCE(freezing(p)))
      			goto unlock;
      
      This has exposed a bug in cgroup1 freezing where when CGROUP_FROZEN is
      asserted, the CGROUP_FREEZING bits are not also cleared at the same
      time. Meaning, when a cgroup is marked FROZEN it continues to be marked
      FREEZING as well. This causes the WARNING to trigger, because
      cgroup_freezing() thinks the cgroup is still freezing.
      
      There are two ways to fix this:
      
      1. Whenever FROZEN is set, clear FREEZING for the cgroup and all
      children cgroups.
      2. Update cgroup_freezing() to also verify that FROZEN is not set.
      
      This patch implements option (2), since it's smaller and more
      straightforward.
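
Option (2) amounts to a one-line predicate change. The sketch below models it on a bare state word; the flag values are illustrative assumptions, and freezing_old() and freezing_new() are hypothetical names for the check before and after the patch, not the kernel's cgroup_freezing() signature.

```c
#include <stdbool.h>

/* Illustrative flag values; only their distinctness matters for the sketch. */
enum {
	CGROUP_FREEZING_SELF   = 1 << 1,
	CGROUP_FREEZING_PARENT = 1 << 2,
	CGROUP_FROZEN          = 1 << 3,
	CGROUP_FREEZING        = CGROUP_FREEZING_SELF | CGROUP_FREEZING_PARENT,
};

/* Before: a FROZEN cgroup that still carries FREEZING bits counts as freezing. */
static bool freezing_old(unsigned int state)
{
	return (state & CGROUP_FREEZING) != 0;
}

/* After: FREEZING only counts while FROZEN has not been set yet. */
static bool freezing_new(unsigned int state)
{
	return (state & CGROUP_FREEZING) && !(state & CGROUP_FROZEN);
}
```

A state of CGROUP_FREEZING_SELF | CGROUP_FROZEN is reported as freezing by the old predicate and as not freezing by the new one, which is what silences the warning in __thaw_task().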
      
      Signed-off-by: Tim Van Patten <timvp@google.com>
      Tested-by: Mark Hasemeyer <markhas@chromium.org>
      Fixes: f5d39b02 ("freezer,sched: Rewrite core freezer logic")
      Cc: stable@vger.kernel.org # v6.1+
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9ec2d926
    • Ming Lei's avatar
      lib/group_cpus.c: avoid acquiring cpu hotplug lock in group_cpus_evenly · 39f603a2
      Ming Lei authored
      commit 0263f92f upstream.
      
      group_cpus_evenly() could be part of a storage driver's error handler, such
      as the nvme driver's, which may run during CPU hotplug, when a storage queue
      has to drain its pending IOs because all CPUs associated with the queue
      are offline and the queue is becoming inactive.  Handling those IOs needs
      the error handler to provide forward progress.
      
      Then deadlock is caused:
      
      1) inside CPU hotplug handler, CPU hotplug lock is held, and blk-mq's
         handler is waiting for inflight IO
      
      2) error handler is waiting for CPU hotplug lock
      
      3) inflight IO can't be completed in blk-mq's CPU hotplug handler
         because error handling can't provide forward progress.
      
      Solve the deadlock by not holding CPU hotplug lock in group_cpus_evenly(),
      in which two stage spreads are taken: 1) the 1st stage is over all present
      CPUs; 2) the end stage is over all other CPUs.
      
      It turns out the two stage spread just needs a consistent 'cpu_present_mask',
      so remove the CPU hotplug lock by storing the mask into one local cache.  This
      doesn't change correctness, because all CPUs are still covered.
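
The coverage argument can be checked with a bitmask model. This is a sketch under simplifying assumptions: CPU masks are 64-bit words, and struct spread and plan_spread() are hypothetical names, not the interface of lib/group_cpus.c.

```c
#include <stdint.h>

/* Hypothetical result type: the CPUs handled by each of the two stages. */
struct spread {
	uint64_t stage1;	/* CPUs present at snapshot time */
	uint64_t stage2;	/* all remaining possible CPUs */
};

/*
 * Read the shared "present" mask exactly once into a local copy, then derive
 * both stages from that copy. Later changes to *present cannot make the two
 * stages overlap or leave a possible CPU uncovered.
 */
static struct spread plan_spread(uint64_t possible,
				 const volatile uint64_t *present)
{
	uint64_t snapshot = *present & possible;
	struct spread s;

	s.stage1 = snapshot;
	s.stage2 = possible & ~snapshot;
	return s;
}
```

Because both stages are computed from the same local snapshot, their union is always the full possible mask and their intersection is empty, even if the present mask changes right after the read.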
      
      Link: https://lkml.kernel.org/r/20231120083559.285174-1-ming.lei@redhat.com
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Reported-by: Yi Zhang <yi.zhang@redhat.com>
      Reported-by: Guangwu Zhang <guazhang@redhat.com>
      Tested-by: Guangwu Zhang <guazhang@redhat.com>
      Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>
      Reviewed-by: Jens Axboe <axboe@kernel.dk>
      Cc: Keith Busch <kbusch@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      39f603a2
    • Ryusuke Konishi's avatar
      nilfs2: prevent WARNING in nilfs_sufile_set_segment_usage() · 0b14276b
      Ryusuke Konishi authored
      commit 675abf8d upstream.
      
      If nilfs2 reads a disk image with corrupted segment usage metadata, and
      its segment usage information is marked as an error for the segment at the
      write location, nilfs_sufile_set_segment_usage() can trigger WARN_ONs
      during log writing.
      
      Segments newly allocated for writing with nilfs_sufile_alloc() will not
      have this error flag set, but this unexpected situation will occur if the
      segment indexed by either nilfs->ns_segnum or nilfs->ns_nextnum (active
      segment) was marked in error.
      
      Fix this issue by inserting a sanity check to treat it as a file system
      corruption.
      
      Since error returns are not allowed during the execution phase where
      nilfs_sufile_set_segment_usage() is used, this inserts the sanity check
      into nilfs_sufile_mark_dirty() which pre-reads the buffer containing the
      segment usage record to be updated and sets it up in a dirty state for
      writing.
      
      In addition, nilfs_sufile_set_segment_usage() is also called when
      canceling log writing and undoing segment usage update.  To avoid
      issuing the same kernel warning in that case, skip the error flag
      check in nilfs_sufile_set_segment_usage() when canceling.
      
      Link: https://lkml.kernel.org/r/20231205085947.4431-1-konishi.ryusuke@gmail.com
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Reported-by: <syzbot+14e9f834f6ddecece094@syzkaller.appspotmail.com>
      Closes: https://syzkaller.appspot.com/bug?extid=14e9f834f6ddecece094
      Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0b14276b
    • Ryusuke Konishi's avatar
      nilfs2: fix missing error check for sb_set_blocksize call · ce58f141
      Ryusuke Konishi authored
      commit d61d0ab5 upstream.
      
      When mounting a filesystem image with a block size larger than the page
      size, nilfs2 repeatedly outputs long error messages with stack traces to
      the kernel log, such as the following:
      
       getblk(): invalid block size 8192 requested
       logical block size: 512
       ...
       Call Trace:
        dump_stack_lvl+0x92/0xd4
        dump_stack+0xd/0x10
        bdev_getblk+0x33a/0x354
        __breadahead+0x11/0x80
        nilfs_search_super_root+0xe2/0x704 [nilfs2]
        load_nilfs+0x72/0x504 [nilfs2]
        nilfs_mount+0x30f/0x518 [nilfs2]
        legacy_get_tree+0x1b/0x40
        vfs_get_tree+0x18/0xc4
        path_mount+0x786/0xa88
        __ia32_sys_mount+0x147/0x1a8
        __do_fast_syscall_32+0x56/0xc8
        do_fast_syscall_32+0x29/0x58
        do_SYSENTER_32+0x15/0x18
        entry_SYSENTER_32+0x98/0xf1
       ...
      
      This overloads the system logger.  And to make matters worse, it sometimes
      crashes the kernel with a memory access violation.
      
      This is because the return value of the sb_set_blocksize() call, which
      should be checked for errors, is not checked.
      
      The latter issue is due to out-of-buffer memory being accessed based on a
      large block size that caused sb_set_blocksize() to fail for buffers read
      with the initial minimum block size that remained unupdated in the
      super_block structure.
      
      Since nilfs2 mkfs tool does not accept block sizes larger than the system
      page size, this has been overlooked.  However, it is possible to create
      this situation by intentionally modifying the tool or by passing a
      filesystem image created on a system with a large page size to a system
      with a smaller page size and mounting it.
      
      Fix this issue by inserting the expected error handling for the call to
      sb_set_blocksize().
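
The shape of the missing check can be sketched in user space. Everything here is a hypothetical model: set_blocksize_model() only imitates the convention that sb_set_blocksize() returns the new block size on success and 0 on failure, MODEL_PAGE_SIZE stands in for the system page size, and returning -EINVAL is an assumption of the sketch.

```c
#include <errno.h>

#define MODEL_PAGE_SIZE 4096	/* assumed page size for the model */

/* Hypothetical, reduced superblock: only the block size is tracked. */
struct sb_model {
	int blocksize;
};

/* Imitates sb_set_blocksize(): returns the new size, or 0 if it is rejected. */
static int set_blocksize_model(struct sb_model *sb, int size)
{
	/* Accept only powers of two between 512 and the page size. */
	if (size < 512 || size > MODEL_PAGE_SIZE || (size & (size - 1)))
		return 0;

	sb->blocksize = size;
	return size;
}

/* Mount-time step with the error handling that was missing. */
static int mount_model(struct sb_model *sb, int on_disk_blocksize)
{
	if (!set_blocksize_model(sb, on_disk_blocksize))
		return -EINVAL;	/* refuse the mount instead of carrying on */

	return 0;
}
```

Without the check, a rejected size leaves the old block size in place while later code keeps using the larger on-disk value, which is the mismatch behind the out-of-buffer accesses described above.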
      
      Link: https://lkml.kernel.org/r/20231129141547.4726-1-konishi.ryusuke@gmail.com
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ce58f141
    • Su Hui's avatar
      highmem: fix a memory copy problem in memcpy_from_folio · 1cdc934c
      Su Hui authored
      commit 73424d00 upstream.
      
      Clang static checker complains that value stored to 'from' is never read.
      As a result, memcpy_from_folio() only copies the last chunk of memory from
      the folio to the destination.  Use 'to += chunk' to replace 'from += chunk'
      to fix this typo.
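
A chunked copy loop of this kind can be sketched in user space as follows. The sketch is an assumption-laden model, not the kernel helper: copy_from_chunks() is a hypothetical name, CHUNK_SZ stands in for the page size, and the folio is a flat byte array whose source pointer is recomputed for every chunk.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK_SZ 4	/* stand-in for the page size, kept tiny on purpose */

/*
 * Copy len bytes starting at offset, one chunk at a time, never crossing a
 * CHUNK_SZ boundary within a single memcpy().
 */
static void copy_from_chunks(char *to, const char *folio,
			     size_t offset, size_t len)
{
	while (len > 0) {
		const char *from = folio + offset;	/* re-derived per chunk */
		size_t chunk = CHUNK_SZ - offset % CHUNK_SZ;

		if (chunk > len)
			chunk = len;
		memcpy(to, from, chunk);

		to += chunk;	/* the fix: advance the destination */
		offset += chunk;
		len -= chunk;
	}
}
```

If the loop advanced 'from' instead of 'to', every chunk would be written to the start of the destination, so only the last chunk would survive.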
      
      Link: https://lkml.kernel.org/r/20231130034017.1210429-1-suhui@nfschina.com
      Fixes: b23d03ef ("highmem: add memcpy_to_folio() and memcpy_from_folio()")
      Signed-off-by: Su Hui <suhui@nfschina.com>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jiaqi Yan <jiaqiyan@google.com>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Peter Collingbourne <pcc@google.com>
      Cc: Tom Rix <trix@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1cdc934c
    • Steven Rostedt (Google)'s avatar
      ring-buffer: Force absolute timestamp on discard of event · 56a33431
      Steven Rostedt (Google) authored
      commit b2dd7975 upstream.
      
      There's a race where if an event is discarded from the ring buffer and an
      interrupt were to happen at that time and insert an event, the time stamp
      is still used from the discarded event as an offset. This can screw up the
      timings.
      
      If the event is going to be discarded, set the "before_stamp" to zero.
      When a new event comes in, it compares the "before_stamp" with the
      "write_stamp" and if they are not equal, it will insert an absolute
      timestamp. This will prevent the timings from getting out of sync due to
      the discarded event.
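
The decision described above can be modelled with two plain integers. This is a deliberately reduced sketch: struct stamps, needs_absolute() and discard_event() are hypothetical names, there is no concurrency, and the model assumes write_stamp is non-zero at the time of the discard.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced view of the two per-CPU time stamps. */
struct stamps {
	uint64_t before_stamp;
	uint64_t write_stamp;
};

/* A new event may use a delta only if both stamps agree. */
static bool needs_absolute(const struct stamps *s)
{
	return s->before_stamp != s->write_stamp;
}

/* On discard, make the stamps differ so the next event cannot use a delta. */
static void discard_event(struct stamps *s)
{
	s->before_stamp = 0;
}
```

After discard_event(), the next call to needs_absolute() returns true, so the event that follows a discard carries an absolute timestamp instead of an offset from the discarded one.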
      
      Link: https://lore.kernel.org/linux-trace-kernel/20231206100244.5130f9b3@gandalf.local.home
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Fixes: 6f6be606 ("ring-buffer: Force before_stamp and write_stamp to be different on discard")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      56a33431
    • Steven Rostedt (Google)'s avatar
      ring-buffer: Test last update in 32bit version of __rb_time_read() · d251b981
      Steven Rostedt (Google) authored
      commit f458a145 upstream.
      
      Since 64 bit cmpxchg() is very expensive on 32bit architectures, the
      timestamp used by the ring buffer does some interesting tricks to be able
      to still have an atomic 64 bit number. It originally just used 60 bits and
      broke it up into two 32 bit words where the extra 2 bits were used for
      synchronization. But this was not enough for all use cases, and all 64
      bits were required.
      
      The 32bit version of the ring buffer timestamp was then broken up into 3
      32bit words using the same counter trick. But one update was not done. The
      check to see if the read operation was done without interruption only
      checked the first two words and not the last one (like it had before this
      update). Fix it by making sure all three updates happen without
      interruption by comparing the initial counter with the last updated
      counter.
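
The counter trick can be sketched with three tagged words. This is a simplified single-threaded model, not the kernel's rb_time_t: the 30-bit payload plus 2-bit counter layout, struct rb_time32 and the helper names are assumptions of the sketch, and a torn update is simulated by hand.

```c
#include <stdbool.h>
#include <stdint.h>

#define CNT_SHIFT 30
#define VAL_MASK  ((1u << CNT_SHIFT) - 1)

/* Three 32-bit words, each holding a 30-bit payload and a 2-bit counter. */
struct rb_time32 {
	uint32_t top;
	uint32_t bottom;
	uint32_t msb;
};

static uint32_t word_cnt(uint32_t w)
{
	return w >> CNT_SHIFT;
}

static uint32_t pack_word(uint32_t val, uint32_t cnt)
{
	return (val & VAL_MASK) | ((cnt & 3u) << CNT_SHIFT);
}

/* Split a 64-bit value into 30 + 30 + 4 bits, tagging each word with cnt. */
static void rb_time32_set(struct rb_time32 *t, uint64_t val, uint32_t cnt)
{
	t->top = pack_word((uint32_t)(val >> 30), cnt);
	t->bottom = pack_word((uint32_t)val, cnt);
	t->msb = pack_word((uint32_t)(val >> 60), cnt);
}

/* The read is valid only if all three words carry the same counter. */
static bool rb_time32_read(const struct rb_time32 *t, uint64_t *val)
{
	uint32_t cnt = word_cnt(t->top);

	if (cnt != word_cnt(t->msb) || cnt != word_cnt(t->bottom))
		return false;

	*val = ((uint64_t)(t->msb & VAL_MASK) << 60) |
	       ((uint64_t)(t->top & VAL_MASK) << 30) |
	       (t->bottom & VAL_MASK);
	return true;
}
```

A reader that compared only two of the three counters would accept a value whose third word came from a different update, which is the gap the commit closes.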
      
      Link: https://lore.kernel.org/linux-trace-kernel/20231206100050.3100b7bb@gandalf.local.home
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Fixes: f03f2abc ("ring-buffer: Have 32 bit time stamps use all 64 bits")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d251b981
    • Takashi Iwai's avatar
      ALSA: hda/realtek: Add quirk for Lenovo Yoga Pro 7 · 73249ef7
      Takashi Iwai authored
      commit 634e5e1e upstream.
      
      Lenovo Yoga Pro 7 14APH8 (PCI SSID 17aa:3882) seems to require a
      workaround similar to the Yoga 9 model's for the bass speaker.
      
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/CAGGk=CRRQ1L9p771HsXTN_ebZP41Qj+3gw35Gezurn+nokRewg@mail.gmail.com
      Link: https://lore.kernel.org/r/20231207182035.30248-1-tiwai@suse.de
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      73249ef7
    • Mario Limonciello's avatar
      ALSA: hda/realtek: Add Framework laptop 16 to quirks · 5f1c1e8d
      Mario Limonciello authored
      commit 8804fa04 upstream.
      
      The Framework 16" laptop has the same controller as other Framework
      models.  Apply the presence detection quirk.
      
      Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20231206193927.2996-1-mario.limonciello@amd.com
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5f1c1e8d
    • Tim Bosse's avatar
      ALSA: hda/realtek: add new Framework laptop to quirks · 70a68855
      Tim Bosse authored
      commit 33038efb upstream.
      
      The Framework Laptop 13 (AMD Ryzen 7040Series) has an ALC295 with
      a disconnected or faulty headset mic presence detect similar to the
      previous models.  It works with the same quirk chain as
      309d7363.  This model has a VID:PID of f111:0006.
      
      Signed-off-by: Tim Bosse <flinn@timbos.se>
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20231206142629.388615-1-flinn@timbos.se
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      70a68855
    • Bin Li's avatar
      ALSA: hda/realtek: Enable headset on Lenovo M90 Gen5 · 65a7a5b2
      Bin Li authored
      commit 6f7e4664 upstream.
      
      Lenovo M90 Gen5 is equipped with ALC897, and it needs
      ALC897_FIXUP_HEADSET_MIC_PIN quirk to make its headset mic work.
      
      Signed-off-by: Bin Li <bin.li@canonical.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20231204100450.642783-1-bin.li@canonical.com
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      65a7a5b2
    • Aleksandrs Vinarskis's avatar
      ALSA: hda/realtek: fix speakers on XPS 9530 (2023) · b3f1d923
      Aleksandrs Vinarskis authored
      commit cd14dedf upstream.
      
      XPS 9530 has 2 tweeters and 2 subwoofers powered by CS35L41 amplifier, SPI
      connected. For subwoofers to work, it requires both to enable amplifier
      support, and to enable output to subwoofers via 0x17 quirk (similarly to
      XPS 9510/9520).
      
      Signed-off-by: Aleksandrs Vinarskis <alex.vinarskis@gmail.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20231203233006.100558-1-alex.vinarskis@gmail.com
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b3f1d923
    • Pascal Noël's avatar
      ALSA: hda/realtek: Apply quirk for ASUS UM3504DA · 6e25980d
      Pascal Noël authored
      commit c5c325bb upstream.
      
      The ASUS UM3504DA uses a Realtek HDA codec and two CS35L41 amplifiers via I2C.
      Apply existing quirk to model.
      
      Signed-off-by: Pascal Noël <pascal@pascalcompiles.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20231202013744.12369-1-pascal@pascalcompiles.com
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6e25980d
    • Jason Zhang's avatar
      ALSA: pcm: fix out-of-bounds in snd_pcm_state_names · 8e6ac8c6
      Jason Zhang authored
      commit 2b3a7a30 upstream.
      
      The pcm state can be SNDRV_PCM_STATE_DISCONNECTED at disconnect
      callback, and there is not an entry of SNDRV_PCM_STATE_DISCONNECTED
      in snd_pcm_state_names.
      
      This patch adds the missing entry to resolve this issue.
      
      Reading the PCM status file in that state, for example with
      "cat /proc/asound/card2/pcm0p/sub0/status", results in stack traces
      like the following:
      
      [   99.702732][ T5171] Unexpected kernel BRK exception at EL1
      [   99.702774][ T5171] Internal error: BRK handler: f2005512 [#1] PREEMPT SMP
      [   99.703858][ T5171] Modules linked in: bcmdhd(E) (...)
      [   99.747425][ T5171] CPU: 3 PID: 5171 Comm: cat Tainted: G         C OE     5.10.189-android13-4-00003-g4a17384380d8-ab11086999 #1
      [   99.748447][ T5171] Hardware name: Rockchip RK3588 CVTE V10 Board (DT)
      [   99.749024][ T5171] pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
      [   99.749616][ T5171] pc : snd_pcm_substream_proc_status_read+0x264/0x2bc
      [   99.750204][ T5171] lr : snd_pcm_substream_proc_status_read+0xa4/0x2bc
      [   99.750778][ T5171] sp : ffffffc0175abae0
      [   99.751132][ T5171] x29: ffffffc0175abb80 x28: ffffffc009a2c498
      [   99.751665][ T5171] x27: 0000000000000001 x26: ffffff810cbae6e8
      [   99.752199][ T5171] x25: 0000000000400cc0 x24: ffffffc0175abc60
      [   99.752729][ T5171] x23: 0000000000000000 x22: ffffff802f558400
      [   99.753263][ T5171] x21: ffffff81d8d8ff00 x20: ffffff81020cdc00
      [   99.753795][ T5171] x19: ffffff802d110000 x18: ffffffc014fbd058
      [   99.754326][ T5171] x17: 0000000000000000 x16: 0000000000000000
      [   99.754861][ T5171] x15: 000000000000c276 x14: ffffffff9a976fda
      [   99.755392][ T5171] x13: 0000000065689089 x12: 000000000000d72e
      [   99.755923][ T5171] x11: ffffff802d110000 x10: 00000000000000e0
      [   99.756457][ T5171] x9 : 9c431600c8385d00 x8 : 0000000000000008
      [   99.756990][ T5171] x7 : 0000000000000000 x6 : 000000000000003f
      [   99.757522][ T5171] x5 : 0000000000000040 x4 : ffffffc0175abb70
      [   99.758056][ T5171] x3 : 0000000000000001 x2 : 0000000000000001
      [   99.758588][ T5171] x1 : 0000000000000000 x0 : 0000000000000000
      [   99.759123][ T5171] Call trace:
      [   99.759404][ T5171]  snd_pcm_substream_proc_status_read+0x264/0x2bc
      [   99.759958][ T5171]  snd_info_seq_show+0x54/0xa4
      [   99.760370][ T5171]  seq_read_iter+0x19c/0x7d4
      [   99.760770][ T5171]  seq_read+0xf0/0x128
      [   99.761117][ T5171]  proc_reg_read+0x100/0x1f8
      [   99.761515][ T5171]  vfs_read+0xf4/0x354
      [   99.761869][ T5171]  ksys_read+0x7c/0x148
      [   99.762226][ T5171]  __arm64_sys_read+0x20/0x30
      [   99.762625][ T5171]  el0_svc_common+0xd0/0x1e4
      [   99.763023][ T5171]  el0_svc+0x28/0x98
      [   99.763358][ T5171]  el0_sync_handler+0x8c/0xf0
      [   99.763759][ T5171]  el0_sync+0x1b8/0x1c0
      [   99.764118][ T5171] Code: d65f03c0 b9406102 17ffffae 94191565 (d42aa240)
      [   99.764715][ T5171] ---[ end trace 1eeffa3e17c58e10 ]---
      [   99.780720][ T5171] Kernel panic - not syncing: BRK handler: Fatal exception
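
The class of bug behind this trace is a name table that is shorter than the enum it is indexed by. The sketch below is a user-space illustration with hypothetical names (enum pcm_state, state_names, state_name()); the bounds check in state_name() is an extra safeguard of the sketch, whereas the patch itself only adds the missing table entry.

```c
/* Hypothetical mirror of the PCM states, in the same order as the UAPI enum. */
enum pcm_state {
	PCM_OPEN,
	PCM_SETUP,
	PCM_PREPARED,
	PCM_RUNNING,
	PCM_XRUN,
	PCM_DRAINING,
	PCM_PAUSED,
	PCM_SUSPENDED,
	PCM_DISCONNECTED,
	PCM_STATE_LAST = PCM_DISCONNECTED,
};

/* Sizing the table by the last state keeps every valid index in bounds. */
static const char *const state_names[PCM_STATE_LAST + 1] = {
	[PCM_OPEN]         = "OPEN",
	[PCM_SETUP]        = "SETUP",
	[PCM_PREPARED]     = "PREPARED",
	[PCM_RUNNING]      = "RUNNING",
	[PCM_XRUN]         = "XRUN",
	[PCM_DRAINING]     = "DRAINING",
	[PCM_PAUSED]       = "PAUSED",
	[PCM_SUSPENDED]    = "SUSPENDED",
	[PCM_DISCONNECTED] = "DISCONNECTED",	/* the kind of entry that was missing */
};

/* Defensive lookup: unknown or unnamed states never index past the table. */
static const char *state_name(int state)
{
	if (state < 0 || state > PCM_STATE_LAST || !state_names[state])
		return "UNKNOWN";
	return state_names[state];
}
```

With an explicit array size, forgetting an entry yields a NULL slot that the lookup can detect, instead of a read past the end of the array.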
      
      Signed-off-by: Jason Zhang <jason.zhang@rock-chips.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20231206013139.20506-1-jason.zhang@rock-chips.com
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8e6ac8c6
    • Sarah Grant's avatar
      ALSA: usb-audio: Add Pioneer DJM-450 mixer controls · 5ae225bb
      Sarah Grant authored
      commit bbb8e719 upstream.
      
      These values mirror those of the Pioneer DJM-250MK2 as the channel layout
      appears identical based on my observations. This duplication could be removed in
      later contributions if desired.
      
      Signed-off-by: Sarah Grant <s@srd.tw>
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20231201181654.5058-1-s@srd.tw
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5ae225bb
    • Pavel Begunkov's avatar
      io_uring: fix mutex_unlock with unreferenced ctx · 30df2901
      Pavel Begunkov authored
      commit f7b32e78 upstream.
      
      Callers of mutex_unlock() have to make sure that the mutex stays alive
      for the whole duration of the function call. For io_uring that means
      that the following pattern is not valid unless we ensure that the
      context outlives the mutex_unlock() call.
      
      mutex_lock(&ctx->uring_lock);
      req_put(req); // typically via io_req_task_submit()
      mutex_unlock(&ctx->uring_lock);
      
      Most contexts are fine: io-wq pins requests, syscalls hold the file,
      task works are taking ctx references and so on. However, the task work
      fallback path doesn't follow the rule.
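
The rule can be illustrated with a reference-counted object that records whether it was unlocked after its last reference went away. This is a user-space sketch with hypothetical names (struct ctx_model and its helpers); the "mutex" is a flag and the release only sets a marker, so the sketch shows the ordering problem without actually freeing memory.

```c
#include <stdbool.h>

/* Hypothetical context: a refcount, a fake lock, and bookkeeping flags. */
struct ctx_model {
	int refs;
	bool locked;
	bool released;			/* last reference was dropped */
	bool unlocked_after_release;	/* would be a use-after-free */
};

static void ctx_get(struct ctx_model *c)
{
	c->refs++;
}

static void ctx_put(struct ctx_model *c)
{
	if (--c->refs == 0)
		c->released = true;
}

static void ctx_lock(struct ctx_model *c)
{
	c->locked = true;
}

static void ctx_unlock(struct ctx_model *c)
{
	if (c->released)
		c->unlocked_after_release = true;
	c->locked = false;
}

/* Risky pattern: the request may hold the last reference to the context. */
static void submit_unpinned(struct ctx_model *c)
{
	ctx_lock(c);
	ctx_put(c);	/* models req_put() dropping the request's ctx reference */
	ctx_unlock(c);
}

/* Safe pattern: pin the context across the whole locked section. */
static void submit_pinned(struct ctx_model *c)
{
	ctx_get(c);
	ctx_lock(c);
	ctx_put(c);	/* the request's reference */
	ctx_unlock(c);
	ctx_put(c);	/* our own pin, dropped only after the unlock */
}
```

Starting from a single reference, submit_unpinned() unlocks an already released context, while submit_pinned() releases it only after the unlock has completed.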
      
      Cc:  <stable@vger.kernel.org>
      Fixes: 04fc6c80 ("io_uring: save ctx put/get for task_work submit")
      Reported-by: Jann Horn <jannh@google.com>
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/io-uring/CAG48ez3xSoYb+45f1RLtktROJrpiDQ1otNvdR+YLQf7m+Krj5Q@mail.gmail.com/
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      30df2901
    • Georg Gottleuber's avatar
      nvme-pci: Add sleep quirk for Kingston drives · dd864f6e
      Georg Gottleuber authored
      commit 107b4e06 upstream.
      
      Some Kingston NV1 and A2000 are wasting a lot of power on specific TUXEDO
      platforms in s2idle sleep if 'Simple Suspend' is used.
      
      This patch applies a new quirk 'Force No Simple Suspend' to achieve a
      low power sleep without 'Simple Suspend'.
      
      Signed-off-by: Werner Sembach <wse@tuxedocomputers.com>
      Signed-off-by: Georg Gottleuber <ggo@tuxedocomputers.com>
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      dd864f6e
    • Pavel Begunkov's avatar
      io_uring/af_unix: disable sending io_uring over sockets · 5a33d385
      Pavel Begunkov authored
      commit 705318a9 upstream.
      
      File reference cycles have caused lots of problems for io_uring
      in the past, and it still doesn't work exactly right and races with
      unix_stream_read_generic(). The safest fix would be to completely
      disallow sending io_uring files via sockets via SCM_RIGHTS, so there
      are no possible cycles involving registered files and thus rendering
      SCM accounting on the io_uring side unnecessary.
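      
      The rule above can be sketched in user-space C as a filter that runs
      while the files attached to a message are collected. This is only a
      model: struct file_model, collect_scm_files() and the -EINVAL error
      code are assumptions made for illustration, not the kernel's actual
      code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel struct file; only the property we test. */
struct file_model {
	bool is_io_uring;
};

/*
 * Model of collecting the files attached to an SCM_RIGHTS message.
 * Returns 0 once every file has been accepted, or -EINVAL as soon as an
 * io_uring file is seen (the error code is an assumption of this sketch).
 */
static int collect_scm_files(const struct file_model *const *files,
			     size_t count, size_t *accepted)
{
	*accepted = 0;
	for (size_t i = 0; i < count; i++) {
		if (files[i]->is_io_uring)
			return -EINVAL;	/* refuse the whole message */
		(*accepted)++;
	}
	return 0;
}

/* Usage check: ordinary files pass, any io_uring file rejects the send. */
static void demo_scm_filter(void)
{
	const struct file_model plain = { .is_io_uring = false };
	const struct file_model ring = { .is_io_uring = true };
	const struct file_model *ok[] = { &plain, &plain };
	const struct file_model *bad[] = { &plain, &ring };
	size_t accepted;

	assert(collect_scm_files(ok, 2, &accepted) == 0);
	assert(accepted == 2);
	assert(collect_scm_files(bad, 2, &accepted) == -EINVAL);
}
```

      Because an io_uring file can never enter the in-flight set in this
      model, no cycle through it can form.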
      
      Cc: <stable@vger.kernel.org>
      Fixes: 0091bfc8 ("io_uring/af_unix: defer registered files gc to io_uring release")
      Reported-and-suggested-by: Jann Horn <jannh@google.com>
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/c716c88321939156909cfa1bd8b0faaf1c804103.1701868795.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5a33d385
    • Malcolm Hart's avatar
      ASoC: amd: yc: Fix non-functional mic on ASUS E1504FA · 127fcf79
      Malcolm Hart authored
      commit b24e3590 upstream.
      
      This patch adds ASUSTeK COMPUTER INC "E1504FA" to the quirks file acp6x-mach.c
      to enable the microphone array on the ASUS Vivobook GO 15.
      I have this laptop and can confirm that the patch succeeds in enabling the
      microphone array.
      
      Signed-off-by: Malcolm Hart <malcolm@5harts.com>
      Cc: stable@vger.kernel.org
      Rule: add
      Link: https://lore.kernel.org/stable/875y1nt1bx.fsf%405harts.com
      Link: https://lore.kernel.org/r/871qcbszh0.fsf@5harts.com
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      127fcf79
    • Masami Hiramatsu (Google)'s avatar
      rethook: Use __rcu pointer for rethook::handler · 29b9ebc8
      Masami Hiramatsu (Google) authored
      commit a1461f1f upstream.
      
      Since rethook::handler is an RCU-managed pointer, used to tell readers
      whether the rethook has been stopped (unregistered), it should be an
      __rcu pointer and be accessed with the appropriate functions. This
      ensures the appropriate memory barriers are used when accessing it. OTOH,
      rethook::data is never changed, so we don't need to check it in
      get_kretprobe().
      
      NOTE: To avoid a sparse warning, rethook::handler is defined with a raw
      function pointer type with __rcu instead of rethook_handler_t.
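      
      The access pattern can be illustrated with a user-space analog built
      on C11 atomics. This is a sketch only: struct hook and the hook_*()
      helpers are hypothetical names, the release store and acquire load
      merely stand in for the kernel's RCU accessors, and grace periods are
      not modelled at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef void (*handler_fn)(void *data);

/* Hypothetical analog of a hook whose handler may be cleared at runtime. */
struct hook {
	void *data;                  /* set once at init, never changed */
	_Atomic(handler_fn) handler; /* NULL means the hook is stopped */
};

static void hook_init(struct hook *h, void *data, handler_fn fn)
{
	h->data = data;
	atomic_init(&h->handler, fn);
}

/* Writer side: publish "stopped" with release ordering. */
static void hook_stop(struct hook *h)
{
	atomic_store_explicit(&h->handler, (handler_fn)NULL,
			      memory_order_release);
}

/* Reader side: load the handler once, then test and call that one copy. */
static int hook_call(struct hook *h)
{
	handler_fn fn = atomic_load_explicit(&h->handler,
					     memory_order_acquire);

	if (!fn)
		return 0;	/* hook already stopped */
	fn(h->data);
	return 1;
}

/* Sample handler: counts invocations through the data pointer. */
static void count_hit(void *data)
{
	++*(int *)data;
}

/* Usage check: the handler runs before the stop and never after it. */
static void demo_hook(void)
{
	int hits = 0;
	struct hook h;

	hook_init(&h, &hits, count_hit);
	assert(hook_call(&h) == 1 && hits == 1);
	hook_stop(&h);
	assert(hook_call(&h) == 0 && hits == 1);
}
```

      Only the handler is read atomically; data is written once before the
      hook is used, which mirrors the remark that rethook::data never
      changes.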
      
      Link: https://lore.kernel.org/all/170126066201.398836.837498688669005979.stgit@devnote2/
      
      Fixes: 54ecbe6f ("rethook: Add a generic return hook")
      Cc: stable@vger.kernel.org
      Reported-by: kernel test robot <lkp@intel.com>
      Closes: https://lore.kernel.org/oe-kbuild-all/202311241808.rv9ceuAh-lkp@intel.com/
      Tested-by: JP Kobryn <inwardvessel@gmail.com>
      Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      29b9ebc8
    • Florian Fainelli's avatar
      scripts/gdb: fix lx-device-list-bus and lx-device-list-class · af448bb2
      Florian Fainelli authored
      [ Upstream commit 801a2b1b ]
      
      After the conversion to bus_to_subsys() and class_to_subsys(), the gdb
      scripts listing the system buses and classes respectively were broken. Fix
      them by returning the subsys_priv pointer and having the various callers
      dereference either the 'bus' or 'class' structure members accordingly.
      
      Link: https://lkml.kernel.org/r/20231130043317.174188-1-florian.fainelli@broadcom.com
      Fixes: 7b884b7f ("driver core: class.c: convert to only use class_to_subsys")
      Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
      Tested-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jan Kiszka <jan.kiszka@siemens.com>
      Cc: Kieran Bingham <kbingham@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      af448bb2
    • Baoquan He's avatar
      kernel/Kconfig.kexec: drop select of KEXEC for CRASH_DUMP · 2d16a9f7
      Baoquan He authored
      [ Upstream commit dccf78d3 ]
      
      Ignat Korchagin complained that a potential config regression was
      introduced by commit 89cde455 ("kexec: consolidate kexec and crash
      options into kernel/Kconfig.kexec").  Before the commit, CONFIG_CRASH_DUMP
      had no dependency on CONFIG_KEXEC.  After the commit, CRASH_DUMP selects
      KEXEC.  That forces a system to have CONFIG_KEXEC=y whenever
      CONFIG_CRASH_DUMP=y, which people may not want.
      
      In Ignat's case, he sets CONFIG_CRASH_DUMP=y, CONFIG_KEXEC_FILE=y and
      CONFIG_KEXEC=n because the kexec_load interface can be a security issue
      when the kernel/initrd cannot be signed and verified.
      
      CRASH_DUMP selects KEXEC because Eric, the author of the above commit,
      received an LKP build-failure report when posting an earlier version of
      the patch.  See the link below for details of the LKP report:
      
          https://lore.kernel.org/all/3e8eecd1-a277-2cfb-690e-5de2eb7b988e@oracle.com/T/#u
      
      In fact, that LKP report was triggered because arm's <asm/kexec.h> is
      wrapped in a CONFIG_KEXEC ifdeffery scope.  That is wrong.  CONFIG_KEXEC
      controls the enabling/disabling of the kexec_load interface, not the
      kexec feature itself.  Removing the wrongly added CONFIG_KEXEC ifdeffery
      scope in <asm/kexec.h> of arm allows us to drop the select KEXEC for
      CRASH_DUMP.  Meanwhile, change arch/arm/kernel/Makefile to let
      machine_kexec.o and relocate_kernel.o depend on KEXEC_CORE.
      
      Link: https://lkml.kernel.org/r/20231128054457.659452-1-bhe@redhat.com
      Fixes: 89cde455 ("kexec: consolidate kexec and crash options into kernel/Kconfig.kexec")
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Reported-by: Ignat Korchagin <ignat@cloudflare.com>
      Tested-by: Ignat Korchagin <ignat@cloudflare.com>	[compile-time only]
      Tested-by: Alexander Gordeev <agordeev@linux.ibm.com>
      Reviewed-by: Eric DeVolder <eric_devolder@yahoo.com>
      Tested-by: Eric DeVolder <eric_devolder@yahoo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      2d16a9f7
    • Yu Kuai's avatar
      md: don't leave 'MD_RECOVERY_FROZEN' in error path of md_set_readonly() · 49b79af0
      Yu Kuai authored
      [ Upstream commit c9f7cb5b ]
      
      If md_set_readonly() fails, the array can still be read-write while
      'MD_RECOVERY_FROZEN' remains set, which leaves the array in an
      abnormal state in which sync or recovery can no longer continue.
      Hence make sure the flag is cleared after md_set_readonly() returns.
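      
      A minimal user-space sketch of this error-path pattern follows. All
      names (struct array_state, RECOVERY_FROZEN, set_readonly()) and the
      "too many openers" failure condition are hypothetical stand-ins, not
      the md driver's code; the point is only that a flag set by the
      function is undone on every exit path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define RECOVERY_FROZEN (1u << 0)	/* hypothetical flag bit */

/* Hypothetical, much simplified model of an array's state. */
struct array_state {
	unsigned int recovery_flags;
	bool read_only;
	int openers;
};

static int set_readonly(struct array_state *a)
{
	bool did_freeze = false;
	int err = 0;

	/* Freeze recovery for the duration of the switch, if not frozen yet. */
	if (!(a->recovery_flags & RECOVERY_FROZEN)) {
		a->recovery_flags |= RECOVERY_FROZEN;
		did_freeze = true;
	}

	if (a->openers > 1) {	/* assumed failure condition for this sketch */
		err = -EBUSY;
		goto out;
	}
	a->read_only = true;
out:
	/* Undo our own freeze on success and on failure alike. */
	if (did_freeze)
		a->recovery_flags &= ~RECOVERY_FROZEN;
	return err;
}

/* Usage check: a failed switch must not leave the flag behind. */
static void demo_set_readonly(void)
{
	struct array_state busy = { .recovery_flags = 0, .openers = 2 };
	struct array_state idle = { .recovery_flags = 0, .openers = 1 };

	assert(set_readonly(&busy) == -EBUSY);
	assert(!busy.read_only);
	assert(!(busy.recovery_flags & RECOVERY_FROZEN));

	assert(set_readonly(&idle) == 0);
	assert(idle.read_only);
	assert(!(idle.recovery_flags & RECOVERY_FROZEN));
}
```

      Tracking did_freeze means a flag that the caller had already set
      before the call is left untouched.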
      
      Fixes: 88724bfa ("md: wait for pending superblock updates before switching to read-only")
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Acked-by: Xiao Ni <xni@redhat.com>
      Signed-off-by: Song Liu <song@kernel.org>
      Link: https://lore.kernel.org/r/20231205094215.1824240-3-yukuai1@huaweicloud.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      49b79af0
    • Lad Prabhakar's avatar
      riscv: errata: andes: Probe for IOCP only once in boot stage · 7442310e
      Lad Prabhakar authored
      [ Upstream commit ed5b7cfd ]
      
      We need to probe for IOCP only once, during the boot stage. Because we
      were probing for IOCP at all stages, the following issue occurred during
      the module-init stage:
      
      [9.019104] Unable to handle kernel paging request at virtual address ffffffff8100d3a0
      [9.027153] Oops [#1]
      [9.029421] Modules linked in: rcar_canfd renesas_usbhs i2c_riic can_dev spi_rspi i2c_core
      [9.037686] CPU: 0 PID: 90 Comm: udevd Not tainted 6.7.0-rc1+ #57
      [9.043756] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
      [9.050339] epc : riscv_noncoherent_supported+0x10/0x3e
      [9.055558]  ra : andes_errata_patch_func+0x4a/0x52
      [9.060418] epc : ffffffff8000d8c2 ra : ffffffff8000d95c sp : ffffffc8003abb00
      [9.067607]  gp : ffffffff814e25a0 tp : ffffffd80361e540 t0 : 0000000000000000
      [9.074795]  t1 : 000000000900031e t2 : 0000000000000001 s0 : ffffffc8003abb20
      [9.081984]  s1 : ffffffff015b57c7 a0 : 0000000000000000 a1 : 0000000000000001
      [9.089172]  a2 : 0000000000000000 a3 : 0000000000000000 a4 : ffffffff8100d8be
      [9.096360]  a5 : 0000000000000001 a6 : 0000000000000001 a7 : 000000000900031e
      [9.103548]  s2 : ffffffff015b57d7 s3 : 0000000000000001 s4 : 000000000000031e
      [9.110736]  s5 : 8000000000008a45 s6 : 0000000000000500 s7 : 000000000000003f
      [9.117924]  s8 : ffffffc8003abd48 s9 : ffffffff015b1140 s10: ffffffff8151a1b0
      [9.125113]  s11: ffffffff015b1000 t3 : 0000000000000001 t4 : fefefefefefefeff
      [9.132301]  t5 : ffffffff015b57c7 t6 : ffffffd8b63a6000
      [9.137587] status: 0000000200000120 badaddr: ffffffff8100d3a0 cause: 000000000000000f
      [9.145468] [<ffffffff8000d8c2>] riscv_noncoherent_supported+0x10/0x3e
      [9.151972] [<ffffffff800027e8>] _apply_alternatives+0x84/0x86
      [9.157784] [<ffffffff800029be>] apply_module_alternatives+0x10/0x1a
      [9.164113] [<ffffffff80008fcc>] module_finalize+0x5e/0x7a
      [9.169583] [<ffffffff80085cd6>] load_module+0xfd8/0x179c
      [9.174965] [<ffffffff80086630>] init_module_from_file+0x76/0xaa
      [9.180948] [<ffffffff800867f6>] __riscv_sys_finit_module+0x176/0x2a8
      [9.187365] [<ffffffff80889862>] do_trap_ecall_u+0xbe/0x130
      [9.192922] [<ffffffff808920bc>] ret_from_exception+0x0/0x64
      [9.198573] Code: 0009 b7e9 6797 014d a783 85a7 c799 4785 0717 0100 (0123) aef7
      [9.205994] ---[ end trace 0000000000000000 ]---
      
      This happened because riscv_noncoherent_supported() was called at every
      stage during the IOCP probe. riscv_noncoherent_supported() sets the
      noncoherent_supported variable to true, and that variable is annotated
      "__ro_after_init", so the write after init caused the above splat. Fix
      this by probing for IOCP only once, in the boot stage: a boolean variable
      "done" is set to true upon the IOCP probe in errata_probe_iocp(), and we
      bail out early if "done" is already true.
      
      While at it, change the return type of errata_probe_iocp() to void, as
      the return value was not checked in andes_errata_patch_func().
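      
      The "done" guard can be sketched in plain C as follows. The stage
      enum, the write counter and the function names are hypothetical; the
      sketch only models a probe that is reached from several patching
      stages but must write its result exactly once.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical patching stages; the last one runs long after init. */
enum patch_stage { STAGE_EARLY_BOOT, STAGE_BOOT, STAGE_MODULE };

static bool noncoherent_supported;	/* models the __ro_after_init variable */
static int write_count;			/* how often the variable was written */

static void mark_noncoherent_supported(void)
{
	noncoherent_supported = true;
	write_count++;
}

/* Probe once: later calls return before touching the variable again. */
static void probe_iocp_once(void)
{
	static bool done;

	if (done)
		return;
	done = true;

	mark_noncoherent_supported();
}

/* Called for every stage, as the errata patch function is. */
static void errata_patch(enum patch_stage stage)
{
	(void)stage;	/* the probe itself does not depend on the stage */
	probe_iocp_once();
}

/* Usage check: three stages, but only a single write. */
static void demo_probe_once(void)
{
	errata_patch(STAGE_EARLY_BOOT);
	errata_patch(STAGE_BOOT);
	errata_patch(STAGE_MODULE);

	assert(noncoherent_supported);
	assert(write_count == 1);
}
```

      Without the guard, the call made at the module stage would repeat the
      write, which is the access that faults once the variable's page has
      been made read-only.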
      
      Fixes: e021ae7f ("riscv: errata: Add Andes alternative ports")
      Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
      Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Reviewed-by: Yu Chien Peter Lin <peterlin@andestech.com>
      Link: https://lore.kernel.org/r/20231130212647.108746-1-prabhakar.mahadev-lad.rj@bp.renesas.com
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      7442310e