  1. Dec 08, 2022
    • ipv4: Fix route deletion when nexthop info is not specified · 041f8dc8
      Ido Schimmel authored
      [ Upstream commit d5082d38 ]
      
      When the kernel receives a route deletion request from user space it
      tries to delete a route that matches the route attributes specified in
      the request.
      
      If only prefix information is specified in the request, the kernel
      should delete the first matching FIB alias regardless of its associated
      FIB info. However, an error is currently returned when the FIB info is
      backed by a nexthop object:
      
       # ip nexthop add id 1 via 192.0.2.2 dev dummy10
       # ip route add 198.51.100.0/24 nhid 1
       # ip route del 198.51.100.0/24
       RTNETLINK answers: No such process
      
      Fix by matching on such a FIB info when legacy nexthop attributes are
      not specified in the request. An earlier check already covers the case
      where a nexthop ID is specified in the request.
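
The matching rule described above can be sketched as a small user-space model. This is not the kernel code: the `DeleteRequest` and `FibInfo` types and the `nexthop_object_matches` helper are hypothetical names, and rejecting requests that carry legacy attributes is an assumption inferred from the related commits in this series.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeleteRequest:
    """Hypothetical stand-in for the attributes of a route deletion request."""
    nh_id: int = 0                  # nexthop object ID ("nhid"), 0 if absent
    oif: int = 0                    # legacy attribute: output device index
    gateway: Optional[str] = None   # legacy attribute: "via" address
    multipath: bool = False         # legacy attribute: multipath spec present


@dataclass
class FibInfo:
    """Hypothetical stand-in for a FIB info backed by a nexthop object."""
    nh_object_id: int


def nexthop_object_matches(req: DeleteRequest, fi: FibInfo) -> bool:
    # Earlier check: an explicit nexthop ID must equal the object's ID.
    if req.nh_id:
        return req.nh_id == fi.nh_object_id
    # Assumption: legacy nexthop attributes never describe a nexthop object.
    if req.oif or req.gateway or req.multipath:
        return False
    # Prefix-only request: match regardless of the associated FIB info.
    return True
```

With this model, a prefix-only request such as `DeleteRequest()` matches any nexthop-backed FIB info, which is the behaviour the fix restores.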
      
      Add tests that cover these flows. Before the fix:
      
       # ./fib_nexthops.sh -t ipv4_fcnal
       ...
       TEST: Delete route when not specifying nexthop attributes           [FAIL]
      
       Tests passed:  11
       Tests failed:   1
      
      After the fix:
      
       # ./fib_nexthops.sh -t ipv4_fcnal
       ...
       TEST: Delete route when not specifying nexthop attributes           [ OK ]
      
       Tests passed:  12
       Tests failed:   0
      
      No regressions in other tests:
      
       # ./fib_nexthops.sh
       ...
       Tests passed: 228
       Tests failed:   0
      
       # ./fib_tests.sh
       ...
       Tests passed: 186
       Tests failed:   0
      
      Cc: stable@vger.kernel.org
      Reported-by: Jonas Gorski <jonas.gorski@gmail.com>
      Tested-by: Jonas Gorski <jonas.gorski@gmail.com>
      Fixes: 493ced1a ("ipv4: Allow routes to use nexthop objects")
      Fixes: 6bf92d70 ("net: ipv4: fix route with nexthop object delete warning")
      Fixes: 61b91eb3 ("ipv4: Handle attempt to delete multipath route when fib_info contains an nh reference")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20221124210932.2470010-1-idosch@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ipv4: Handle attempt to delete multipath route when fib_info contains an nh reference · 25174d91
      David Ahern authored
      [ Upstream commit 61b91eb3 ]
      
      Gwangun Jung reported a slab-out-of-bounds access in fib_nh_match:
          fib_nh_match+0xf98/0x1130 linux-6.0-rc7/net/ipv4/fib_semantics.c:961
          fib_table_delete+0x5f3/0xa40 linux-6.0-rc7/net/ipv4/fib_trie.c:1753
          inet_rtm_delroute+0x2b3/0x380 linux-6.0-rc7/net/ipv4/fib_frontend.c:874
      
      Separate nexthop objects are mutually exclusive with the legacy
      multipath spec. Fix fib_nh_match to return if the config for the
      to be deleted route contains a multipath spec while the fib_info
      is using a nexthop object.
      
      Fixes: 493ced1a ("ipv4: Allow routes to use nexthop objects")
      Fixes: 6bf92d70 ("net: ipv4: fix route with nexthop object delete warning")
      Reported-by: Gwangun Jung <exsociety@gmail.com>
      Signed-off-by: David Ahern <dsahern@kernel.org>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Tested-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Stable-dep-of: d5082d38 ("ipv4: Fix route deletion when nexthop info is not specified")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • selftests: net: fix nexthop warning cleanup double ip typo · a0ad247e
      Nikolay Aleksandrov authored
      [ Upstream commit 692930cc ]
      
      I made a stupid typo when adding the nexthop route warning selftest and
      added both $IP and ip after it (double ip) on the cleanup path. The
      error doesn't show up when running the test, but obviously the test
      doesn't clean up properly afterwards.
      
      Fixes: 392baa33 ("selftests: net: add delete nexthop route warning test")
      Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Stable-dep-of: d5082d38 ("ipv4: Fix route deletion when nexthop info is not specified")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • selftests: net: add delete nexthop route warning test · 532847b6
      Nikolay Aleksandrov authored
      [ Upstream commit 392baa33 ]
      
      Add a test which causes a WARNING on kernels which treat a
      nexthop route like a normal route when comparing for deletion and a
      device is specified. That is, a route is found but we hit a warning while
      matching it. The warning is from fib_info_nh() in include/net/nexthop.h
      because we run it on a fib_info with nexthop object. The call chain is:
       inet_rtm_delroute -> fib_table_delete -> fib_nh_match (called with a
      nexthop fib_info and also with fc_oif set thus calling fib_info_nh on
      the fib_info and triggering the warning).
      
      Repro steps:
       $ ip nexthop add id 12 via 172.16.1.3 dev veth1
       $ ip route add 172.16.101.1/32 nhid 12
       $ ip route delete 172.16.101.1/32 dev veth1
      
      Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Stable-dep-of: d5082d38 ("ipv4: Fix route deletion when nexthop info is not specified")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Kconfig.debug: provide a little extra FRAME_WARN leeway when KASAN is enabled · e0783558
      Lee Jones authored
      [ Upstream commit 152fe65f ]
      
      When enabled, KASAN enlarges functions' stack frames, pushing quite a few
      over the current threshold.  This can mainly be seen on 32-bit
      architectures where the present limit (when !GCC) is a lowly 1024 bytes.
      
      Link: https://lkml.kernel.org/r/20221125120750.3537134-3-lee@kernel.org
      Signed-off-by: Lee Jones <lee@kernel.org>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: "Christian König" <christian.koenig@amd.com>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: David Airlie <airlied@gmail.com>
      Cc: Harry Wentland <harry.wentland@amd.com>
      Cc: Leo Li <sunpeng.li@amd.com>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Maxime Ripard <mripard@kernel.org>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
      Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
      Cc: Thomas Zimmermann <tzimmermann@suse.de>
      Cc: Tom Rix <trix@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • parisc: Increase FRAME_WARN to 2048 bytes on parisc · 723fa02e
      Helge Deller authored
      [ Upstream commit 8d192bec ]
      
      PA-RISC uses a much bigger frame size for functions than other
      architectures. So increase it to 2048 for 32- and 64-bit kernels.
      This fixes e.g. a warning in lib/xxhash.c.
      
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Helge Deller <deller@gmx.de>
      Stable-dep-of: 152fe65f ("Kconfig.debug: provide a little extra FRAME_WARN leeway when KASAN is enabled")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • mm: migrate: fix THP's mapcount on isolation · b951ab4b
      Gavin Shan authored
      [ Upstream commit 829ae0f8 ]
      
      The issue is reported when removing memory through virtio_mem device.  A
      transparent huge page that has experienced a copy-on-write fault is
      wrongly regarded as pinned, so it escapes isolation in
      isolate_migratepages_block().  The transparent huge page can't be
      migrated and the corresponding memory block can't be put into offline
      state.

      Fix it by replacing page_mapcount() with total_mapcount().  With this, the
      transparent huge page can be isolated and migrated, and the memory block
      can be put into offline state.  In addition, the page's refcount is
      increased a bit earlier so that the page cannot be released while the
      check is executed.
      
      Link: https://lkml.kernel.org/r/20221124095523.31061-1-gshan@redhat.com
      Fixes: 1da2f328 ("mm,thp,compaction,cma: allow THP migration for CMA allocations")
      Signed-off-by: Gavin Shan <gshan@redhat.com>
      Reported-by: Zhenyu Zhang <zhenyzha@redhat.com>
      Tested-by: Zhenyu Zhang <zhenyzha@redhat.com>
      Suggested-by: David Hildenbrand <david@redhat.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Cc: <stable@vger.kernel.org>	[5.7+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • mm: __isolate_lru_page_prepare() in isolate_migratepages_block() · c5eda602
      Hugh Dickins authored
      [ Upstream commit 89f6c88a ]
      
      __isolate_lru_page_prepare() conflates two unrelated functions, with the
      flags to one disjoint from the flags to the other; and hides some of the
      important checks outside of isolate_migratepages_block(), where the
      sequence is better to be visible.  It comes from the days of lumpy
      reclaim, before compaction, when the combination made more sense.
      
      Move what's needed by mm/compaction.c isolate_migratepages_block() inline
      there, and what's needed by mm/vmscan.c isolate_lru_pages() inline there.
      
      Shorten "isolate_mode" to "mode", so the sequence of conditions is easier
      to read.  Declare a "mapping" variable, to save one call to page_mapping()
      (but not another: calling again after page is locked is necessary).
      Simplify isolate_lru_pages() with a "move_to" list pointer.
      
      Link: https://lkml.kernel.org/r/879d62a8-91cc-d3c6-fb3b-69768236df68@google.com
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Reviewed-by: Alex Shi <alexs@kernel.org>
      Cc: Alexander Duyck <alexander.duyck@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Stable-dep-of: 829ae0f8 ("mm: migrate: fix THP's mapcount on isolation")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • iommu/vt-d: Fix PCI device refcount leak in dmar_dev_scope_init() · bdb613ef
      Xiongfeng Wang authored
      [ Upstream commit 4bedbbd7 ]
      
      for_each_pci_dev() is implemented by pci_get_device(). The comment of
      pci_get_device() says that it will increase the reference count for the
      returned pci_dev and also decrease the reference count for the input
      pci_dev @from if it is not NULL.
      
      If we break for_each_pci_dev() loop with pdev not NULL, we need to call
      pci_dev_put() to decrease the reference count. Add the missing
      pci_dev_put() for the error path to avoid reference count leak.
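
The reference-count behaviour described above can be illustrated with a toy user-space model. It is only a sketch of the documented pci_get_device() contract; `Device`, `get_next`, `device_put` and `find_device` are hypothetical names, not kernel APIs.

```python
class Device:
    """Hypothetical refcounted device, standing in for struct pci_dev."""

    def __init__(self, name):
        self.name = name
        self.refcount = 1  # baseline reference held elsewhere


def get_next(devices, prev):
    """Model the contract: put the previous device, get the next one."""
    if prev is not None:
        prev.refcount -= 1          # reference on @from is dropped
        index = devices.index(prev) + 1
    else:
        index = 0
    if index >= len(devices):
        return None
    nxt = devices[index]
    nxt.refcount += 1               # reference on the returned device
    return nxt


def device_put(dev):
    """Drop one reference, as pci_dev_put() does in the kernel."""
    if dev is not None:
        dev.refcount -= 1


def find_device(devices, wanted, put_on_break):
    """Walk the list as a for_each_pci_dev() loop would, stopping early."""
    dev = get_next(devices, None)
    while dev is not None:
        if dev.name == wanted:
            if put_on_break:
                device_put(dev)     # the call the fix adds before leaving
            return True
        dev = get_next(devices, dev)
    return False
```

Leaving the loop early without `put_on_break` leaves the matched device with one extra reference, while a full walk to the end balances every get with a put.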
      
      Fixes: 2e455289 ("iommu/vt-d: Unify the way to process DMAR device scope array")
      Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
      Link: https://lore.kernel.org/r/20221121113649.190393-3-wangxiongfeng2@huawei.com
      Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • iommu/vt-d: Fix PCI device refcount leak in has_external_pci() · b6eea8b2
      Xiongfeng Wang authored
      [ Upstream commit afca9e19 ]
      
      for_each_pci_dev() is implemented by pci_get_device(). The comment of
      pci_get_device() says that it will increase the reference count for the
      returned pci_dev and also decrease the reference count for the input
      pci_dev @from if it is not NULL.
      
      If we break for_each_pci_dev() loop with pdev not NULL, we need to call
      pci_dev_put() to decrease the reference count. Add the missing
      pci_dev_put() before 'return true' to avoid reference count leak.
      
      Fixes: 89a6079d ("iommu/vt-d: Force IOMMU on for platform opt in hint")
      Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
      Link: https://lore.kernel.org/r/20221121113649.190393-2-wangxiongfeng2@huawei.com
      Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • nvme: fix SRCU protection of nvme_ns_head list · 787d81d4
      Caleb Sander authored
      [ Upstream commit 899d2a05 ]
      
      Walking the nvme_ns_head siblings list is protected by the head's srcu
      in nvme_ns_head_submit_bio() but not nvme_mpath_revalidate_paths().
      Removing namespaces from the list also fails to synchronize the srcu.
      Concurrent scan work can therefore cause use-after-frees.
      
      Hold the head's srcu lock in nvme_mpath_revalidate_paths() and
      synchronize with the srcu, not the global RCU, in nvme_ns_remove().
      
      Observed the following panic when making NVMe/RDMA connections
      with native multipath on the Rocky Linux 8.6 kernel
      (it seems the upstream kernel has the same race condition).
      Disassembly shows the faulting instruction is cmp 0x50(%rdx),%rcx;
      computing capacity != get_capacity(ns->disk).
      Address 0x50 is dereferenced because ns->disk is NULL.
      The NULL disk appears to be the result of concurrent scan work
      freeing the namespace (note the log line in the middle of the panic).
      
      [37314.206036] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
      [37314.206036] nvme0n3: detected capacity change from 0 to 11811160064
      [37314.299753] PGD 0 P4D 0
      [37314.299756] Oops: 0000 [#1] SMP PTI
      [37314.299759] CPU: 29 PID: 322046 Comm: kworker/u98:3 Kdump: loaded Tainted: G        W      X --------- -  - 4.18.0-372.32.1.el8test86.x86_64 #1
      [37314.299762] Hardware name: Dell Inc. PowerEdge R720/0JP31P, BIOS 2.7.0 05/23/2018
      [37314.299763] Workqueue: nvme-wq nvme_scan_work [nvme_core]
      [37314.299783] RIP: 0010:nvme_mpath_revalidate_paths+0x26/0xb0 [nvme_core]
      [37314.299790] Code: 1f 44 00 00 66 66 66 66 90 55 53 48 8b 5f 50 48 8b 83 c8 c9 00 00 48 8b 13 48 8b 48 50 48 39 d3 74 20 48 8d 42 d0 48 8b 50 20 <48> 3b 4a 50 74 05 f0 80 60 70 ef 48 8b 50 30 48 8d 42 d0 48 39 d3
      [37315.058803] RSP: 0018:ffffabe28f913d10 EFLAGS: 00010202
      [37315.121316] RAX: ffff927a077da800 RBX: ffff92991dd70000 RCX: 0000000001600000
      [37315.206704] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff92991b719800
      [37315.292106] RBP: ffff929a6b70c000 R08: 000000010234cd4a R09: c0000000ffff7fff
      [37315.377501] R10: 0000000000000001 R11: ffffabe28f913a30 R12: 0000000000000000
      [37315.462889] R13: ffff92992716600c R14: ffff929964e6e030 R15: ffff92991dd70000
      [37315.548286] FS:  0000000000000000(0000) GS:ffff92b87fb80000(0000) knlGS:0000000000000000
      [37315.645111] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [37315.713871] CR2: 0000000000000050 CR3: 0000002208810006 CR4: 00000000000606e0
      [37315.799267] Call Trace:
      [37315.828515]  nvme_update_ns_info+0x1ac/0x250 [nvme_core]
      [37315.892075]  nvme_validate_or_alloc_ns+0x2ff/0xa00 [nvme_core]
      [37315.961871]  ? __blk_mq_free_request+0x6b/0x90
      [37316.015021]  nvme_scan_work+0x151/0x240 [nvme_core]
      [37316.073371]  process_one_work+0x1a7/0x360
      [37316.121318]  ? create_worker+0x1a0/0x1a0
      [37316.168227]  worker_thread+0x30/0x390
      [37316.212024]  ? create_worker+0x1a0/0x1a0
      [37316.258939]  kthread+0x10a/0x120
      [37316.297557]  ? set_kthread_struct+0x50/0x50
      [37316.347590]  ret_from_fork+0x35/0x40
      [37316.390360] Modules linked in: nvme_rdma nvme_tcp(X) nvme_fabrics nvme_core netconsole iscsi_tcp libiscsi_tcp dm_queue_length dm_service_time nf_conntrack_netlink br_netfilter bridge stp llc overlay nft_chain_nat ipt_MASQUERADE nf_nat xt_addrtype xt_CT nft_counter xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_comment xt_multiport nft_compat nf_tables libcrc32c nfnetlink dm_multipath tg3 rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm intel_rapl_msr iTCO_wdt iTCO_vendor_support dcdbas intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif kvm irqbypass crct10dif_pclmul crc32_pclmul mlx5_ib ghash_clmulni_intel ib_uverbs rapl intel_cstate intel_uncore ib_core ipmi_si joydev mei_me pcspkr ipmi_devintf mei lpc_ich wmi ipmi_msghandler acpi_power_meter ext4 mbcache jbd2 sd_mod t10_pi sg mgag200 mlx5_core drm_kms_helper syscopyarea
      [37316.390419]  sysfillrect ahci sysimgblt fb_sys_fops libahci drm crc32c_intel libata mlxfw pci_hyperv_intf tls i2c_algo_bit psample dm_mirror dm_region_hash dm_log dm_mod fuse [last unloaded: nvme_core]
      [37317.645908] CR2: 0000000000000050
      
      Fixes: e7d65803 ("nvme-multipath: revalidate paths during rescan")
      Signed-off-by: Caleb Sander <csander@purestorage.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • riscv: kexec: Fixup irq controller broken in kexec crash path · 12f23720
      Guo Ren authored
      [ Upstream commit b17d19a5 ]
      
      If a crash happens on cpu3 and all interrupts are bound to cpu0, the
      bad irq routing leaves the crash kernel unable to receive any irq,
      because the crash kernel won't clean up all harts' PLIC enable bits in
      the enable registers. This patch is similar to 9141a003 ("ARM: 7316/1:
      kexec: EOI active and mask all interrupts in kexec crash path") and
      78fd584c ("arm64: kdump: implement machine_crash_shutdown()"), and
      PowerPC also has the same mechanism.
      
      Fixes: fba8a867 ("RISC-V: Add kexec support")
      Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
      Signed-off-by: Guo Ren <guoren@kernel.org>
      Reviewed-by: Xianting Tian <xianting.tian@linux.alibaba.com>
      Cc: Nick Kossifidis <mick@ics.forth.gr>
      Cc: Palmer Dabbelt <palmer@rivosinc.com>
      Link: https://lore.kernel.org/r/20221020141603.2856206-2-guoren@kernel.org
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • riscv: fix race when vmap stack overflow · ac00301a
      Jisheng Zhang authored
      [ Upstream commit 7e186433 ]
      
      Currently, when detecting vmap stack overflow, riscv first switches
      to the so-called shadow stack, then uses this shadow stack to call
      get_overflow_stack() to get the overflow stack. However, there's
      a race here if two or more harts use the same shadow stack at the same
      time.

      To solve this race, we introduce the spin_shadow_stack atomic var, which
      is swapped atomically between its own address and 0. When the var is
      set, the shadow_stack is being used; when the var is cleared, the
      shadow_stack isn't being used.
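
The swap-based protocol can be modelled in user space as follows. This is a rough sketch, not the riscv assembly: `ShadowStackGuard` is a hypothetical class, a `threading.Lock` stands in for the atomicity of the hardware swap instruction, and `id(self)` stands in for the variable's own address.

```python
import threading


class ShadowStackGuard:
    """User-space model of a variable swapped between a busy marker and 0."""

    def __init__(self):
        self._value = 0                   # 0 means the shadow stack is free
        self._busy = id(self)             # non-zero marker, like "own address"
        self._atomic = threading.Lock()   # stands in for hardware atomicity

    def _swap(self, new):
        """Atomically store `new` and return the previous value."""
        with self._atomic:
            old, self._value = self._value, new
            return old

    def try_acquire(self):
        """Return True if the caller now owns the shadow stack."""
        # Writing the busy marker over itself is harmless, so a failed
        # attempt does not disturb the current owner.
        return self._swap(self._busy) == 0

    def release(self):
        """Mark the shadow stack as free again."""
        self._swap(0)
```

A hart that sees `try_acquire()` return False would retry until the owner calls `release()`; only one caller at a time observes the old value 0.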
      
      Fixes: 31da94c2 ("riscv: add VMAP_STACK overflow detection")
      Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
      Suggested-by: Guo Ren <guoren@kernel.org>
      Reviewed-by: Guo Ren <guoren@kernel.org>
      Link: https://lore.kernel.org/r/20221030124517.2370-1-jszhang@kernel.org
      [Palmer: Add AQ to the swap, and also some comments.]
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • riscv: Sync efi page table's kernel mappings before switching · fa7a7d18
      Alexandre Ghiti authored
      [ Upstream commit 3f105a74 ]
      
      The EFI page table is initially created as a copy of the kernel page table.
      With VMAP_STACK enabled, kernel stacks are allocated in the vmalloc area:
      if the stack is allocated in a new PGD (one that was not present at the
      moment of the efi page table creation or not synced in a previous vmalloc
      fault), the kernel will take a trap when switching to the efi page table
      when the vmalloc kernel stack is accessed, resulting in a kernel panic.
      
      Fix that by updating the efi kernel mappings before switching to the efi
      page table.
      
      Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
      Fixes: b91540d5 ("RISC-V: Add EFI runtime services")
      Tested-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
      Reviewed-by: Atish Patra <atishp@rivosinc.com>
      Link: https://lore.kernel.org/r/20221121133303.1782246-1-alexghiti@rivosinc.com
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • pinctrl: single: Fix potential division by zero · d86d6989
      Maxim Korotkov authored
      [ Upstream commit 64c15033 ]
      
      A division by zero is possible through pcs->bits_per_pin: if pcs->fmask
      is zero, fls() (from asm-generic/bitops/builtin-fls.h or
      arch/x86/include/asm/bitops.h) returns zero. pcs_probe() has a branch
      that assigns 0 to fmask before pcs_allocate_pin_table() is called.
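
The arithmetic behind the problem can be sketched in a few lines. This is a simplified model rather than the driver code: `count_pins` and its parameters are hypothetical, and returning `None` for a zero mask is just one possible guard.

```python
def fls(x: int) -> int:
    """Find last set bit, 1-based; fls(0) is 0, as with the kernel helper."""
    return x.bit_length()


def count_pins(size_bytes: int, fmask: int):
    """Return how many pins the register area holds, or None if fmask is 0."""
    bits_per_pin = fls(fmask)
    if bits_per_pin == 0:
        # Without this guard the division below would raise
        # ZeroDivisionError, the analogue of the kernel's divide-by-zero.
        return None
    return (size_bytes * 8) // bits_per_pin
```

For example, a 4-byte area with a 4-bit function mask (`0xf`) holds 8 pins, while a zero mask is rejected before any division takes place.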
      
      Found by Linux Verification Center (linuxtesting.org) with SVACE.
      
      Fixes: 4e7e8017 ("pinctrl: pinctrl-single: enhance to configure multiple pins of different modules")
      Signed-off-by: Maxim Korotkov <korotkov.maxim.s@gmail.com>
      Reviewed-by: Tony Lindgren <tony@atomide.com>
      Link: https://lore.kernel.org/r/20221117123034.27383-1-korotkov.maxim.s@gmail.com
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ASoC: ops: Fix bounds check for _sx controls · 98b15c70
      Mark Brown authored
      [ Upstream commit 698813ba ]
      
      For _sx controls the semantics of the max field is not the usual one, max
      is the number of steps rather than the maximum value. This means that our
      check in snd_soc_put_volsw_sx() needs to just check against the maximum
      value.
      
      Fixes: 4f1e50d6 ("ASoC: ops: Reject out of bounds values in snd_soc_put_volsw_sx()")
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Link: https://lore.kernel.org/r/20220511134137.169575-1-broonie@kernel.org
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • KVM: x86/mmu: Fix race condition in direct_page_fault · f88a6977
      Kazuki Takiguchi authored
      commit 47b0c2e4 upstream.
      
      make_mmu_pages_available() must be called with mmu_lock held for write.
      However, if the TDP MMU is used, it will be called with mmu_lock held for
      read.
      This function does nothing unless shadow pages are used, so there is no
      race unless nested TDP is used.
      Since nested TDP uses shadow pages, old shadow pages may be zapped by this
      function even when the TDP MMU is enabled.
      Since shadow pages are never allocated by kvm_tdp_mmu_map(), a race
      condition can be avoided by not calling make_mmu_pages_available() if the
      TDP MMU is currently in use.
      
      I encountered this when repeatedly starting and stopping nested VM.
      It can be artificially caused by allocating a large number of nested TDP
      SPTEs.
      
      For example, the following BUG and general protection fault are caused in
      the host kernel.
      
      pte_list_remove: 00000000cd54fc10 many->many
      ------------[ cut here ]------------
      kernel BUG at arch/x86/kvm/mmu/mmu.c:963!
      invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
      RIP: 0010:pte_list_remove.cold+0x16/0x48 [kvm]
      Call Trace:
       <TASK>
       drop_spte+0xe0/0x180 [kvm]
       mmu_page_zap_pte+0x4f/0x140 [kvm]
       __kvm_mmu_prepare_zap_page+0x62/0x3e0 [kvm]
       kvm_mmu_zap_oldest_mmu_pages+0x7d/0xf0 [kvm]
       direct_page_fault+0x3cb/0x9b0 [kvm]
       kvm_tdp_page_fault+0x2c/0xa0 [kvm]
       kvm_mmu_page_fault+0x207/0x930 [kvm]
       npf_interception+0x47/0xb0 [kvm_amd]
       svm_invoke_exit_handler+0x13c/0x1a0 [kvm_amd]
       svm_handle_exit+0xfc/0x2c0 [kvm_amd]
       kvm_arch_vcpu_ioctl_run+0xa79/0x1780 [kvm]
       kvm_vcpu_ioctl+0x29b/0x6f0 [kvm]
       __x64_sys_ioctl+0x95/0xd0
       do_syscall_64+0x5c/0x90
      
      general protection fault, probably for non-canonical address
      0xdead000000000122: 0000 [#1] PREEMPT SMP NOPTI
      RIP: 0010:kvm_mmu_commit_zap_page.part.0+0x4b/0xe0 [kvm]
      Call Trace:
       <TASK>
       kvm_mmu_zap_oldest_mmu_pages+0xae/0xf0 [kvm]
       direct_page_fault+0x3cb/0x9b0 [kvm]
       kvm_tdp_page_fault+0x2c/0xa0 [kvm]
       kvm_mmu_page_fault+0x207/0x930 [kvm]
       npf_interception+0x47/0xb0 [kvm_amd]
      
      CVE: CVE-2022-45869
      Fixes: a2855afc ("KVM: x86/mmu: Allow parallel page faults for the TDP MMU")
      Signed-off-by: Kazuki Takiguchi <takiguchi.kazuki171@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • io_uring/poll: fix poll_refs race with cancelation · df4b177b
      Lin Ma authored
      [ upstream commit 12ad3d2d ]
      
      There is an interesting race condition of poll_refs which could result
      in a NULL pointer dereference. The crash trace is like:
      
      KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
      CPU: 0 PID: 30781 Comm: syz-executor.2 Not tainted 6.0.0-g493ffd6605b2 #1
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
      RIP: 0010:io_poll_remove_entry io_uring/poll.c:154 [inline]
      RIP: 0010:io_poll_remove_entries+0x171/0x5b4 io_uring/poll.c:190
      Code: ...
      RSP: 0018:ffff88810dfefba0 EFLAGS: 00010202
      RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000040000
      RDX: ffffc900030c4000 RSI: 000000000003ffff RDI: 0000000000040000
      RBP: 0000000000000008 R08: ffffffff9764d3dd R09: fffffbfff3836781
      R10: fffffbfff3836781 R11: 0000000000000000 R12: 1ffff11003422d60
      R13: ffff88801a116b04 R14: ffff88801a116ac0 R15: dffffc0000000000
      FS:  00007f9c07497700(0000) GS:ffff88811a600000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007ffb5c00ea98 CR3: 0000000105680005 CR4: 0000000000770ef0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
       <TASK>
       io_apoll_task_func+0x3f/0xa0 io_uring/poll.c:299
       handle_tw_list io_uring/io_uring.c:1037 [inline]
       tctx_task_work+0x37e/0x4f0 io_uring/io_uring.c:1090
       task_work_run+0x13a/0x1b0 kernel/task_work.c:177
       get_signal+0x2402/0x25a0 kernel/signal.c:2635
       arch_do_signal_or_restart+0x3b/0x660 arch/x86/kernel/signal.c:869
       exit_to_user_mode_loop kernel/entry/common.c:166 [inline]
       exit_to_user_mode_prepare+0xc2/0x160 kernel/entry/common.c:201
       __syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline]
       syscall_exit_to_user_mode+0x58/0x160 kernel/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      The root cause for this is a small oversight in
      io_poll_check_events() when it runs concurrently with the poll cancel
      routine io_poll_cancel_req().
      
      The interleaving to trigger use-after-free:
      
      CPU0                                       |  CPU1
                                                 |
      io_apoll_task_func()                       |  io_poll_cancel_req()
       io_poll_check_events()                    |
        // do while first loop                   |
        v = atomic_read(...)                     |
        // v = poll_refs = 1                     |
        ...                                      |  io_poll_mark_cancelled()
                                                 |   atomic_or()
                                                 |   // poll_refs =
      IO_POLL_CANCEL_FLAG | 1
                                                 |
        atomic_sub_return(...)                   |
        // poll_refs = IO_POLL_CANCEL_FLAG       |
        // loop continue                         |
                                                 |
                                                 |  io_poll_execute()
                                                 |   io_poll_get_ownership()
                                                 |   // poll_refs =
      IO_POLL_CANCEL_FLAG | 1
                                                 |   // gets the ownership
        v = atomic_read(...)                     |
        // poll_refs not change                  |
                                                 |
        if (v & IO_POLL_CANCEL_FLAG)             |
         return -ECANCELED;                      |
        // io_poll_check_events return           |
        // will go into                          |
        // io_req_complete_failed() free req     |
                                                 |
                                                 |  io_apoll_task_func()
                                                 |  // also go into io_req_complete_failed()
      
      And the interleaving to trigger the kernel WARNING:
      
      CPU0                                       |  CPU1
                                                 |
      io_apoll_task_func()                       |  io_poll_cancel_req()
       io_poll_check_events()                    |
        // do while first loop                   |
        v = atomic_read(...)                     |
        // v = poll_refs = 1                     |
        ...                                      |  io_poll_mark_cancelled()
                                                 |   atomic_or()
                                                 |   // poll_refs = IO_POLL_CANCEL_FLAG | 1
                                                 |
        atomic_sub_return(...)                   |
        // poll_refs = IO_POLL_CANCEL_FLAG       |
        // loop continue                         |
                                                 |
        v = atomic_read(...)                     |
        // v = IO_POLL_CANCEL_FLAG               |
                                                 |  io_poll_execute()
                                                 |   io_poll_get_ownership()
                                                 |   // poll_refs = IO_POLL_CANCEL_FLAG | 1
                                                 |   // gets the ownership
                                                 |
        WARN_ON_ONCE(!(v & IO_POLL_REF_MASK))    |
        // v & IO_POLL_REF_MASK = 0 WARN         |
                                                 |
                                                 |  io_apoll_task_func()
                                                 |  // also go into io_req_complete_failed()
      
      After reading the source code and discussing it with Pavel, it appears
      that the atomic poll refs implementation continues the loop in
      io_poll_check_events() only to prevent another context from grabbing
      ownership. Therefore, this patch simply adds another AND operation so
      that the loop stops when poll_refs is exactly equal to
      IO_POLL_CANCEL_FLAG. Since io_poll_cancel_req() grabs ownership and
      eventually makes its way to io_req_complete_failed(), the req is still
      reclaimed as expected.
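
      As a rough illustration of that extra AND, the sketch below models the
      loop's exit condition in userspace with C11 atomics. The flag and mask
      values and the helper names (drop_refs(), loop_again_old(),
      loop_again_new(), demo_cancel_race()) are assumptions made for this
      example, not the kernel's definitions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative values only; the kernel's definitions may differ. */
#define POLL_CANCEL_FLAG (1u << 31)
#define POLL_REF_MASK    (POLL_CANCEL_FLAG - 1u)

/* Drop the references counted in v and return the resulting poll_refs. */
static unsigned drop_refs(atomic_uint *refs, unsigned v)
{
    unsigned dropped = v & POLL_REF_MASK;

    /* atomic_fetch_sub() returns the old value, so subtract once more. */
    return atomic_fetch_sub(refs, dropped) - dropped;
}

/* Before the fix: any leftover bit, the cancel flag included, keeps looping. */
static bool loop_again_old(atomic_uint *refs, unsigned v)
{
    return drop_refs(refs, v) != 0;
}

/* After the fix: only leftover references keep the loop going. */
static bool loop_again_new(atomic_uint *refs, unsigned v)
{
    return (drop_refs(refs, v) & POLL_REF_MASK) != 0;
}

/* Replays the state where cancel set its flag after v == 1 was read. */
void demo_cancel_race(void)
{
    atomic_uint refs;

    atomic_init(&refs, POLL_CANCEL_FLAG | 1u);
    assert(loop_again_old(&refs, 1u));   /* keeps looping without ownership */

    atomic_store(&refs, POLL_CANCEL_FLAG | 1u);
    assert(!loop_again_new(&refs, 1u));  /* stops; the canceller owns req */
}
```

      With the mask applied, a poll_refs value that holds nothing but the
      cancel flag no longer counts as a reason to run the loop again.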
      
      Fixes: aa43477b ("io_uring: poll rework")
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
      [axboe: tweak description and code style]
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      df4b177b
    • Pavel Begunkov's avatar
      io_uring: make poll refs more robust · 4b702b7d
      Pavel Begunkov authored
      [ upstream commit a26a35e9 ]
      
      poll_refs serves two purposes. The first is ownership over the request.
      The second is notifying io_poll_check_events() that there was an
      event but the wake up couldn't grab the ownership, so
      io_poll_check_events() should retry.
      
      We want to make poll_refs more robust against overflows. Instead of
      always incrementing it, which covers both purposes with one atomic, check
      whether poll_refs is elevated enough and, if so, set a retry flag without
      attempting to grab ownership. The gap between the bias check and the
      following atomics may seem racy, but we don't need it to be strict.
      Moreover, there can be at most 4 parallel updates: by the first and the
      second poll entries, __io_arm_poll_handler() and cancellation. Of those
      four, only poll wake ups may be executed multiple times, but they're
      protected by a spinlock.
      
      Cc: stable@vger.kernel.org
      Reported-by: Lin Ma <linma@zju.edu.cn>
      Fixes: aa43477b ("io_uring: poll rework")
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/c762bc31f8683b3270f3587691348a7119ef9c9d.1668963050.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4b702b7d
    • Pavel Begunkov's avatar
      io_uring: cmpxchg for poll arm refs release · 1d58849a
      Pavel Begunkov authored
      [ upstream commit 2f389343 ]
      
      Replace atomically subtracting the ownership reference at the end of
      arming a poll with a cmpxchg. We try to release ownership by setting 0,
      assuming that poll_refs didn't change while we were arming. If it did
      change, we keep the ownership and use it to queue a tw, which is fully
      capable of processing all events (and even tolerates spurious wake ups).
      
      It's a bit more elegant as we reduce races b/w setting the cancellation
      flag and getting refs with this release, and with that we don't have to
      worry about any kinds of underflows. The performance difference b/w
      cmpxchg and atomic dec is usually negligible, and this is not the
      fastest path for polling anyway.
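
      The release described above can be sketched in userspace with C11
      atomics. This is only a model of the idea: try_release_ownership() and
      demo_release() are hypothetical names, and the real code operates on the
      request's poll_refs inside the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Try to release the single ownership reference taken for arming.
 * Returns true when poll_refs went 1 -> 0. A false return means the value
 * changed in the meantime, so the caller still owns the request and is
 * expected to process the pending events itself.
 */
static bool try_release_ownership(atomic_uint *poll_refs)
{
    unsigned expected = 1u;

    return atomic_compare_exchange_strong(poll_refs, &expected, 0u);
}

void demo_release(void)
{
    atomic_uint refs;

    atomic_init(&refs, 1u);
    assert(try_release_ownership(&refs));    /* nothing raced with arming */
    assert(atomic_load(&refs) == 0u);

    atomic_store(&refs, 2u);                 /* something bumped the refs */
    assert(!try_release_ownership(&refs));   /* ownership is kept */
    assert(atomic_load(&refs) == 2u);        /* and the value is untouched */
}
```

      Because a failed compare-and-exchange leaves the value alone, there is
      no subtraction that could underflow when the flags or refs changed
      concurrently.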
      
      Cc: stable@vger.kernel.org
      Fixes: aa43477b ("io_uring: poll rework")
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/0c95251624397ea6def568ff040cad2d7926fd51.1668963050.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1d58849a
    • Pavel Begunkov's avatar
      io_uring: fix tw losing poll events · cd1981a8
      Pavel Begunkov authored
      [ upstream commit 539bcb57 ]
      
      We may never try to process a poll wake and its mask if there were
      multiple wake ups racing to queue up a tw. Force
      io_poll_check_events() to update the mask by vfs_poll().
      
      Cc: stable@vger.kernel.org
      Fixes: aa43477b ("io_uring: poll rework")
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/00344d60f8b18907171178d7cf598de71d127b0b.1668710222.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      cd1981a8
    • Pavel Begunkov's avatar
      io_uring: update res mask in io_poll_check_events · 62321dc7
      Pavel Begunkov authored
      [ upstream commit b98186ae ]
      
      When io_poll_check_events() collides with someone attempting to queue a
      task work, it'll spin one more time. However, it'll continue to use
      the mask from the first iteration instead of updating it. For example,
      if the first wake up was an EPOLLIN and the second an EPOLLOUT,
      userspace will not get EPOLLOUT in time.
      
      Clear the mask for all subsequent iterations to force vfs_poll().
      
      Cc: stable@vger.kernel.org
      Fixes: aa43477b ("io_uring: poll rework")
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/2dac97e8f691231049cb259c4ae57e79e40b537c.1668710222.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      62321dc7
    • Steven Rostedt (Google)'s avatar
      tracing: Free buffers when a used dynamic event is removed · 417d5ea6
      Steven Rostedt (Google) authored
      commit 4313e5a6 upstream.
      
      After 65536 dynamic events have been added and removed, the "type" field
      of the event then uses the first type number that is available (not
      currently used by other events). A type number is the identifier of the
      binary blobs in the tracing ring buffer (known as events) to map them to
      logic that can parse the binary blob.
      
      The issue is that if a dynamic event (like a kprobe event) is traced and
      is in the ring buffer, and then that event is removed (because it is
      dynamic, which means it can be created and destroyed), and another
      dynamic event is then created that has the same number, that new event's
      logic for parsing the binary blob will be used.
      
      To show how this can be an issue, the following can crash the kernel:
      
       # cd /sys/kernel/tracing
       # for i in `seq 65536`; do
           echo 'p:kprobes/foo do_sys_openat2 $arg1:u32' > kprobe_events
       # done
      
      For every iteration of the above, the write to kprobe_events will
      remove the old event and create a new one (with the same format) and
      increase the type number to the next available one until the type number
      goes past 65535, which is the max number for the 16 bit type. After it
      reaches that number, the logic to allocate a new number simply looks for
      the next available number. When a dynamic event is removed, that number
      is then available to be reused by the next dynamic event created. That is,
      once the above reaches the max number, the number assigned to the event in
      that loop will remain the same.
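
      The allocation behaviour described above can be modelled with a small
      userspace sketch. It is a simplified stand-in, not the kernel's
      allocator: alloc_type(), free_type(), demo_type_reuse() and the starting
      number 1 are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define TYPE_MAX 65535              /* largest value of the 16 bit type */

static bool in_use[TYPE_MAX + 1];   /* which type numbers are taken */
static int next_type = 1;           /* hypothetical first dynamic number */

/* Hand out increasing numbers first, then fall back to the first free one. */
static int alloc_type(void)
{
    if (next_type <= TYPE_MAX) {
        in_use[next_type] = true;
        return next_type++;
    }
    for (int t = 1; t <= TYPE_MAX; t++) {
        if (!in_use[t]) {
            in_use[t] = true;
            return t;
        }
    }
    return -1;                      /* every number is taken */
}

static void free_type(int t)
{
    in_use[t] = false;
}

/* Mimics the loop above: each write removes the event and creates a new one. */
void demo_type_reuse(void)
{
    int last = 0;

    for (int i = 0; i < TYPE_MAX; i++) {
        last = alloc_type();
        free_type(last);
    }
    assert(last == TYPE_MAX);       /* the increasing range is used up */

    int a = alloc_type();
    free_type(a);
    int b = alloc_type();
    assert(a == b);                 /* the freed number is reused at once */
}
```

      Once the increasing range is exhausted, removing an event and creating
      another one yields the same number, which is the precondition for the
      stale-event problem.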
      
      Now that means deleting one dynamic event and creating another will reuse
      the previous event's type number. This is where bad things can happen.
      After the above loop finishes, the kprobes/foo event reads the
      do_sys_openat2 function call's first parameter as an integer.
      
       # echo 1 > kprobes/foo/enable
       # cat /etc/passwd > /dev/null
       # cat trace
                   cat-2211    [005] ....  2007.849603: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
                   cat-2211    [005] ....  2007.849620: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
                   cat-2211    [005] ....  2007.849838: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
                   cat-2211    [005] ....  2007.849880: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
       # echo 0 > kprobes/foo/enable
      
      Now if we delete the kprobe and create a new one that reads a string:
      
       # echo 'p:kprobes/foo do_sys_openat2 +0($arg2):string' > kprobe_events
      
      And now we can read the trace:
      
       # cat trace
              sendmail-1942    [002] .....   530.136320: foo: (do_sys_openat2+0x0/0x240) arg1=             cat-2046    [004] .....   530.930817: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
                   cat-2046    [004] .....   530.930961: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
                   cat-2046    [004] .....   530.934278: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
                   cat-2046    [004] .....   530.934563: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
                  bash-1515    [007] .....   534.299093: foo: (do_sys_openat2+0x0/0x240) arg1="kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk���������@��4Z����;Y�����U
      
      And dmesg has:
      
      ==================================================================
      BUG: KASAN: use-after-free in string+0xd4/0x1c0
      Read of size 1 at addr ffff88805fdbbfa0 by task cat/2049
      
       CPU: 0 PID: 2049 Comm: cat Not tainted 6.1.0-rc6-test+ #641
       Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
       Call Trace:
        <TASK>
        dump_stack_lvl+0x5b/0x77
        print_report+0x17f/0x47b
        kasan_report+0xad/0x130
        string+0xd4/0x1c0
        vsnprintf+0x500/0x840
        seq_buf_vprintf+0x62/0xc0
        trace_seq_printf+0x10e/0x1e0
        print_type_string+0x90/0xa0
        print_kprobe_event+0x16b/0x290
        print_trace_line+0x451/0x8e0
        s_show+0x72/0x1f0
        seq_read_iter+0x58e/0x750
        seq_read+0x115/0x160
        vfs_read+0x11d/0x460
        ksys_read+0xa9/0x130
        do_syscall_64+0x3a/0x90
        entry_SYSCALL_64_after_hwframe+0x63/0xcd
       RIP: 0033:0x7fc2e972ade2
       Code: c0 e9 b2 fe ff ff 50 48 8d 3d b2 3f 0a 00 e8 05 f0 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
       RSP: 002b:00007ffc64e687c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
       RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fc2e972ade2
       RDX: 0000000000020000 RSI: 00007fc2e980d000 RDI: 0000000000000003
       RBP: 00007fc2e980d000 R08: 00007fc2e980c010 R09: 0000000000000000
       R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000020f00
       R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
        </TASK>
      
       The buggy address belongs to the physical page:
       page:ffffea00017f6ec0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x5fdbb
       flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
       raw: 000fffffc0000000 0000000000000000 ffffea00017f6ec8 0000000000000000
       raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
       page dumped because: kasan: bad access detected
      
       Memory state around the buggy address:
        ffff88805fdbbe80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
        ffff88805fdbbf00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       >ffff88805fdbbf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                      ^
        ffff88805fdbc000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
        ffff88805fdbc080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ==================================================================
      
      This was found when Zheng Yejian sent a patch to convert the event type
      number assignment to use IDA, which gives the next available number, and
      this bug showed up in the fuzz testing by Yujie Liu and the kernel test
      robot. But after further analysis, I found that this behavior is the same
      as when the event type numbers go past the 16bit max (and the above shows
      that).
      
      Modules have a similar issue, which is dealt with by setting a
      "WAS_ENABLED" flag when a module event is enabled. When the module is
      freed, if any of its events were enabled, the ring buffer that holds that
      event is also cleared, to prevent reading stale events. The same can be
      done for dynamic events.
      
      If any dynamic event that is being removed was enabled, then make sure the
      buffers they were enabled in are now cleared.
      
      Link: https://lkml.kernel.org/r/20221123171434.545706e3@gandalf.local.home
      Link: https://lore.kernel.org/all/20221110020319.1259291-1-zhengyejian1@huawei.com/
      
      Cc: stable@vger.kernel.org
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Depends-on: e18eb878 ("tracing: Add tracing_reset_all_online_cpus_unlocked() function")
      Depends-on: 5448d44c ("tracing: Add unified dynamic event framework")
      Depends-on: 6212dd29 ("tracing/kprobes: Use dyn_event framework for kprobe events")
      Depends-on: 065e63f9 ("tracing: Only have rmmod clear buffers that its events were active in")
      Depends-on: 575380da ("tracing: Only clear trace buffer on module unload if event was traced")
      Fixes: 77b44d1b ("tracing/kprobes: Rename Kprobe-tracer to kprobe-event")
      Reported-by: Zheng Yejian <zhengyejian1@huawei.com>
      Reported-by: Yujie Liu <yujie.liu@intel.com>
      Reported-by: kernel test robot <yujie.liu@intel.com>
      Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      417d5ea6
    • Steven Rostedt (Google)'s avatar
      tracing: Fix race where histograms can be called before the event · 52fc245d
      Steven Rostedt (Google) authored
      commit ef38c79a upstream.
      
      commit 94eedf3d ("tracing: Fix race where eprobes can be called before
      the event") fixed an issue where if an event is soft disabled, and the
      trigger is being added, there's a small window where the event sees that
      there's a trigger but does not see that it requires reading the event yet,
      and then calls the trigger with the record == NULL.
      
      This could be solved by adding memory barriers in the hot path, or by
      making sure that all the triggers requiring a record check for NULL. The
      latter was chosen.
      
      Commit 94eedf3d set the eprobe trigger handle to check for NULL, but
      the same needs to be done with histograms.
      
      Link: https://lore.kernel.org/linux-trace-kernel/20221118211809.701d40c0f8a757b0df3c025a@kernel.org/
      Link: https://lore.kernel.org/linux-trace-kernel/20221123164323.03450c3a@gandalf.local.home
      
      Cc: Tom Zanussi <zanussi@kernel.org>
      Cc: stable@vger.kernel.org
      Fixes: 7491e2c4 ("tracing: Add a probe that attaches to trace events")
      Reported-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      52fc245d
    • Daniel Bristot de Oliveira's avatar
      tracing/osnoise: Fix duration type · cb2b0612
      Daniel Bristot de Oliveira authored
      commit 022632f6 upstream.
      
      The duration type is a 64-bit long value, not an int. This was
      causing some long noise to report wrong values.
      
      Change the duration to a 64-bit value.
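
      To see why a long noise can be misreported, the sketch below squeezes a
      nanosecond duration into 32 bits. It is an illustration only: the helper
      names are hypothetical, and an unsigned 32-bit type is used instead of
      int so that the truncation is well defined in standard C.

```c
#include <assert.h>
#include <stdint.h>

/* Models storing the duration in a 32-bit variable: the high bits are lost. */
static uint32_t duration_32(uint64_t duration_ns)
{
    return (uint32_t)duration_ns;
}

/* Models the fixed behaviour: the full 64-bit value is kept. */
static uint64_t duration_64(uint64_t duration_ns)
{
    return duration_ns;
}

void demo_duration(void)
{
    uint64_t five_seconds_ns = 5000000000ULL;

    /* 5e9 does not fit in 32 bits; only 5e9 mod 2^32 survives. */
    assert(duration_32(five_seconds_ns) == 705032704u);
    assert(duration_64(five_seconds_ns) == five_seconds_ns);
}
```

      In this model, anything longer than roughly 4.29 seconds in nanoseconds
      wraps around, so only sufficiently long noise is affected.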
      
      Link: https://lkml.kernel.org/r/a93d8a8378c7973e9c609de05826533c9e977939.1668692096.git.bristot@kernel.org
      
      Cc: stable@vger.kernel.org
      Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Fixes: bce29ac9 ("trace: Add osnoise tracer")
      Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
      Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      cb2b0612
    • Janusz Krzysztofik's avatar
      drm/i915: Never return 0 if not all requests retired · 615a996f
      Janusz Krzysztofik authored
      commit 12b8b046 upstream.
      
      Users of intel_gt_retire_requests_timeout() expect 0 return value on
      success.  However, we have no protection from passing back 0 potentially
      returned by a call to dma_fence_wait_timeout() when it succeeds right
      after its timeout has expired.
      
      Replace 0 with -ETIME before potentially using the timeout value as return
      code, so -ETIME is returned if there are still some requests not retired
      after timeout, 0 otherwise.
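
      The return-value rule can be written as a short conditional. The sketch
      below is a userspace model under stated assumptions: retire_result() and
      its parameters are hypothetical and only mirror the rule described
      above, not the driver's actual function.

```c
#include <assert.h>
#include <errno.h>

/*
 * active_count: requests still not retired (assumed input).
 * timeout:      value left over from waiting; 0 means it expired, a
 *               negative value is an error code.
 */
static long retire_result(unsigned int active_count, long timeout)
{
    if (!active_count)
        return 0;                       /* everything retired: success */
    return timeout ? timeout : -ETIME;  /* never report 0 with work left */
}

void demo_retire_result(void)
{
    assert(retire_result(0, 0) == 0);
    assert(retire_result(3, 0) == -ETIME);
    assert(retire_result(3, -EINTR) == -EINTR);
}
```

      A caller that treats 0 as success can then no longer mistake an expired
      timeout with requests still pending for a completed retirement.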
      
      v3: Use conditional expression, more compact but also better reflecting
          intention standing behind the change.
      
      v2: Move the added lines down so flush_submission() is not affected.
      
      Fixes: f33a8a51 ("drm/i915: Merge wait_for_timelines with retire_request")
      Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
      Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
      Cc: stable@vger.kernel.org # v5.5+
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20221121145655.75141-3-janusz.krzysztofik@linux.intel.com
      (cherry picked from commit f301a29f)
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      615a996f
    • Janusz Krzysztofik's avatar
      drm/i915: Fix negative value passed as remaining time · 01a2b25e
      Janusz Krzysztofik authored
      commit a8899b87 upstream.
      
      Commit b97060a9 ("drm/i915/guc: Update intel_gt_wait_for_idle to work
      with GuC") extended the API of intel_gt_retire_requests_timeout() with an
      extra argument 'remaining_timeout', intended for passing back unconsumed
      portion of requested timeout when 0 (success) is returned.  However, when
      request retirement happens to succeed despite an error returned by a call
      to dma_fence_wait_timeout(), that error code (a negative value) is passed
      back instead of remaining time.  If we then pass that negative value
      forward as requested timeout to intel_uc_wait_for_idle(), an explicit BUG
      will be triggered.
      
      If request retirement succeeds but an error code is passed back via
      remaining_timeout, we may have no clue on how much of the initial timeout
      might have been left for spending it on waiting for GuC to become idle.
      OTOH, since all pending requests have been successfully retired, that
      error code has been already ignored by intel_gt_retire_requests_timeout(),
      so we shouldn't fail.
      
      Assume no more time has been left on error and pass 0 timeout value to
      intel_uc_wait_for_idle() to give it a chance to return success if GuC is
      already idle.
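
      On the caller side this amounts to clamping the value before it is
      passed on. The sketch below is a minimal userspace model; the helper
      name idle_wait_timeout() is hypothetical and stands in for whatever the
      caller does before invoking intel_uc_wait_for_idle().

```c
#include <assert.h>
#include <errno.h>

/* Treat an error code passed back as "no time left" rather than a timeout. */
static long idle_wait_timeout(long remaining_timeout)
{
    return remaining_timeout < 0 ? 0 : remaining_timeout;
}

void demo_idle_wait_timeout(void)
{
    assert(idle_wait_timeout(-EINTR) == 0);  /* error: assume no time left */
    assert(idle_wait_timeout(0) == 0);
    assert(idle_wait_timeout(250) == 250);   /* real remaining time is kept */
}
```

      A zero timeout still lets the idle check succeed if GuC is already
      idle, which matches the intent stated above.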
      
      v3: Don't fail on any error passed back via remaining_timeout.
      
      v2: Fix the issue on the caller side, not the provider.
      
      Fixes: b97060a9 ("drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC")
      Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
      Cc: stable@vger.kernel.org # v5.15+
      Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20221121145655.75141-2-janusz.krzysztofik@linux.intel.com
      (cherry picked from commit f235dbd5)
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      01a2b25e
    • Leo Liu's avatar
      drm/amdgpu: enable Vangogh VCN indirect sram mode · ff1591ba
      Leo Liu authored
      commit 9a8cc8ca upstream.
      
      So that PSP is used to initialize the HW.
      
      Fixes: 0c2c02b6 ("drm/amdgpu/vcn: add firmware support for dimgrey_cavefish")
      Signed-off-by: Leo Liu <leo.liu@amd.com>
      Reviewed-by: James Zhu <James.Zhu@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ff1591ba
    • Lee Jones's avatar
      drm/amdgpu: temporarily disable broken Clang builds due to blown stack-frame · ac2d7fa9
      Lee Jones authored
      commit 6f6cb171 upstream.
      
      Patch series "Fix a bunch of allmodconfig errors", v2.
      
      Since b339ec9c ("kbuild: Only default to -Werror if COMPILE_TEST")
      WERROR now defaults to COMPILE_TEST meaning that it's enabled for
      allmodconfig builds.  This leads to some interesting build failures when
      using Clang, each resolved in this set.
      
      With this set applied, I am able to obtain a successful allmodconfig Arm
      build.
      
      
      This patch (of 2):
      
      calculate_bandwidth() is presently broken on all !(X86_64 || SPARC64 ||
      ARM64) architectures built with Clang (all released versions), whereby the
      stack frame gets blown up to well over 5k.  This would cause an immediate
      kernel panic on most architectures.  We'll revert this when the following
      bug report has been resolved:
      https://github.com/llvm/llvm-project/issues/41896.
      
      Link: https://lkml.kernel.org/r/20221125120750.3537134-1-lee@kernel.org
      Link: https://lkml.kernel.org/r/20221125120750.3537134-2-lee@kernel.org
      Signed-off-by: Lee Jones <lee@kernel.org>
      Suggested-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: "Christian König" <christian.koenig@amd.com>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: David Airlie <airlied@gmail.com>
      Cc: Harry Wentland <harry.wentland@amd.com>
      Cc: Lee Jones <lee@kernel.org>
      Cc: Leo Li <sunpeng.li@amd.com>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Maxime Ripard <mripard@kernel.org>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
      Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
      Cc: Thomas Zimmermann <tzimmermann@suse.de>
      Cc: Tom Rix <trix@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ac2d7fa9
    • Adrian Hunter's avatar
      mmc: sdhci: Fix voltage switch delay · 57ee7bc4
      Adrian Hunter authored
      commit c981cdfb upstream.
      
      Commit 20b92a30 ("mmc: sdhci: update signal voltage switch code")
      removed voltage switch delays from sdhci because mmc core had been
      enhanced to support them. However that assumed that sdhci_set_ios()
      did a single clock change, which it did not, and so the delays in mmc
      core, which should have come after the first clock change, were not
      effective.
      
      Fix by avoiding re-configuring UHS and preset settings when the clock
      is turning on and the settings have not changed. That then also avoids
      the associated clock changes, so that then sdhci_set_ios() does a single
      clock change when voltage switching, and the mmc core delays become
      effective.
      
      To do that has meant keeping track of driver strength (host->drv_type),
      and cases of reinitialization (host->reinit_uhs).
      
      Note also, the 'turning_on_clk' restriction should not be necessary
      but is done to minimize the impact of the change on stable kernels.
      
      Fixes: 20b92a30 ("mmc: sdhci: update signal voltage switch code")
      Cc: stable@vger.kernel.org
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Link: https://lore.kernel.org/r/20221128133259.38305-2-adrian.hunter@intel.com
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      57ee7bc4
    • Wenchao Chen's avatar
      mmc: sdhci-sprd: Fix no reset data and command after voltage switch · bb8f8095
      Wenchao Chen authored
      commit dd30dcfa upstream.
      
      After switching the voltage, not resetting data and command will cause
      a CMD2 timeout.
      
      Fixes: 29ca763f ("mmc: sdhci-sprd: Add pin control support for voltage switch")
      Signed-off-by: Wenchao Chen <wenchao.chen@unisoc.com>
      Acked-by: Adrian Hunter <adrian.hunter@intel.com>
      Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20221130121328.25553-1-wenchao.chen@unisoc.com
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      bb8f8095
    • Sebastian Falbesoner's avatar
      mmc: sdhci-esdhc-imx: correct CQHCI exit halt state check · 4c7681c1
      Sebastian Falbesoner authored
      commit a3cab1d2 upstream.
      
      With the current logic the "failed to exit halt state" error would be
      shown even if any other bit than CQHCI_HALT was set in the CQHCI_CTL
      register, since the right hand side is always true. Fix this by using
      the correct operator (bit-wise instead of logical AND) to only check for
      the halt bit flag, which was obviously intended here.
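
      The difference between the two operators is easy to reproduce outside
      the driver. In the sketch below, halt_bit stands in for CQHCI_HALT with
      an assumed value, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stands in for CQHCI_HALT; the value is an assumption for this example. */
static const uint32_t halt_bit = 0x1u;

/* Buggy form: logical AND is true whenever ctl has any bit set. */
static bool halted_logical(uint32_t ctl)
{
    return ctl && halt_bit;
}

/* Fixed form: bit-wise AND tests only the halt bit. */
static bool halted_bitwise(uint32_t ctl)
{
    return (ctl & halt_bit) != 0;
}

void demo_halt_check(void)
{
    uint32_t other_bit_only = 0x100u;   /* halt bit clear, another bit set */

    assert(halted_logical(other_bit_only));    /* false "still halted" */
    assert(!halted_bitwise(other_bit_only));   /* correctly not halted */
    assert(halted_bitwise(halt_bit));          /* halt bit is detected */
}
```

      With the logical operator, any unrelated bit in the register is enough
      to trigger the error message, which is the behaviour the fix removes.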
      
      Fixes: 85236d2b ("mmc: sdhci-esdhc-imx: clear the HALT bit when enable CQE")
      Signed-off-by: Sebastian Falbesoner <sebastian.falbesoner@gmail.com>
      Acked-by: Haibo Chen <haibo.chen@nxp.com>
      Acked-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20221121105721.1903878-1-sebastian.falbesoner@gmail.com
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4c7681c1
    • Christian Löhle's avatar
      mmc: core: Fix ambiguous TRIM and DISCARD arg · 01dbe4db
      Christian Löhle authored
      commit 489d1445 upstream.
      
      Clean up the MMC_TRIM_ARGS define that became ambiguous with DISCARD
      introduction.  While at it, let's fix one usage where MMC_TRIM_ARGS falsely
      included DISCARD too.
      
      Fixes: b3bf9153 ("mmc: core: new discard feature support at eMMC v4.5")
      Signed-off-by: Christian Loehle <cloehle@hyperstone.com>
      Acked-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/11376b5714964345908f3990f17e0701@hyperstone.com
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      01dbe4db
    • Ye Bin's avatar
      mmc: mmc_test: Fix removal of debugfs file · 738946e3
      Ye Bin authored
      commit f4307b4d upstream.
      
      In __mmc_test_register_dbgfs_file(), we need to assign 'file', as it's
      being used when removing the debugfs files when the mmc_test module is
      removed.
      
      Fixes: a04c50aa ("mmc: core: no need to check return value of debugfs_create functions")
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Acked-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: stable@vger.kernel.org
      [Ulf: Re-wrote the commit msg]
      Link: https://lore.kernel.org/r/20221123095506.1965691-1-yebin@huaweicloud.com
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: stmmac: Set MAC's flow control register to reflect current settings · 635d0517
      Goh, Wei Sheng authored
      commit cc3d2b5f upstream.
      
      Currently, the pause frame register GMAC_RX_FLOW_CTRL_RFE is not updated
      correctly when the 'ethtool -A <IFACE> autoneg off rx off tx off' command
      is issued. This fix ensures the flow control change is reflected directly
      in the GMAC_RX_FLOW_CTRL_RFE register.
      
      Fixes: 46f69ded ("net: stmmac: Use resolved link config in mac_link_up()")
      Cc: <stable@vger.kernel.org> # 5.10.x
      Signed-off-by: Goh, Wei Sheng <wei.sheng.goh@intel.com>
      Signed-off-by: Noor Azura Ahmad Tarmizi <noor.azura.ahmad.tarmizi@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • v4l2: don't fall back to follow_pfn() if pin_user_pages_fast() fails · 9132dcdf
      Linus Torvalds authored
      commit 6647e76a upstream.
      
      The V4L2_MEMORY_USERPTR interface is long deprecated and shouldn't be
      used (and is discouraged for any modern v4l drivers).  And Seth Jenkins
      points out that the fallback to VM_PFNMAP/VM_IO is fundamentally racy
      and dangerous.
      
      Note that it's not even a case that should trigger, since any normal
      user pointer logic ends up just using the pin_user_pages_fast() call
      that does the proper page reference counting.  That's not the problem
      case, only if you try to use special device mappings do you have any
      issues.
      
      Normally I'd just remove this during the merge window, but since Seth
      pointed out the problem cases, we really want to know as soon as
      possible if there are actually any users of this odd special case of a
      legacy interface.  Neither Hans nor Mauro seem to think that such
      mis-uses of the old legacy interface should exist.  As Mauro says:
      
       "See, V4L2 has actually 4 streaming APIs:
              - Kernel-allocated mmap (usually referred simply as just mmap);
              - USERPTR mmap;
              - read();
              - dmabuf;
      
        The USERPTR is one of the oldest way to use it, coming from V4L
        version 1 times, and by far the least used one"
      
      And Hans chimed in on the USERPTR interface:
      
       "To be honest, I wouldn't mind if it goes away completely, but that's a
        bit of a pipe dream right now"
      
      but while removing this legacy interface entirely may be a pipe dream we
      can at least try to remove the unlikely (and actively broken) case of
      using special device mappings for USERPTR accesses.
      
      This replaces it with a WARN_ONCE() that we can remove once we've
      hopefully confirmed that no actual users exist.
      
      NOTE! Longer term, this means that a 'struct frame_vector' only ever
      contains proper page pointers, and all the games we have with converting
      them to pages can go away (grep for 'frame_vector_to_pages()' and the
      uses of 'vec->is_pfns').  But this is just the first step, to verify
      that this code really is all dead, and do so as quickly as possible.
      
      Reported-by: Seth Jenkins <sethjenkins@google.com>
      Acked-by: Hans Verkuil <hverkuil@xs4all.nl>
      Acked-by: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • pinctrl: intel: Save and restore pins in "direct IRQ" mode · 76ad884b
      Andy Shevchenko authored
      commit 6989ea48 upstream.
      
      The firmware on some systems may configure GPIO pins to be
      an interrupt source in so-called "direct IRQ" mode. In such
      cases the GPIO controller driver has no idea whether those pins
      are being used or not. At the same time, there is a known bug
      in firmware that doesn't restore the pin settings correctly
      after suspend, i.e. for an unknown reason the Rx value becomes
      inverted.
      
      Hence, let's save and restore the pins that are configured
      as GPIOs in the input mode with GPIROUTIOXAPIC bit set.
      
      Cc: stable@vger.kernel.org
      Reported-and-tested-by: Dale Smith <dalepsmith@gmail.com>
      Reported-and-tested-by: John Harris <jmharris@gmail.com>
      BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=214749
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Link: https://lore.kernel.org/r/20221124222926.72326-1-andriy.shevchenko@linux.intel.com
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/bugs: Make sure MSR_SPEC_CTRL is updated properly upon resume from S3 · 41296b85
      Pawan Gupta authored
      commit 66065157 upstream.
      
      The "force" argument to write_spec_ctrl_current() is currently ambiguous
      as it does not guarantee the MSR write. This is due to the optimization
      that writes to the MSR happen only when the new value differs from the
      cached value.
      
      This is fine in most cases, but breaks for S3 resume when the cached MSR
      value gets out of sync with the hardware MSR value due to S3 resetting
      it.
      
      When x86_spec_ctrl_current is the same as x86_spec_ctrl_base, the MSR
      write is skipped, which results in SPEC_CTRL mitigations not getting
      restored.
      
      Move the MSR write from write_spec_ctrl_current() to a new function that
      unconditionally writes to the MSR. Update the callers accordingly and
      rename functions.
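
      The failure mode can be sketched in user space as follows. This is
      only an illustration under stated assumptions: hw_msr, cached_val,
      msr_write_cached(), msr_write_always() and simulate_s3_reset() are
      hypothetical names, and the "MSR" is a plain variable rather than a
      real register:

```c
#include <stdint.h>

/* Hypothetical stand-ins for the hardware MSR and its cached copy. */
static uint64_t hw_msr;		/* simulated hardware register */
static uint64_t cached_val;	/* software's idea of the register value */
static unsigned int hw_writes;	/* number of simulated MSR writes */

/* Unconditional write: always touches the (simulated) hardware. */
static void msr_write_always(uint64_t val)
{
	hw_msr = val;
	cached_val = val;
	hw_writes++;
}

/* Optimized write: skips the hardware when the cache already matches. */
static void msr_write_cached(uint64_t val)
{
	if (cached_val == val)
		return;
	msr_write_always(val);
}

/* Simulated S3: the hardware loses its value, the cache does not. */
static void simulate_s3_reset(void)
{
	hw_msr = 0;
}
```

      After simulate_s3_reset(), calling msr_write_cached() with the
      previously written value leaves hw_msr stale because the cache still
      matches; only msr_write_always() restores it.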
      
        [ bp: Rework a bit. ]
      
      Fixes: caa0ff24 ("x86/bugs: Keep a per-CPU IA32_SPEC_CTRL value")
      Suggested-by: Borislav Petkov <bp@alien8.de>
      Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: <stable@kernel.org>
      Link: https://lore.kernel.org/r/806d39b0bfec2fe8f50dc5446dff20f5bb24a959.1669821572.git.pawan.kumar.gupta@linux.intel.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • nilfs2: fix NULL pointer dereference in nilfs_palloc_commit_free_entry() · 33021419
      ZhangPeng authored
      commit f0a0ccda upstream.
      
      Syzbot reported a null-ptr-deref bug:
      
       NILFS (loop0): segctord starting. Construction interval = 5 seconds, CP
       frequency < 30 seconds
       general protection fault, probably for non-canonical address
       0xdffffc0000000002: 0000 [#1] PREEMPT SMP KASAN
       KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
       CPU: 1 PID: 3603 Comm: segctord Not tainted
       6.1.0-rc2-syzkaller-00105-gb229b6ca5abb #0
       Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google
       10/11/2022
       RIP: 0010:nilfs_palloc_commit_free_entry+0xe5/0x6b0
       fs/nilfs2/alloc.c:608
       Code: 00 00 00 00 fc ff df 80 3c 02 00 0f 85 cd 05 00 00 48 b8 00 00 00
       00 00 fc ff df 4c 8b 73 08 49 8d 7e 10 48 89 fa 48 c1 ea 03 <80> 3c 02
       00 0f 85 26 05 00 00 49 8b 46 10 be a6 00 00 00 48 c7 c7
       RSP: 0018:ffffc90003dff830 EFLAGS: 00010212
       RAX: dffffc0000000000 RBX: ffff88802594e218 RCX: 000000000000000d
       RDX: 0000000000000002 RSI: 0000000000002000 RDI: 0000000000000010
       RBP: ffff888071880222 R08: 0000000000000005 R09: 000000000000003f
       R10: 000000000000000d R11: 0000000000000000 R12: ffff888071880158
       R13: ffff88802594e220 R14: 0000000000000000 R15: 0000000000000004
       FS:  0000000000000000(0000) GS:ffff8880b9b00000(0000)
       knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007fb1c08316a8 CR3: 0000000018560000 CR4: 0000000000350ee0
       Call Trace:
        <TASK>
        nilfs_dat_commit_free fs/nilfs2/dat.c:114 [inline]
        nilfs_dat_commit_end+0x464/0x5f0 fs/nilfs2/dat.c:193
        nilfs_dat_commit_update+0x26/0x40 fs/nilfs2/dat.c:236
        nilfs_btree_commit_update_v+0x87/0x4a0 fs/nilfs2/btree.c:1940
        nilfs_btree_commit_propagate_v fs/nilfs2/btree.c:2016 [inline]
        nilfs_btree_propagate_v fs/nilfs2/btree.c:2046 [inline]
        nilfs_btree_propagate+0xa00/0xd60 fs/nilfs2/btree.c:2088
        nilfs_bmap_propagate+0x73/0x170 fs/nilfs2/bmap.c:337
        nilfs_collect_file_data+0x45/0xd0 fs/nilfs2/segment.c:568
        nilfs_segctor_apply_buffers+0x14a/0x470 fs/nilfs2/segment.c:1018
        nilfs_segctor_scan_file+0x3f4/0x6f0 fs/nilfs2/segment.c:1067
        nilfs_segctor_collect_blocks fs/nilfs2/segment.c:1197 [inline]
        nilfs_segctor_collect fs/nilfs2/segment.c:1503 [inline]
        nilfs_segctor_do_construct+0x12fc/0x6af0 fs/nilfs2/segment.c:2045
        nilfs_segctor_construct+0x8e3/0xb30 fs/nilfs2/segment.c:2379
        nilfs_segctor_thread_construct fs/nilfs2/segment.c:2487 [inline]
        nilfs_segctor_thread+0x3c3/0xf30 fs/nilfs2/segment.c:2570
        kthread+0x2e4/0x3a0 kernel/kthread.c:376
        ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
        </TASK>
       ...
      
      If the DAT metadata file is corrupted on disk, there is a case where
      req->pr_desc_bh is NULL and blocknr is 0 at nilfs_dat_commit_end() during
      a b-tree operation that cascadingly updates ancestor nodes of the b-tree,
      because nilfs_dat_commit_alloc() for a lower level block can initialize
      the blocknr on the same DAT entry between nilfs_dat_prepare_end() and
      nilfs_dat_commit_end().
      
      If this happens, nilfs_dat_commit_end() calls nilfs_dat_commit_free()
      without valid buffer heads in req->pr_desc_bh and req->pr_bitmap_bh, and
      causes the NULL pointer dereference above in
      nilfs_palloc_commit_free_entry(), which leads to a crash.
      
      Fix this by adding a NULL check on req->pr_desc_bh and req->pr_bitmap_bh
      before nilfs_palloc_commit_free_entry() in nilfs_dat_commit_free().
      
      This also calls nilfs_error() in that case to notify that there is a fatal
      flaw in the filesystem metadata and prevent further operations.
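
      The shape of such a guard can be sketched in user space as follows.
      All names (struct buf, struct free_req, report_fs_error(),
      commit_free_entry(), commit_free()) are hypothetical stand-ins and the
      return codes are an assumption for illustration; this is not the
      nilfs2 code:

```c
#include <stddef.h>

/* Hypothetical stand-in for a buffer head. */
struct buf {
	int dirty;
};

/* Hypothetical stand-in for the request carrying both buffers. */
struct free_req {
	struct buf *desc_bh;
	struct buf *bitmap_bh;
};

static int fs_error_reported;

/* Stand-in for flagging fatal metadata corruption. */
static void report_fs_error(void)
{
	fs_error_reported = 1;
}

/* Dereferences both buffers; unsafe to call with NULL pointers. */
static void commit_free_entry(struct free_req *req)
{
	req->desc_bh->dirty = 1;
	req->bitmap_bh->dirty = 1;
}

/* Guarded caller: returns 0 on success, -1 if a buffer is missing. */
static int commit_free(struct free_req *req)
{
	if (req->desc_bh == NULL || req->bitmap_bh == NULL) {
		report_fs_error();
		return -1;
	}
	commit_free_entry(req);
	return 0;
}
```

      With the guard in place, a request with missing buffers is reported
      and rejected instead of being dereferenced.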
      
      Link: https://lkml.kernel.org/r/00000000000097c20205ebaea3d6@google.com
      Link: https://lkml.kernel.org/r/20221114040441.1649940-1-zhangpeng362@huawei.com
      Link: https://lkml.kernel.org/r/20221119120542.17204-1-konishi.ryusuke@gmail.com
      Signed-off-by: ZhangPeng <zhangpeng362@huawei.com>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Reported-by: <syzbot+ebe05ee8e98f755f61d0@syzkaller.appspotmail.com>
      Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tools/vm/slabinfo-gnuplot: use "grep -E" instead of "egrep" · 2e44dd9a
      Tiezhu Yang authored
      commit a435874b upstream.
      
      The latest version of grep warns that egrep is now obsolescent, so the
      build now contains warnings that look like:
      
      	egrep: warning: egrep is obsolescent; using grep -E
      
      Fix this up by moving the related file to use "grep -E" instead.
      
        sed -i "s/egrep/grep -E/g" `grep egrep -rwl tools/vm`
      
      Here are the steps to install the latest grep:
      
        wget http://ftp.gnu.org/gnu/grep/grep-3.8.tar.gz
        tar xf grep-3.8.tar.gz
        cd grep-3.8 && ./configure && make
        sudo make install
        export PATH=/usr/local/bin:$PATH
      
      Link: https://lkml.kernel.org/r/1668825419-30584-1-git-send-email-yangtiezhu@loongson.cn
      Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
      Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>