  1. Aug 23, 2023
    • xfrm: add NULL check in xfrm_update_ae_params · 87b655f4
      Lin Ma authored
      [ Upstream commit 00374d9b ]
      
      Normally, x->replay_esn and x->preplay_esn are allocated by
      xfrm_alloc_replay_state_esn(...) in xfrm_state_construct(...), so it is
      safe for xfrm_update_ae_params(...) to update them. However, the current
      implementation of xfrm_new_ae(...) allows a malicious user to trigger a
      NULL pointer dereference and crash the kernel, as shown below.
      
      BUG: kernel NULL pointer dereference, address: 0000000000000000
      PGD 8253067 P4D 8253067 PUD 8e0e067 PMD 0
      Oops: 0002 [#1] PREEMPT SMP KASAN NOPTI
      CPU: 0 PID: 98 Comm: poc.npd Not tainted 6.4.0-rc7-00072-gdad9774deaf1 #8
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.o4
      RIP: 0010:memcpy_orig+0xad/0x140
      Code: e8 4c 89 5f e0 48 8d 7f e0 73 d2 83 c2 20 48 29 d6 48 29 d7 83 fa 10 72 34 4c 8b 06 4c 8b 4e 08 c
      RSP: 0018:ffff888008f57658 EFLAGS: 00000202
      RAX: 0000000000000000 RBX: ffff888008bd0000 RCX: ffffffff8238e571
      RDX: 0000000000000018 RSI: ffff888007f64844 RDI: 0000000000000000
      RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff888008f57818
      R13: ffff888007f64aa4 R14: 0000000000000000 R15: 0000000000000000
      FS:  00000000014013c0(0000) GS:ffff88806d600000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 00000000054d8000 CR4: 00000000000006f0
      Call Trace:
       <TASK>
       ? __die+0x1f/0x70
       ? page_fault_oops+0x1e8/0x500
       ? __pfx_is_prefetch.constprop.0+0x10/0x10
       ? __pfx_page_fault_oops+0x10/0x10
       ? _raw_spin_unlock_irqrestore+0x11/0x40
       ? fixup_exception+0x36/0x460
       ? _raw_spin_unlock_irqrestore+0x11/0x40
       ? exc_page_fault+0x5e/0xc0
       ? asm_exc_page_fault+0x26/0x30
       ? xfrm_update_ae_params+0xd1/0x260
       ? memcpy_orig+0xad/0x140
       ? __pfx__raw_spin_lock_bh+0x10/0x10
       xfrm_update_ae_params+0xe7/0x260
       xfrm_new_ae+0x298/0x4e0
       ? __pfx_xfrm_new_ae+0x10/0x10
       ? __pfx_xfrm_new_ae+0x10/0x10
       xfrm_user_rcv_msg+0x25a/0x410
       ? __pfx_xfrm_user_rcv_msg+0x10/0x10
       ? __alloc_skb+0xcf/0x210
       ? stack_trace_save+0x90/0xd0
       ? filter_irq_stacks+0x1c/0x70
       ? __stack_depot_save+0x39/0x4e0
       ? __kasan_slab_free+0x10a/0x190
       ? kmem_cache_free+0x9c/0x340
       ? netlink_recvmsg+0x23c/0x660
       ? sock_recvmsg+0xeb/0xf0
       ? __sys_recvfrom+0x13c/0x1f0
       ? __x64_sys_recvfrom+0x71/0x90
       ? do_syscall_64+0x3f/0x90
       ? entry_SYSCALL_64_after_hwframe+0x72/0xdc
       ? copyout+0x3e/0x50
       netlink_rcv_skb+0xd6/0x210
       ? __pfx_xfrm_user_rcv_msg+0x10/0x10
       ? __pfx_netlink_rcv_skb+0x10/0x10
       ? __pfx_sock_has_perm+0x10/0x10
       ? mutex_lock+0x8d/0xe0
       ? __pfx_mutex_lock+0x10/0x10
       xfrm_netlink_rcv+0x44/0x50
       netlink_unicast+0x36f/0x4c0
       ? __pfx_netlink_unicast+0x10/0x10
       ? netlink_recvmsg+0x500/0x660
       netlink_sendmsg+0x3b7/0x700
      
      This NULL pointer dereference is assigned CVE-2023-3772. This commit
      adds a NULL check in xfrm_update_ae_params to fix it.
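
      The guard described above can be illustrated outside the kernel. This is
      a minimal user-space sketch, not the actual patch: struct state,
      struct replay_esn and update_esn() are hypothetical stand-ins, and only
      the idea of checking the destination pointers before copying is taken
      from the commit message.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel structures; only the fields
 * needed to show the check are modelled. */
struct replay_esn {
    unsigned int bmp_len;
    unsigned int seq;
};

struct state {
    struct replay_esn *replay_esn;   /* NULL if ESN state was never allocated */
    struct replay_esn *preplay_esn;
};

/* Copy the user-supplied ESN state only when the source and both
 * destinations exist. Returns 0 on success, -1 when the update is skipped. */
static int update_esn(struct state *x, const struct replay_esn *from_user)
{
    if (from_user == NULL || x->replay_esn == NULL || x->preplay_esn == NULL)
        return -1;

    memcpy(x->replay_esn, from_user, sizeof(*from_user));
    memcpy(x->preplay_esn, from_user, sizeof(*from_user));
    return 0;
}
```

      Without the guard, the first memcpy() would write through a NULL
      destination, which matches the faulting address 0 in the trace above.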
      
      Fixes: d8647b79 ("xfrm: Add user interface for esn and big anti-replay windows")
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ip_vti: fix potential slab-use-after-free in decode_session6 · 2b05bf5d
      Zhengchao Shao authored
      [ Upstream commit 6018a266 ]
      
      When an sfb qdisc is attached to an ip_vti device, the cb field
      of an outgoing skb may be modified during enqueuing. A
      slab-use-after-free may then occur when the ip_vti device sends IPv6 packets.
      As commit f8556919 ("xfrm6: Fix the nexthdr offset in
      _decode_session6.") showed, xfrm_decode_session was originally intended
      only for the receive path. IP6CB(skb)->nhoff is not set during
      transmission. Therefore, set the cb field in the skb to 0 before
      sending packets.
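
      The idea of clearing the control buffer before transmission can be
      sketched in user space. struct pkt and reset_cb() are hypothetical
      names; the 48-byte size is an assumption modelled on the cb scratch
      area of the kernel's sk_buff, and this is not the code of the patch.

```c
#include <string.h>

/* Hypothetical, simplified packet buffer with a scratch area that each
 * layer may reuse for its own bookkeeping (size is an assumption). */
struct pkt {
    unsigned char cb[48];
};

/* Clear the scratch area so that a later decoder cannot act on values
 * left behind by an earlier owner, for example a queueing discipline. */
static void reset_cb(struct pkt *p)
{
    memset(p->cb, 0, sizeof(p->cb));
}
```

      After the reset, any offset field a decoder reads from cb is 0 rather
      than a stale value written by another layer.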
      
      Fixes: f8556919 ("xfrm6: Fix the nexthdr offset in _decode_session6.")
      Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ip6_vti: fix slab-use-after-free in decode_session6 · 55ad2309
      Zhengchao Shao authored
      [ Upstream commit 9fd41f1b ]
      
      When an sfb qdisc is attached to an ipv6_vti device, the cb field
      of an outgoing skb may be modified during enqueuing. A
      slab-use-after-free may then occur when the ipv6_vti device sends IPv6 packets.
      
      The stack information is as follows:
      BUG: KASAN: slab-use-after-free in decode_session6+0x103f/0x1890
      Read of size 1 at addr ffff88802e08edc2 by task swapper/0/0
      CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0-next-20230707-00001-g84e2cad7f979 #410
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
      Call Trace:
      <IRQ>
      dump_stack_lvl+0xd9/0x150
      print_address_description.constprop.0+0x2c/0x3c0
      kasan_report+0x11d/0x130
      decode_session6+0x103f/0x1890
      __xfrm_decode_session+0x54/0xb0
      vti6_tnl_xmit+0x3e6/0x1ee0
      dev_hard_start_xmit+0x187/0x700
      sch_direct_xmit+0x1a3/0xc30
      __qdisc_run+0x510/0x17a0
      __dev_queue_xmit+0x2215/0x3b10
      neigh_connected_output+0x3c2/0x550
      ip6_finish_output2+0x55a/0x1550
      ip6_finish_output+0x6b9/0x1270
      ip6_output+0x1f1/0x540
      ndisc_send_skb+0xa63/0x1890
      ndisc_send_rs+0x132/0x6f0
      addrconf_rs_timer+0x3f1/0x870
      call_timer_fn+0x1a0/0x580
      expire_timers+0x29b/0x4b0
      run_timer_softirq+0x326/0x910
      __do_softirq+0x1d4/0x905
      irq_exit_rcu+0xb7/0x120
      sysvec_apic_timer_interrupt+0x97/0xc0
      </IRQ>
      Allocated by task 9176:
      kasan_save_stack+0x22/0x40
      kasan_set_track+0x25/0x30
      __kasan_slab_alloc+0x7f/0x90
      kmem_cache_alloc_node+0x1cd/0x410
      kmalloc_reserve+0x165/0x270
      __alloc_skb+0x129/0x330
      netlink_sendmsg+0x9b1/0xe30
      sock_sendmsg+0xde/0x190
      ____sys_sendmsg+0x739/0x920
      ___sys_sendmsg+0x110/0x1b0
      __sys_sendmsg+0xf7/0x1c0
      do_syscall_64+0x39/0xb0
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      Freed by task 9176:
      kasan_save_stack+0x22/0x40
      kasan_set_track+0x25/0x30
      kasan_save_free_info+0x2b/0x40
      ____kasan_slab_free+0x160/0x1c0
      slab_free_freelist_hook+0x11b/0x220
      kmem_cache_free+0xf0/0x490
      skb_free_head+0x17f/0x1b0
      skb_release_data+0x59c/0x850
      consume_skb+0xd2/0x170
      netlink_unicast+0x54f/0x7f0
      netlink_sendmsg+0x926/0xe30
      sock_sendmsg+0xde/0x190
      ____sys_sendmsg+0x739/0x920
      ___sys_sendmsg+0x110/0x1b0
      __sys_sendmsg+0xf7/0x1c0
      do_syscall_64+0x39/0xb0
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      The buggy address belongs to the object at ffff88802e08ed00
      which belongs to the cache skbuff_small_head of size 640
      The buggy address is located 194 bytes inside of
      freed 640-byte region [ffff88802e08ed00, ffff88802e08ef80)
      
      As commit f8556919 ("xfrm6: Fix the nexthdr offset in
      _decode_session6.") showed, xfrm_decode_session was originally intended
      only for the receive path. IP6CB(skb)->nhoff is not set during
      transmission. Therefore, set the cb field in the skb to 0 before
      sending packets.
      
      Fixes: f8556919 ("xfrm6: Fix the nexthdr offset in _decode_session6.")
      Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • xfrm: fix slab-use-after-free in decode_session6 · 0d27567f
      Zhengchao Shao authored
      [ Upstream commit 53223f2e ]
      
      When an sfb qdisc is attached to the xfrm device, the cb field
      of an outgoing skb may be modified during enqueuing. A
      slab-use-after-free may then occur when the xfrm device sends IPv6 packets.
      
      The stack information is as follows:
      BUG: KASAN: slab-use-after-free in decode_session6+0x103f/0x1890
      Read of size 1 at addr ffff8881111458ef by task swapper/3/0
      CPU: 3 PID: 0 Comm: swapper/3 Not tainted 6.4.0-next-20230707 #409
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
      Call Trace:
      <IRQ>
      dump_stack_lvl+0xd9/0x150
      print_address_description.constprop.0+0x2c/0x3c0
      kasan_report+0x11d/0x130
      decode_session6+0x103f/0x1890
      __xfrm_decode_session+0x54/0xb0
      xfrmi_xmit+0x173/0x1ca0
      dev_hard_start_xmit+0x187/0x700
      sch_direct_xmit+0x1a3/0xc30
      __qdisc_run+0x510/0x17a0
      __dev_queue_xmit+0x2215/0x3b10
      neigh_connected_output+0x3c2/0x550
      ip6_finish_output2+0x55a/0x1550
      ip6_finish_output+0x6b9/0x1270
      ip6_output+0x1f1/0x540
      ndisc_send_skb+0xa63/0x1890
      ndisc_send_rs+0x132/0x6f0
      addrconf_rs_timer+0x3f1/0x870
      call_timer_fn+0x1a0/0x580
      expire_timers+0x29b/0x4b0
      run_timer_softirq+0x326/0x910
      __do_softirq+0x1d4/0x905
      irq_exit_rcu+0xb7/0x120
      sysvec_apic_timer_interrupt+0x97/0xc0
      </IRQ>
      <TASK>
      asm_sysvec_apic_timer_interrupt+0x1a/0x20
      RIP: 0010:intel_idle_hlt+0x23/0x30
      Code: 1f 84 00 00 00 00 00 f3 0f 1e fa 41 54 41 89 d4 0f 1f 44 00 00 66 90 0f 1f 44 00 00 0f 00 2d c4 9f ab 00 0f 1f 44 00 00 fb f4 <fa> 44 89 e0 41 5c c3 66 0f 1f 44 00 00 f3 0f 1e fa 41 54 41 89 d4
      RSP: 0018:ffffc90000197d78 EFLAGS: 00000246
      RAX: 00000000000a83c3 RBX: ffffe8ffffd09c50 RCX: ffffffff8a22d8e5
      RDX: 0000000000000001 RSI: ffffffff8d3f8080 RDI: ffffe8ffffd09c50
      RBP: ffffffff8d3f8080 R08: 0000000000000001 R09: ffffed1026ba6d9d
      R10: ffff888135d36ceb R11: 0000000000000001 R12: 0000000000000001
      R13: ffffffff8d3f8100 R14: 0000000000000001 R15: 0000000000000000
      cpuidle_enter_state+0xd3/0x6f0
      cpuidle_enter+0x4e/0xa0
      do_idle+0x2fe/0x3c0
      cpu_startup_entry+0x18/0x20
      start_secondary+0x200/0x290
      secondary_startup_64_no_verify+0x167/0x16b
      </TASK>
      Allocated by task 939:
      kasan_save_stack+0x22/0x40
      kasan_set_track+0x25/0x30
      __kasan_slab_alloc+0x7f/0x90
      kmem_cache_alloc_node+0x1cd/0x410
      kmalloc_reserve+0x165/0x270
      __alloc_skb+0x129/0x330
      inet6_ifa_notify+0x118/0x230
      __ipv6_ifa_notify+0x177/0xbe0
      addrconf_dad_completed+0x133/0xe00
      addrconf_dad_work+0x764/0x1390
      process_one_work+0xa32/0x16f0
      worker_thread+0x67d/0x10c0
      kthread+0x344/0x440
      ret_from_fork+0x1f/0x30
      The buggy address belongs to the object at ffff888111145800
      which belongs to the cache skbuff_small_head of size 640
      The buggy address is located 239 bytes inside of
      freed 640-byte region [ffff888111145800, ffff888111145a80)
      
      As commit f8556919 ("xfrm6: Fix the nexthdr offset in
      _decode_session6.") showed, xfrm_decode_session was originally intended
      only for the receive path. IP6CB(skb)->nhoff is not set during
      transmission. Therefore, set the cb field in the skb to 0 before
      sending packets.
      
      Fixes: f8556919 ("xfrm6: Fix the nexthdr offset in _decode_session6.")
      Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: xfrm: Amend XFRMA_SEC_CTX nla_policy structure · 71dfe71d
      Lin Ma authored
      [ Upstream commit d1e0e61d ]
      
      All consumers of attrs[XFRMA_SEC_CTX] treat it as a
      struct xfrm_user_sec_ctx, for example:

      * verify_sec_ctx_len() converts it to xfrm_user_sec_ctx *
      * xfrm_state_construct() calls security_xfrm_state_alloc(), whose prototype
        is int security_xfrm_state_alloc(.., struct xfrm_user_sec_ctx *sec_ctx);
      * copy_from_user_sec_ctx() converts it to xfrm_user_sec_ctx *
      ...

      The expected parsing result for XFRMA_SEC_CTX therefore appears to be
      struct xfrm_user_sec_ctx, and the current xfrm_sec_ctx is confusing
      and misleading (luckily, the two happen to have the same size, 8 bytes).

      This commit amends the policy structure to xfrm_user_sec_ctx to avoid
      ambiguity.
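
      The remark that both structures are 8 bytes can be checked with a small
      sketch. The two layouts below are written from memory of the kernel's
      xfrm headers and renamed with a _sketch suffix; treat the field lists as
      assumptions and compare them against the real headers before relying on
      them.

```c
#include <stdint.h>

/* Assumed layout modelling struct xfrm_user_sec_ctx. */
struct user_sec_ctx_sketch {
    uint16_t len;
    uint16_t exttype;
    uint8_t  ctx_alg;
    uint8_t  ctx_doi;
    uint16_t ctx_len;
};

/* Assumed layout modelling struct xfrm_sec_ctx. */
struct sec_ctx_sketch {
    uint8_t  ctx_doi;
    uint8_t  ctx_alg;
    uint16_t ctx_len;
    uint32_t ctx_sid;
    char     ctx_str[];   /* flexible array member, adds no size */
};

/* Both fixed parts occupy 8 bytes, so a length-only netlink policy
 * accepts the same minimum payload for either type. */
_Static_assert(sizeof(struct user_sec_ctx_sketch) == 8, "user ctx is 8 bytes");
_Static_assert(sizeof(struct sec_ctx_sketch) == 8, "kernel ctx is 8 bytes");
```

      Because the sizes coincide, the old policy entry validated the same
      minimum length; the change is about naming the type the consumers
      actually use.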
      
      Fixes: cf5cb79f ("[XFRM] netlink: Establish an attribute policy")
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: af_key: fix sadb_x_filter validation · 479884b4
      Lin Ma authored
      [ Upstream commit 75065a89 ]
      
      When running xfrm_state_walk_init(), it is valid for the
      xfrm_address_filter in use to have a splen/dplen equal to
      sizeof(xfrm_address_t)<<3. This commit replaces >= with > so that the
      boundary check is correct.
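
      The boundary can be made concrete with a short sketch. ADDR_BYTES and
      prefix_len_ok() are hypothetical names, and the 16-byte address size is
      an assumption standing in for sizeof(xfrm_address_t).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: an address slot is 16 bytes, enough for an IPv6 address. */
#define ADDR_BYTES 16u
#define ADDR_BITS  (ADDR_BYTES << 3)   /* 128 bits */

/* A prefix length may cover the whole address, so 128 itself is valid;
 * only values strictly greater than 128 are rejected. */
static bool prefix_len_ok(uint8_t plen)
{
    return plen <= ADDR_BITS;
}
```

      With a >= comparison in the rejecting branch, a full-length prefix of
      128 bits would be refused even though it stays within the address.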
      
      Fixes: 37bd2242 ("af_key: pfkey_dump needs parameter validation")
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: xfrm: Fix xfrm_address_filter OOB read · 9a005627
      Lin Ma authored
      [ Upstream commit dfa73c17 ]
      
      We found the following OOB crash:
      
      [   44.211730] ==================================================================
      [   44.212045] BUG: KASAN: slab-out-of-bounds in memcmp+0x8b/0xb0
      [   44.212045] Read of size 8 at addr ffff88800870f320 by task poc.xfrm/97
      [   44.212045]
      [   44.212045] CPU: 0 PID: 97 Comm: poc.xfrm Not tainted 6.4.0-rc7-00072-gdad9774deaf1-dirty #4
      [   44.212045] Call Trace:
      [   44.212045]  <TASK>
      [   44.212045]  dump_stack_lvl+0x37/0x50
      [   44.212045]  print_report+0xcc/0x620
      [   44.212045]  ? __virt_addr_valid+0xf3/0x170
      [   44.212045]  ? memcmp+0x8b/0xb0
      [   44.212045]  kasan_report+0xb2/0xe0
      [   44.212045]  ? memcmp+0x8b/0xb0
      [   44.212045]  kasan_check_range+0x39/0x1c0
      [   44.212045]  memcmp+0x8b/0xb0
      [   44.212045]  xfrm_state_walk+0x21c/0x420
      [   44.212045]  ? __pfx_dump_one_state+0x10/0x10
      [   44.212045]  xfrm_dump_sa+0x1e2/0x290
      [   44.212045]  ? __pfx_xfrm_dump_sa+0x10/0x10
      [   44.212045]  ? __kernel_text_address+0xd/0x40
      [   44.212045]  ? kasan_unpoison+0x27/0x60
      [   44.212045]  ? mutex_lock+0x60/0xe0
      [   44.212045]  ? __pfx_mutex_lock+0x10/0x10
      [   44.212045]  ? kasan_save_stack+0x22/0x50
      [   44.212045]  netlink_dump+0x322/0x6c0
      [   44.212045]  ? __pfx_netlink_dump+0x10/0x10
      [   44.212045]  ? mutex_unlock+0x7f/0xd0
      [   44.212045]  ? __pfx_mutex_unlock+0x10/0x10
      [   44.212045]  __netlink_dump_start+0x353/0x430
      [   44.212045]  xfrm_user_rcv_msg+0x3a4/0x410
      [   44.212045]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
      [   44.212045]  ? __pfx_xfrm_user_rcv_msg+0x10/0x10
      [   44.212045]  ? __pfx_xfrm_dump_sa+0x10/0x10
      [   44.212045]  ? __pfx_xfrm_dump_sa_done+0x10/0x10
      [   44.212045]  ? __stack_depot_save+0x382/0x4e0
      [   44.212045]  ? filter_irq_stacks+0x1c/0x70
      [   44.212045]  ? kasan_save_stack+0x32/0x50
      [   44.212045]  ? kasan_save_stack+0x22/0x50
      [   44.212045]  ? kasan_set_track+0x25/0x30
      [   44.212045]  ? __kasan_slab_alloc+0x59/0x70
      [   44.212045]  ? kmem_cache_alloc_node+0xf7/0x260
      [   44.212045]  ? kmalloc_reserve+0xab/0x120
      [   44.212045]  ? __alloc_skb+0xcf/0x210
      [   44.212045]  ? netlink_sendmsg+0x509/0x700
      [   44.212045]  ? sock_sendmsg+0xde/0xe0
      [   44.212045]  ? __sys_sendto+0x18d/0x230
      [   44.212045]  ? __x64_sys_sendto+0x71/0x90
      [   44.212045]  ? do_syscall_64+0x3f/0x90
      [   44.212045]  ? entry_SYSCALL_64_after_hwframe+0x72/0xdc
      [   44.212045]  ? netlink_sendmsg+0x509/0x700
      [   44.212045]  ? sock_sendmsg+0xde/0xe0
      [   44.212045]  ? __sys_sendto+0x18d/0x230
      [   44.212045]  ? __x64_sys_sendto+0x71/0x90
      [   44.212045]  ? do_syscall_64+0x3f/0x90
      [   44.212045]  ? entry_SYSCALL_64_after_hwframe+0x72/0xdc
      [   44.212045]  ? kasan_save_stack+0x22/0x50
      [   44.212045]  ? kasan_set_track+0x25/0x30
      [   44.212045]  ? kasan_save_free_info+0x2e/0x50
      [   44.212045]  ? __kasan_slab_free+0x10a/0x190
      [   44.212045]  ? kmem_cache_free+0x9c/0x340
      [   44.212045]  ? netlink_recvmsg+0x23c/0x660
      [   44.212045]  ? sock_recvmsg+0xeb/0xf0
      [   44.212045]  ? __sys_recvfrom+0x13c/0x1f0
      [   44.212045]  ? __x64_sys_recvfrom+0x71/0x90
      [   44.212045]  ? do_syscall_64+0x3f/0x90
      [   44.212045]  ? entry_SYSCALL_64_after_hwframe+0x72/0xdc
      [   44.212045]  ? copyout+0x3e/0x50
      [   44.212045]  netlink_rcv_skb+0xd6/0x210
      [   44.212045]  ? __pfx_xfrm_user_rcv_msg+0x10/0x10
      [   44.212045]  ? __pfx_netlink_rcv_skb+0x10/0x10
      [   44.212045]  ? __pfx_sock_has_perm+0x10/0x10
      [   44.212045]  ? mutex_lock+0x8d/0xe0
      [   44.212045]  ? __pfx_mutex_lock+0x10/0x10
      [   44.212045]  xfrm_netlink_rcv+0x44/0x50
      [   44.212045]  netlink_unicast+0x36f/0x4c0
      [   44.212045]  ? __pfx_netlink_unicast+0x10/0x10
      [   44.212045]  ? netlink_recvmsg+0x500/0x660
      [   44.212045]  netlink_sendmsg+0x3b7/0x700
      [   44.212045]  ? __pfx_netlink_sendmsg+0x10/0x10
      [   44.212045]  ? __pfx_netlink_sendmsg+0x10/0x10
      [   44.212045]  sock_sendmsg+0xde/0xe0
      [   44.212045]  __sys_sendto+0x18d/0x230
      [   44.212045]  ? __pfx___sys_sendto+0x10/0x10
      [   44.212045]  ? rcu_core+0x44a/0xe10
      [   44.212045]  ? __rseq_handle_notify_resume+0x45b/0x740
      [   44.212045]  ? _raw_spin_lock_irq+0x81/0xe0
      [   44.212045]  ? __pfx___rseq_handle_notify_resume+0x10/0x10
      [   44.212045]  ? __pfx_restore_fpregs_from_fpstate+0x10/0x10
      [   44.212045]  ? __pfx_blkcg_maybe_throttle_current+0x10/0x10
      [   44.212045]  ? __pfx_task_work_run+0x10/0x10
      [   44.212045]  __x64_sys_sendto+0x71/0x90
      [   44.212045]  do_syscall_64+0x3f/0x90
      [   44.212045]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
      [   44.212045] RIP: 0033:0x44b7da
      [   44.212045] RSP: 002b:00007ffdc8838548 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      [   44.212045] RAX: ffffffffffffffda RBX: 00007ffdc8839978 RCX: 000000000044b7da
      [   44.212045] RDX: 0000000000000038 RSI: 00007ffdc8838770 RDI: 0000000000000003
      [   44.212045] RBP: 00007ffdc88385b0 R08: 00007ffdc883858c R09: 000000000000000c
      [   44.212045] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
      [   44.212045] R13: 00007ffdc8839968 R14: 00000000004c37d0 R15: 0000000000000001
      [   44.212045]  </TASK>
      [   44.212045]
      [   44.212045] Allocated by task 97:
      [   44.212045]  kasan_save_stack+0x22/0x50
      [   44.212045]  kasan_set_track+0x25/0x30
      [   44.212045]  __kasan_kmalloc+0x7f/0x90
      [   44.212045]  __kmalloc_node_track_caller+0x5b/0x140
      [   44.212045]  kmemdup+0x21/0x50
      [   44.212045]  xfrm_dump_sa+0x17d/0x290
      [   44.212045]  netlink_dump+0x322/0x6c0
      [   44.212045]  __netlink_dump_start+0x353/0x430
      [   44.212045]  xfrm_user_rcv_msg+0x3a4/0x410
      [   44.212045]  netlink_rcv_skb+0xd6/0x210
      [   44.212045]  xfrm_netlink_rcv+0x44/0x50
      [   44.212045]  netlink_unicast+0x36f/0x4c0
      [   44.212045]  netlink_sendmsg+0x3b7/0x700
      [   44.212045]  sock_sendmsg+0xde/0xe0
      [   44.212045]  __sys_sendto+0x18d/0x230
      [   44.212045]  __x64_sys_sendto+0x71/0x90
      [   44.212045]  do_syscall_64+0x3f/0x90
      [   44.212045]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
      [   44.212045]
      [   44.212045] The buggy address belongs to the object at ffff88800870f300
      [   44.212045]  which belongs to the cache kmalloc-64 of size 64
      [   44.212045] The buggy address is located 32 bytes inside of
      [   44.212045]  allocated 36-byte region [ffff88800870f300, ffff88800870f324)
      [   44.212045]
      [   44.212045] The buggy address belongs to the physical page:
      [   44.212045] page:00000000e4de16ee refcount:1 mapcount:0 mapping:000000000 ...
      [   44.212045] flags: 0x100000000000200(slab|node=0|zone=1)
      [   44.212045] page_type: 0xffffffff()
      [   44.212045] raw: 0100000000000200 ffff888004c41640 dead000000000122 0000000000000000
      [   44.212045] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
      [   44.212045] page dumped because: kasan: bad access detected
      [   44.212045]
      [   44.212045] Memory state around the buggy address:
      [   44.212045]  ffff88800870f200: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
      [   44.212045]  ffff88800870f280: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
      [   44.212045] >ffff88800870f300: 00 00 00 00 04 fc fc fc fc fc fc fc fc fc fc fc
      [   44.212045]                                ^
      [   44.212045]  ffff88800870f380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [   44.212045]  ffff88800870f400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [   44.212045] ==================================================================
      
      Investigating the code shows that the root cause of this OOB is the lack
      of checks in xfrm_dump_sa(). The buggy code allows a malicious user to pass
      an arbitrary value for filter->splen/dplen. Hence, with crafted xfrm states,
      the attacker can achieve an 8-byte heap OOB read, which leaks information.
      
        if (attrs[XFRMA_ADDRESS_FILTER]) {
          filter = kmemdup(nla_data(attrs[XFRMA_ADDRESS_FILTER]),
              sizeof(*filter), GFP_KERNEL);
          if (filter == NULL)
            return -ENOMEM;
          // NO MORE CHECKS HERE !!!
        }
      
      This patch fixes the OOB by adding necessary boundary checks, just like
      the code in pfkey_dump() function.
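
      To see why an unchecked prefix length leads to an out-of-bounds read,
      consider how many bytes a prefix comparison touches. The sketch below
      assumes a matcher that compares the prefix in 32-bit words, whole words
      first and then one partial word; prefix_bytes_read() is a hypothetical
      helper, not a kernel function.

```c
#include <stddef.h>

/* Bytes that a word-wise prefix comparison reads from each address. */
static size_t prefix_bytes_read(unsigned int plen)
{
    size_t whole_words = plen >> 5;               /* complete 32-bit words */
    size_t partial     = (plen & 0x1f) ? 1 : 0;   /* one more word if bits remain */

    return (whole_words + partial) * 4;
}
```

      Under that assumption a prefix length of 128 reads exactly 16 bytes,
      the size of one address, while the largest 8-bit value, 255, reads
      32 bytes and so runs past the end of a 16-byte address field.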
      
      Fixes: d3623099 ("ipsec: add support of limited SA dump")
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • i2c: designware: Handle invalid SMBus block data response length value · 5a47c2fa
      Tam Nguyen authored
      commit 69f035c4 upstream.
      
      In the I2C_FUNC_SMBUS_BLOCK_DATA case, an invalid length byte (outside
      the range 1-32) in the SMBus block data response from the Slave device
      is not handled correctly by the I2C DesignWare driver.
      
      In case IC_EMPTYFIFO_HOLD_MASTER_EN==1, which cannot be detected
      from the registers, the Master can be disabled only if the STOP bit
      is set. Without STOP bit set, the Master remains active, holding the bus
      until receiving a block data response length. This hangs the bus and
      is unrecoverable.
      
      Avoid this by issuing another dummy read to reach the stop condition when
      an invalid length byte is received.
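
      The recovery strategy can be sketched as a small helper. block_read_len()
      and SMBUS_BLOCK_MAX are hypothetical names used only for illustration,
      and falling back to exactly one byte is an assumption about how "one
      more read" is expressed; the driver's real code may differ.

```c
#include <stdint.h>

#define SMBUS_BLOCK_MAX 32u   /* valid block lengths are 1 to 32 */

/* Return how many bytes to read after the length byte. An out-of-range
 * length is replaced by 1 so that one further read, which carries the
 * STOP condition, can end the transfer and release the bus. */
static unsigned int block_read_len(uint8_t len_byte)
{
    if (len_byte == 0 || len_byte > SMBUS_BLOCK_MAX)
        return 1;
    return len_byte;
}
```

      The caller would still report an error for the invalid length; the
      extra byte only exists to let the controller reach the stop condition.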
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Tam Nguyen <tamnguyenchi@os.amperecomputing.com>
      Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
      Link: https://lore.kernel.org/r/20230726080001.337353-3-tamnguyenchi@os.amperecomputing.com
      Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
      Signed-off-by: Wolfram Sang <wsa@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • i2c: designware: Correct length byte validation logic · 52114963
      Quan Nguyen authored
      commit 49d4db39 upstream.
      
      Commit 0daede80 ("i2c: designware: Convert driver to using regmap API")
      changed the logic, without giving a reason, to validate the whole 32-bit
      return value of the DW_IC_DATA_CMD register instead of its 8-bit LSB.

      Later, commit f53f15ba ("i2c: designware: Get right data length")
      introduced a partial fix, but it is not sufficient: the "tmp > 0" check
      still tests tmp as a 32-bit value, which is wrong when IC_DATA_CMD[11]
      is set.

      Revert the logic to what it was just before commit 0daede80
      ("i2c: designware: Convert driver to using regmap API").
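
      The difference between testing the raw register value and testing its
      low byte can be shown with a short sketch. The helper names and
      DATA_MASK are hypothetical; the only facts taken from the text are that
      the length lives in the 8-bit LSB and that bit 11 may be set
      independently of it.

```c
#include <stdbool.h>
#include <stdint.h>

#define DATA_MASK 0xffu   /* assumption: received byte is in bits 7:0 */
#define SMBUS_MAX 32u

/* Check that looks at the whole 32-bit register value. */
static bool len_nonzero_raw(uint32_t reg)
{
    return reg > 0;
}

/* Check that first narrows the value to the 8-bit data field. */
static bool len_valid_masked(uint32_t reg)
{
    uint32_t len = reg & DATA_MASK;

    return len >= 1 && len <= SMBUS_MAX;
}
```

      For a register value with only bit 11 set, the raw test passes although
      the length byte is 0, whereas the masked test rejects it.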
      
      Fixes: f53f15ba ("i2c: designware: Get right data length")
      Fixes: 0daede80 ("i2c: designware: Convert driver to using regmap API")
      Cc: stable@vger.kernel.org
      Signed-off-by: Tam Nguyen <tamnguyenchi@os.amperecomputing.com>
      Signed-off-by: Quan Nguyen <quan@os.amperecomputing.com>
      Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
      Link: https://lore.kernel.org/r/20230726080001.337353-2-tamnguyenchi@os.amperecomputing.com
      Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
      Signed-off-by: Wolfram Sang <wsa@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: fix BUG_ON condition in btrfs_cancel_balance · ceb9ba8e
      xiaoshoukui authored
      commit 29eefa6d upstream.
      
      Pausing and canceling a balance can race when interrupting it, leading to a
      BUG_ON panic in btrfs_cancel_balance. The BUG_ON condition in
      btrfs_cancel_balance does not take this race scenario into account.

      However, the race condition has no other side effects, so we can fix
      the condition.
      
      Reproducing it produces a panic trace like this:
      
        kernel BUG at fs/btrfs/volumes.c:4618!
        RIP: 0010:btrfs_cancel_balance+0x5cf/0x6a0
        Call Trace:
         <TASK>
         ? do_nanosleep+0x60/0x120
         ? hrtimer_nanosleep+0xb7/0x1a0
         ? sched_core_clone_cookie+0x70/0x70
         btrfs_ioctl_balance_ctl+0x55/0x70
         btrfs_ioctl+0xa46/0xd20
         __x64_sys_ioctl+0x7d/0xa0
         do_syscall_64+0x38/0x80
         entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
        Race scenario as follows:
        > mutex_unlock(&fs_info->balance_mutex);
        > --------------------
        > .......issue pause and cancel req in another thread
        > --------------------
        > ret = __btrfs_balance(fs_info);
        >
        > mutex_lock(&fs_info->balance_mutex);
        > if (ret == -ECANCELED && atomic_read(&fs_info->balance_pause_req)) {
        >         btrfs_info(fs_info, "balance: paused");
        >         btrfs_exclop_balance(fs_info, BTRFS_EXCLOP_BALANCE_PAUSED);
        > }
      
      CC: stable@vger.kernel.org # 4.19+
      Signed-off-by: xiaoshoukui <xiaoshoukui@ruijie.com.cn>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: fix incorrect splitting in btrfs_drop_extent_map_range · 9f68e210
      Josef Bacik authored
      commit c962098c upstream.
      
      In production we were seeing a variety of WARN_ON()'s in the extent_map
      code, specifically in btrfs_drop_extent_map_range() when we have to call
      add_extent_mapping() for our second split.
      
      Consider the following extent map layout
      
      	PINNED
      	[0, 16K)  [32K, 48K)
      
      and then we call btrfs_drop_extent_map_range for [0, 36K), with
      skip_pinned == true.  The initial loop will have
      
      	start = 0
      	end = 36K
      	len = 36K
      
      we will find the [0, 16k) extent, but since we are pinned we will skip
      it, which has this code
      
      	start = em_end;
      	if (end != (u64)-1)
      		len = start + len - em_end;
      
      em_end here is 16K, so now the values are
      
      	start = 16K
      	len = 16K + 36K - 16K = 36K
      
      len should instead be 20K.  This is a problem when we find the next
      extent at [32K, 48K): we need to split this extent to leave [36K, 48K),
      however the code for the split looks like this
      
      	split->start = start + len;
      	split->len = em_end - (start + len);
      
      In this case we have
      
      	em_end = 48K
      	split->start = 16K + 36K       // this should be 16K + 20K
      	split->len = 48K - (16K + 36K) // this overflows as 16K + 36K is 52K
      
      and now we have an invalid extent_map in the tree that potentially
      overlaps other entries in the extent map.  Even in the non-overlapping
      case we will have split->start set improperly, which will cause problems
      with any block related calculations.
      
      We don't actually need len in this loop, we can simply use end as our
      end point, and only adjust start up when we find a pinned extent we need
      to skip.
      
      Adjust the logic to do this, which keeps us from inserting an invalid
      extent map.
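
      The arithmetic in this example can be replayed in a few lines. This is
      a user-space sketch with hypothetical helpers (split_old, split_new);
      it models only the two formulas quoted above and the end-based
      alternative, not the kernel function itself.

```c
#include <stdint.h>

#define K 1024ull

struct split {
    uint64_t start;
    uint64_t len;
};

/* Split computed from start + len, as in the code quoted above. */
static struct split split_old(uint64_t start, uint64_t len, uint64_t em_end)
{
    struct split s = { start + len, em_end - (start + len) };
    return s;
}

/* Split computed from the fixed end of the requested range. */
static struct split split_new(uint64_t end, uint64_t em_end)
{
    struct split s = { end, em_end - end };
    return s;
}
```

      With start = 16K and the stale len = 36K, split_old() places the split
      at 52K and its unsigned length wraps around; split_new() with
      end = 36K yields the intended [36K, 48K) remainder of 12K.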
      
      We only skip_pinned in the relocation case, so this is relatively rare,
      except in the case where you are running relocation a lot, which can
      happen with auto relocation on.
      
      Fixes: 55ef6899 ("Btrfs: Fix btrfs_drop_extent_cache for skip pinned case")
      CC: stable@vger.kernel.org # 4.14+
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tty: serial: fsl_lpuart: Clear the error flags by writing 1 for lpuart32 platforms · 0693c8f1
      Sherry Sun authored
      commit 28206984 upstream.
      
      Do not read the data register to clear the error flags on lpuart32
      platforms: the additional read may cause a receive FIFO underflow,
      since the DMA has already read the data register.
      All lpuart32 platforms support writing 1 to clear those error bits,
      so use this method to clear the error flags instead.
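
      Write-1-to-clear semantics can be modelled in software to show why no
      data-register read is needed. The flag names and bit positions below
      are hypothetical and chosen only for this model; they are not the
      register layout of the LPUART.

```c
#include <stdint.h>

/* Hypothetical flag positions, chosen only for this model. */
#define ERR_PARITY  (1u << 0)
#define ERR_FRAME   (1u << 1)
#define ERR_OVERRUN (1u << 2)
#define ERR_ALL     (ERR_PARITY | ERR_FRAME | ERR_OVERRUN)

/* Model of a write-1-to-clear status register: every bit written as 1
 * is cleared, every bit written as 0 keeps its current value. */
static uint32_t w1c_write(uint32_t status, uint32_t written)
{
    return status & ~written;
}

/* Clear only the error flags that are currently set. */
static uint32_t clear_errors(uint32_t status)
{
    return w1c_write(status, status & ERR_ALL);
}
```

      Clearing the flags this way touches only the status register, so the
      receive FIFO is not consumed a second time.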
      
      Fixes: 42b68768 ("serial: fsl_lpuart: DMA support for 32-bit variant")
      Cc: stable <stable@kernel.org>
      Signed-off-by: Sherry Sun <sherry.sun@nxp.com>
      Link: https://lore.kernel.org/r/20230801022304.24251-1-sherry.sun@nxp.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tty: n_gsm: fix the UAF caused by race condition in gsm_cleanup_mux · 31311a9a
      Yi Yang authored
      commit 3c4f8333 upstream.
      
      In commit 9b9c8195 ("tty: n_gsm: fix UAF in gsm_cleanup_mux"), the UAF
      problem is not completely fixed. There is a race condition in
      gsm_cleanup_mux(), which caused this UAF.
      
      The UAF problem is triggered by the following race:
      task[5046]                     task[5054]
      -----------------------        -----------------------
      gsm_cleanup_mux();
      dlci = gsm->dlci[0];
      mutex_lock(&gsm->mutex);
                                     gsm_cleanup_mux();
                                     dlci = gsm->dlci[0]; //Didn't take the lock
      gsm_dlci_release(gsm->dlci[i]);
      gsm->dlci[i] = NULL;
      mutex_unlock(&gsm->mutex);
                                     mutex_lock(&gsm->mutex);
                                     dlci->dead = true; //UAF
      
      Fix it by assigning values after mutex_lock().
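
      The pattern can be sketched in userspace with a pthread mutex. This
      is only an illustration under assumptions: struct mux, struct chan
      and mux_cleanup() are hypothetical names, and the code is not the
      n_gsm implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

struct chan {
    bool dead;
};

struct mux {
    pthread_mutex_t lock;
    struct chan *chan0;   /* stands in for gsm->dlci[0] */
};

/* Tear down channel 0. Returns true if this caller did the teardown. */
static bool mux_cleanup(struct mux *m)
{
    struct chan *c;
    bool did_cleanup = false;

    pthread_mutex_lock(&m->lock);
    c = m->chan0;             /* read the shared pointer only under the lock */
    if (c) {
        c->dead = true;
        free(c);
        m->chan0 = NULL;
        did_cleanup = true;
    }
    pthread_mutex_unlock(&m->lock);

    return did_cleanup;
}
```

      Because the pointer is sampled inside the critical section, a second
      caller sees NULL instead of a stale pointer to freed memory.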
      
      Link: https://syzkaller.appspot.com/text?tag=CrashReport&x=176188b5a80000
      Cc: stable <stable@kernel.org>
      Fixes: 9b9c8195 ("tty: n_gsm: fix UAF in gsm_cleanup_mux")
      Fixes: aa371e96 ("tty: n_gsm: fix restart handling via CLD command")
      Signed-off-by: Yi Yang <yiyang13@huawei.com>
      Co-developed-by: Qiumiao Zhang <zhangqiumiao1@huawei.com>
      Signed-off-by: Qiumiao Zhang <zhangqiumiao1@huawei.com>
      Link: https://lore.kernel.org/r/20230811031121.153237-1-yiyang13@huawei.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      31311a9a
    • Dragos Tatulea's avatar
      vdpa: Enable strict validation for netlinks ops · d6aa03bd
      Dragos Tatulea authored
      commit f46c1e16 upstream.
      
      The previous patches added the missing nla policies that were required for
      validation to work.
      
      Now strict validation on netlink ops can be enabled. This patch does it.
      
      Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
      Cc: stable@vger.kernel.org
      Message-Id: <20230727175757.73988-9-dtatulea@nvidia.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d6aa03bd
    • Lin Ma's avatar
      vdpa: Add max vqp attr to vdpa_nl_policy for nlattr length check · ff717094
      Lin Ma authored
      commit 5d6ba607 upstream.
      
      The vdpa_nl_policy structure is used to validate the nlattr when parsing
      the incoming nlmsg. It will ensure the attribute being described produces
      a valid nlattr pointer in info->attrs before entering into each handler
      in vdpa_nl_ops.
      
      That is to say, the missing part in vdpa_nl_policy may lead to illegal
      nlattr after parsing, which could lead to OOB read just like CVE-2023-3773.
      
      This patch adds the missing nla_policy for vdpa max vqp attr to avoid
      such bugs.
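
      To make the role of a policy entry concrete, here is a simplified
      userspace model of per-attribute length validation. Everything in
      it (the ATTR_* ids, struct attr, the two tables and
      validate_attr()) is hypothetical and greatly reduced; it does not
      use or reproduce the kernel's netlink API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute ids for the model. */
enum { ATTR_UNSPEC, ATTR_DEV_NAME, ATTR_MAX_VQP, ATTR_COUNT };

struct attr {
    uint16_t type;
    uint16_t len;           /* payload length in bytes */
    const void *payload;
};

/* Minimum payload length per type; 0 means "no policy entry". */
static const uint16_t policy_before[ATTR_COUNT] = {
    [ATTR_DEV_NAME] = 1,
    /* ATTR_MAX_VQP missing: its payload length is never checked */
};

static const uint16_t policy_after[ATTR_COUNT] = {
    [ATTR_DEV_NAME] = 1,
    [ATTR_MAX_VQP]  = sizeof(uint16_t),
};

/* Return 0 if the attribute passes the given policy table, -1 otherwise. */
static int validate_attr(const uint16_t *min_len, const struct attr *a)
{
    if (a->type >= ATTR_COUNT)
        return -1;
    if (min_len[a->type] != 0 && a->len < min_len[a->type])
        return -1;
    return 0;
}
```

      In this model a one-byte ATTR_MAX_VQP payload passes policy_before,
      so a handler that reads two bytes from it would read past the
      payload; with policy_after the same attribute is rejected before
      any handler runs.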
      
      Fixes: ad69dd0b ("vdpa: Introduce query of device config layout")
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Cc: stable@vger.kernel.org
      Message-Id: <20230727175757.73988-7-dtatulea@nvidia.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ff717094
    • Lin Ma's avatar
      vdpa: Add queue index attr to vdpa_nl_policy for nlattr length check · 8ad9bc25
      Lin Ma authored
      commit b3003e1b upstream.
      
      The vdpa_nl_policy structure is used to validate the nlattr when parsing
      the incoming nlmsg. It will ensure the attribute being described produces
      a valid nlattr pointer in info->attrs before entering into each handler
      in vdpa_nl_ops.
      
      That is to say, the missing part in vdpa_nl_policy may lead to illegal
      nlattr after parsing, which could lead to OOB read just like CVE-2023-3773.
      
      This patch adds the missing nla_policy for vdpa queue index attr to avoid
      such bugs.
      
      Fixes: 13b00b13 ("vdpa: Add support for querying vendor statistics")
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Cc: stable@vger.kernel.org
      Message-Id: <20230727175757.73988-5-dtatulea@nvidia.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8ad9bc25
    • Lin Ma's avatar
      vdpa: Add features attr to vdpa_nl_policy for nlattr length check · 44b508cc
      Lin Ma authored
      commit 79c86515 upstream.
      
      The vdpa_nl_policy structure is used to validate the nlattr when parsing
      the incoming nlmsg. It will ensure the attribute being described produces
      a valid nlattr pointer in info->attrs before entering into each handler
      in vdpa_nl_ops.
      
      That is to say, the missing part in vdpa_nl_policy may lead to illegal
      nlattr after parsing, which could lead to OOB read just like CVE-2023-3773.
      
      This patch adds the missing nla_policy for vdpa features attr to avoid
      such bugs.
      
      Fixes: 90fea5a8 ("vdpa: device feature provisioning")
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Cc: stable@vger.kernel.org
      Message-Id: <20230727175757.73988-3-dtatulea@nvidia.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      44b508cc
    • Nathan Lynch's avatar
      powerpc/rtas_flash: allow user copy to flash block cache objects · b8fee83a
      Nathan Lynch authored
      commit 4f317597 upstream.
      
      With hardened usercopy enabled (CONFIG_HARDENED_USERCOPY=y), using the
      /proc/powerpc/rtas/firmware_update interface to prepare a system
      firmware update yields a BUG():
      
        kernel BUG at mm/usercopy.c:102!
        Oops: Exception in kernel mode, sig: 5 [#1]
        LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
        Modules linked in:
        CPU: 0 PID: 2232 Comm: dd Not tainted 6.5.0-rc3+ #2
        Hardware name: IBM,8408-E8E POWER8E (raw) 0x4b0201 0xf000004 of:IBM,FW860.50 (SV860_146) hv:phyp pSeries
        NIP:  c0000000005991d0 LR: c0000000005991cc CTR: 0000000000000000
        REGS: c0000000148c76a0 TRAP: 0700   Not tainted  (6.5.0-rc3+)
        MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24002242  XER: 0000000c
        CFAR: c0000000001fbd34 IRQMASK: 0
        [ ... GPRs omitted ... ]
        NIP usercopy_abort+0xa0/0xb0
        LR  usercopy_abort+0x9c/0xb0
        Call Trace:
          usercopy_abort+0x9c/0xb0 (unreliable)
          __check_heap_object+0x1b4/0x1d0
          __check_object_size+0x2d0/0x380
          rtas_flash_write+0xe4/0x250
          proc_reg_write+0xfc/0x160
          vfs_write+0xfc/0x4e0
          ksys_write+0x90/0x160
          system_call_exception+0x178/0x320
          system_call_common+0x160/0x2c4
      
      The blocks of the firmware image are copied directly from user memory
      to objects allocated from flash_block_cache, so flash_block_cache must
      be created using kmem_cache_create_usercopy() to mark it safe for user
      access.
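
      The effect of the usercopy region can be modelled in a few lines.
      This is a simplified sketch, not the code in mm/usercopy.c: struct
      cache_model and usercopy_allowed() are hypothetical names, and the
      4096-byte sizes used with it are arbitrary example values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model of a slab cache's user-copyable window within each object. */
struct cache_model {
    size_t useroffset;   /* start of the window */
    size_t usersize;     /* length of the window; 0 means no user copies */
};

/* True if copying n bytes at the given object offset stays inside the window. */
static bool usercopy_allowed(const struct cache_model *c, size_t offset, size_t n)
{
    if (offset < c->useroffset)
        return false;
    if (offset - c->useroffset > c->usersize)
        return false;
    return n <= c->usersize - (offset - c->useroffset);
}
```

      A cache created without a usercopy region corresponds to
      usersize == 0 here, so every non-empty copy is refused; declaring
      the whole object as the window lets full-block copies through.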
      
      Fixes: 6d07d1cd ("usercopy: Restrict non-usercopy caches to size 0")
      Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      [mpe: Trim and indent oops]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://msgid.link/20230810-rtas-flash-vs-hardened-usercopy-v2-1-dcf63793a938@linux.ibm.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b8fee83a
    • Yuanjun Gong's avatar
      fbdev: mmp: fix value check in mmphw_probe() · 9fedcd07
      Yuanjun Gong authored
      commit 0872b2c0 upstream.
      
      In mmphw_probe(), check the return value of clk_prepare_enable()
      and return the error code if clk_prepare_enable() fails.
      
      Fixes: d63028c3 ("video: mmp display controller support")
      Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9fedcd07
    • Parker Newman's avatar
      i2c: tegra: Fix i2c-tegra DMA config option processing · 3461e649
      Parker Newman authored
      commit 27ec43c7 upstream.
      
      Tegra processors prior to Tegra186 used APB DMA for I2C requiring
      CONFIG_TEGRA20_APB_DMA=y while Tegra186 and later use GPC DMA requiring
      CONFIG_TEGRA186_GPC_DMA=y.
      
      The check for whether the processor uses APB DMA is inverted, so the
      wrong DMA config options are checked.
      
      This means that if CONFIG_TEGRA20_APB_DMA=y but CONFIG_TEGRA186_GPC_DMA=n
      on a Tegra186 or later processor, the driver will incorrectly think DMA
      is enabled and attempt to request DMA channels that will never be
      available, leaving the driver in a perpetual EPROBE_DEFER state.
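
      The selection logic can be sketched as follows. The struct
      dma_config and both functions are hypothetical stand-ins for the
      driver's compile-time option checks, written only to show the
      difference between the intended and the inverted test.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two Kconfig options being enabled. */
struct dma_config {
    bool apb_dma_enabled;   /* models CONFIG_TEGRA20_APB_DMA=y */
    bool gpc_dma_enabled;   /* models CONFIG_TEGRA186_GPC_DMA=y */
};

/* Intended check: is the DMA engine this SoC generation needs compiled in? */
static bool dma_usable(bool soc_uses_apb_dma, const struct dma_config *cfg)
{
    return soc_uses_apb_dma ? cfg->apb_dma_enabled : cfg->gpc_dma_enabled;
}

/* The inverted check, kept only for comparison. */
static bool dma_usable_inverted(bool soc_uses_apb_dma, const struct dma_config *cfg)
{
    return soc_uses_apb_dma ? cfg->gpc_dma_enabled : cfg->apb_dma_enabled;
}
```

      With APB DMA enabled, GPC DMA disabled and a SoC that needs GPC DMA
      (soc_uses_apb_dma == false), dma_usable() reports false while the
      inverted variant reports true, which matches the misbehaviour
      described above.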
      
      Fixes: 48cb6356 ("i2c: tegra: Add GPCDMA support")
      Signed-off-by: Parker Newman <pnewman@connecttech.com>
      Acked-by: Andi Shyti <andi.shyti@kernel.org>
      Acked-by: Akhil R <akhilrajeev@nvidia.com>
      Link: https://lore.kernel.org/r/fcfcf9b3-c8c4-9b34-2ff8-cd60a3d490bd@connecttech.com
      Signed-off-by: Wolfram Sang <wsa@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3461e649
    • Yicong Yang's avatar
      i2c: hisi: Only handle the interrupt of the driver's transfer · ba249011
      Yicong Yang authored
      commit fff67c1b upstream.
      
      The controller may be shared with another port, for example the firmware.
      Handling interrupts from other sources will cause a crash, since some
      data is not initialized. So only handle the interrupts of the driver's
      transfer and discard others.
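
      A minimal sketch of the guard, assuming the driver records its
      in-flight transfer in a pointer that is NULL otherwise. The types
      and ctlr_isr() below are hypothetical and are not the i2c-hisi
      driver's code or the kernel's irqreturn_t.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for the model. */
enum irq_result { IRQ_IGNORED, IRQ_DONE };

struct xfer {
    int events_handled;
};

struct ctlr {
    struct xfer *active;   /* NULL unless the driver started a transfer */
};

/* Handle an interrupt only if the driver itself has a transfer in flight. */
static enum irq_result ctlr_isr(struct ctlr *c)
{
    if (!c->active)
        return IRQ_IGNORED;   /* not ours, e.g. raised for another user */

    c->active->events_handled++;
    return IRQ_DONE;
}
```

      Returning early keeps the handler away from per-transfer state that
      was never set up.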
      
      Fixes: d62fbdb9 ("i2c: add support for HiSilicon I2C controller")
      Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
      Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
      Link: https://lore.kernel.org/r/20230801124625.63587-1-yangyicong@huawei.com
      Signed-off-by: Wolfram Sang <wsa@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ba249011
    • Chengfeng Ye's avatar
      i2c: bcm-iproc: Fix bcm_iproc_i2c_isr deadlock issue · db0416c1
      Chengfeng Ye authored
      commit 4caf4cb1 upstream.
      
      iproc_i2c_rd_reg() and iproc_i2c_wr_reg() are called from both
      interrupt context (e.g. bcm_iproc_i2c_isr) and process context
      (e.g. bcm_iproc_i2c_suspend). Therefore, interrupts should be
      disabled to avoid potential deadlock. To prevent this scenario,
      use spin_lock_irqsave().
      
      Fixes: 9a103872 ("i2c: iproc: add NIC I2C support")
      Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
      Acked-by: Ray Jui <ray.jui@broadcom.com>
      Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
      Signed-off-by: Wolfram Sang <wsa@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      db0416c1
    • Steve French's avatar
      cifs: fix potential oops in cifs_oplock_break · 5ee28bcf
      Steve French authored
      [ Upstream commit e8f5f849 ]
      
      With deferred close we can have closes that race with lease breaks,
      and so with the current checks for whether to send the lease response,
      oplock_response(), this can mean that an unmount (kill_sb) can occur
      just before we were checking if the tcon->ses is valid.  See below:
      
      [Fri Aug  4 04:12:50 2023] RIP: 0010:cifs_oplock_break+0x1f7/0x5b0 [cifs]
      [Fri Aug  4 04:12:50 2023] Code: 7d a8 48 8b 7d c0 c0 e9 02 48 89 45 b8 41 89 cf e8 3e f5 ff ff 4c 89 f7 41 83 e7 01 e8 82 b3 03 f2 49 8b 45 50 48 85 c0 74 5e <48> 83 78 60 00 74 57 45 84 ff 75 52 48 8b 43 98 48 83 eb 68 48 39
      [Fri Aug  4 04:12:50 2023] RSP: 0018:ffffb30607ddbdf8 EFLAGS: 00010206
      [Fri Aug  4 04:12:50 2023] RAX: 632d223d32612022 RBX: ffff97136944b1e0 RCX: 0000000080100009
      [Fri Aug  4 04:12:50 2023] RDX: 0000000000000001 RSI: 0000000080100009 RDI: ffff97136944b188
      [Fri Aug  4 04:12:50 2023] RBP: ffffb30607ddbe58 R08: 0000000000000001 R09: ffffffffc08e0900
      [Fri Aug  4 04:12:50 2023] R10: 0000000000000001 R11: 000000000000000f R12: ffff97136944b138
      [Fri Aug  4 04:12:50 2023] R13: ffff97149147c000 R14: ffff97136944b188 R15: 0000000000000000
      [Fri Aug  4 04:12:50 2023] FS:  0000000000000000(0000) GS:ffff9714f7c00000(0000) knlGS:0000000000000000
      [Fri Aug  4 04:12:50 2023] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [Fri Aug  4 04:12:50 2023] CR2: 00007fd8de9c7590 CR3: 000000011228e000 CR4: 0000000000350ef0
      [Fri Aug  4 04:12:50 2023] Call Trace:
      [Fri Aug  4 04:12:50 2023]  <TASK>
      [Fri Aug  4 04:12:50 2023]  process_one_work+0x225/0x3d0
      [Fri Aug  4 04:12:50 2023]  worker_thread+0x4d/0x3e0
      [Fri Aug  4 04:12:50 2023]  ? process_one_work+0x3d0/0x3d0
      [Fri Aug  4 04:12:50 2023]  kthread+0x12a/0x150
      [Fri Aug  4 04:12:50 2023]  ? set_kthread_struct+0x50/0x50
      [Fri Aug  4 04:12:50 2023]  ret_from_fork+0x22/0x30
      [Fri Aug  4 04:12:50 2023]  </TASK>
      
      To fix this, change the ordering of the checks before sending the
      oplock_response to first check whether the openFileList is empty.
      
      Fixes: da787d5b ("SMB3: Do not send lease break acknowledgment if all file handles have been closed")
      Suggested-by: Bharath SM <bharathsm@microsoft.com>
      Reviewed-by: Bharath SM <bharathsm@microsoft.com>
      Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
      Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      5ee28bcf
    • Eugenio Pérez's avatar
      vdpa/mlx5: Delete control vq iotlb in destroy_mr only when necessary · cba26abc
      Eugenio Pérez authored
      [ Upstream commit ad03a0f4 ]
      
      mlx5_vdpa_destroy_mr can be called from .set_map with data ASID after
      the control virtqueue ASID iotlb has been populated. The control vq
      iotlb must not be cleared, since it will not be populated again.
      
      So call the ASID aware destroy function which makes sure that the
      right vq resource is destroyed.
      
      Fixes: 8fcd20c3 ("vdpa/mlx5: Support different address spaces for control and data")
      Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
      Reviewed-by: Gal Pressman <gal@nvidia.com>
      Message-Id: <20230802171231.11001-5-dtatulea@nvidia.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      cba26abc
    • Dragos Tatulea's avatar
      vdpa/mlx5: Fix mr->initialized semantics · bb4983ec
      Dragos Tatulea authored
      [ Upstream commit 9ee81100 ]
      
      The mr->initialized flag is shared between the control vq and data vq
      part of the mr init/uninit. But if the control vq and data vq get placed
      in different ASIDs, it can happen that initializing the control vq will
      prevent the data vq mr from being initialized.
      
      This patch consolidates the control and data vq init parts into their
      own init functions. The mr->initialized will now be used for the data vq
      only. The control vq currently doesn't need a flag.
      
      The uninitializing part is also taken care of: mlx5_vdpa_destroy_mr got
      split into data and control vq functions which are now also ASID aware.
      
      Fixes: 8fcd20c3 ("vdpa/mlx5: Support different address spaces for control and data")
      Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
      Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
      Reviewed-by: Gal Pressman <gal@nvidia.com>
      Message-Id: <20230802171231.11001-3-dtatulea@nvidia.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      bb4983ec
    • Maxime Coquelin's avatar
      vduse: Use proper spinlock for IRQ injection · e706675b
      Maxime Coquelin authored
      [ Upstream commit 7ca26efb ]
      
      The IRQ injection work used spin_lock_irq() to protect the
      scheduling of the softirq, but spin_lock_bh() should be
      used.
      
      With spin_lock_irq(), we noticed a delay of more than 6
      seconds between the time a NAPI polling work is scheduled
      and the time it is executed.
      
      Fixes: c8a6153b ("vduse: Introduce VDUSE - vDPA Device in Userspace")
      Cc: xieyongji@bytedance.com
      Suggested-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
      Message-Id: <20230705114505.63274-1-maxime.coquelin@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      e706675b
    • Wolfram Sang's avatar
      virtio-mmio: don't break lifecycle of vm_dev · af5818c3
      Wolfram Sang authored
      [ Upstream commit 55c91fed ]
      
      vm_dev has a separate lifecycle because it has a 'struct device'
      embedded. Thus, having a release callback for it is correct.
      
      Allocating the vm_dev struct with devres totally breaks this protection,
      though. Instead of waiting for the vm_dev release callback, the memory
      is freed when the platform_device is removed, resulting in a
      use-after-free when the callback is finally called.
      
      To easily see the problem, compile the kernel with
      CONFIG_DEBUG_KOBJECT_RELEASE and unbind with sysfs.
      
      The fix is easy, don't use devres in this case.
      
      Found during my research about object lifetime problems.
      
      Fixes: 7eb781b1 ("virtio_mmio: add cleanup for virtio_mmio_probe")
      Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
      Message-Id: <20230629120526.7184-1-wsa+renesas@sang-engineering.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      af5818c3
    • Filipe Manana's avatar
      btrfs: fix use-after-free of new block group that became unused · 6297644d
      Filipe Manana authored
      [ Upstream commit 0657b20c ]
      
      If a task creates a new block group and that block group becomes unused
      before we finish its creation, at btrfs_create_pending_block_groups(),
      then when btrfs_mark_bg_unused() is called against the block group, we
      assume that the block group is currently in the list of block groups to
      reclaim, and we move it out of the list of new block groups and into the
      list of unused block groups. This has two consequences:
      
      1) We move it out of the list of new block groups associated to the
         current transaction. So the block group creation is not finished and
         if we attempt to delete the bg because it's unused, we will not find
         the block group item in the extent tree (or the new block group tree),
         its device extent items in the device tree etc, resulting in the
         deletion to fail due to the missing items;
      
      2) We don't increment the reference count on the block group when we
         move it to the list of unused block groups, because we assumed the
         block group was on the list of block groups to reclaim, and in that
         case it already has the correct reference count. However the block
         group was on the list of new block groups, in which case no extra
         reference was taken because it's local to the current task. This
         later results in doing an extra reference count decrement when
         removing the block group from the unused list, eventually leading the
         reference count to 0.
      
      This second case was caught when running generic/297 from fstests, which
      produced the following assertion failure and stack trace:
      
        [589.559] assertion failed: refcount_read(&block_group->refs) == 1, in fs/btrfs/block-group.c:4299
        [589.559] ------------[ cut here ]------------
        [589.559] kernel BUG at fs/btrfs/block-group.c:4299!
        [589.560] invalid opcode: 0000 [#1] PREEMPT SMP PTI
        [589.560] CPU: 8 PID: 2819134 Comm: umount Tainted: G        W          6.4.0-rc6-btrfs-next-134+ #1
        [589.560] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
        [589.560] RIP: 0010:btrfs_free_block_groups+0x449/0x4a0 [btrfs]
        [589.561] Code: 68 62 da c0 (...)
        [589.561] RSP: 0018:ffffa55a8c3b3d98 EFLAGS: 00010246
        [589.561] RAX: 0000000000000058 RBX: ffff8f030d7f2000 RCX: 0000000000000000
        [589.562] RDX: 0000000000000000 RSI: ffffffff953f0878 RDI: 00000000ffffffff
        [589.562] RBP: ffff8f030d7f2088 R08: 0000000000000000 R09: ffffa55a8c3b3c50
        [589.562] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8f05850b4c00
        [589.562] R13: ffff8f030d7f2090 R14: ffff8f05850b4cd8 R15: dead000000000100
        [589.563] FS:  00007f497fd2e840(0000) GS:ffff8f09dfc00000(0000) knlGS:0000000000000000
        [589.563] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [589.563] CR2: 00007f497ff8ec10 CR3: 0000000271472006 CR4: 0000000000370ee0
        [589.563] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [589.564] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [589.564] Call Trace:
        [589.564]  <TASK>
        [589.565]  ? __die_body+0x1b/0x60
        [589.565]  ? die+0x39/0x60
        [589.565]  ? do_trap+0xeb/0x110
        [589.565]  ? btrfs_free_block_groups+0x449/0x4a0 [btrfs]
        [589.566]  ? do_error_trap+0x6a/0x90
        [589.566]  ? btrfs_free_block_groups+0x449/0x4a0 [btrfs]
        [589.566]  ? exc_invalid_op+0x4e/0x70
        [589.566]  ? btrfs_free_block_groups+0x449/0x4a0 [btrfs]
        [589.567]  ? asm_exc_invalid_op+0x16/0x20
        [589.567]  ? btrfs_free_block_groups+0x449/0x4a0 [btrfs]
        [589.567]  ? btrfs_free_block_groups+0x449/0x4a0 [btrfs]
        [589.567]  close_ctree+0x35d/0x560 [btrfs]
        [589.568]  ? fsnotify_sb_delete+0x13e/0x1d0
        [589.568]  ? dispose_list+0x3a/0x50
        [589.568]  ? evict_inodes+0x151/0x1a0
        [589.568]  generic_shutdown_super+0x73/0x1a0
        [589.569]  kill_anon_super+0x14/0x30
        [589.569]  btrfs_kill_super+0x12/0x20 [btrfs]
        [589.569]  deactivate_locked_super+0x2e/0x70
        [589.569]  cleanup_mnt+0x104/0x160
        [589.570]  task_work_run+0x56/0x90
        [589.570]  exit_to_user_mode_prepare+0x160/0x170
        [589.570]  syscall_exit_to_user_mode+0x22/0x50
        [589.570]  ? __x64_sys_umount+0x12/0x20
        [589.571]  do_syscall_64+0x48/0x90
        [589.571]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
        [589.571] RIP: 0033:0x7f497ff0a567
        [589.571] Code: af 98 0e (...)
        [589.572] RSP: 002b:00007ffc98347358 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
        [589.572] RAX: 0000000000000000 RBX: 00007f49800b8264 RCX: 00007f497ff0a567
        [589.572] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000557f558abfa0
        [589.573] RBP: 0000557f558a6ba0 R08: 0000000000000000 R09: 00007ffc98346100
        [589.573] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
        [589.573] R13: 0000557f558abfa0 R14: 0000557f558a6cb0 R15: 0000557f558a6dd0
        [589.573]  </TASK>
        [589.574] Modules linked in: dm_snapshot dm_thin_pool (...)
        [589.576] ---[ end trace 0000000000000000 ]---
      
      Fix this by adding a runtime flag to the block group to tell that the
      block group is still in the list of new block groups, and therefore it
      should not be moved to the list of unused block groups, at
      btrfs_mark_bg_unused(), until the flag is cleared, when we finish the
      creation of the block group at btrfs_create_pending_block_groups().
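
      The gating described above can be modelled roughly like this. It is
      a toy sketch with hypothetical names (struct block_group here is
      not the btrfs structure, and list membership is reduced to an
      enum), meant only to show how the flag prevents the move and the
      unbalanced reference count.

```c
#include <assert.h>

/* Which list the block group currently sits on (toy model). */
enum bg_list { BG_LIST_NONE, BG_LIST_NEW, BG_LIST_RECLAIM, BG_LIST_UNUSED };

#define BG_FLAG_NEW (1u << 0)   /* hypothetical runtime flag bit */

struct block_group {
    unsigned int flags;
    enum bg_list list;
    int refs;
};

/* Queue the block group for deletion unless its creation is still pending. */
static void mark_bg_unused(struct block_group *bg)
{
    if (bg->flags & BG_FLAG_NEW)
        return;                       /* still on the new list: leave it alone */

    if (bg->list == BG_LIST_NONE) {
        bg->refs++;                   /* the unused list takes its own reference */
        bg->list = BG_LIST_UNUSED;
    } else if (bg->list == BG_LIST_RECLAIM) {
        bg->list = BG_LIST_UNUSED;    /* reuse the reference held for reclaim */
    }
}

/* Creation finished: clear the flag and drop off the new list. */
static void finish_creation(struct block_group *bg)
{
    bg->flags &= ~BG_FLAG_NEW;
    bg->list = BG_LIST_NONE;
}
```

      In the model, marking a still-new block group as unused is a no-op,
      and only after finish_creation() does it move to the unused list
      with a matching reference.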
      
      Fixes: a9f18971 ("btrfs: move out now unused BG from the reclaim list")
      CC: stable@vger.kernel.org # 5.15+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      6297644d
    • David Sterba's avatar
      btrfs: convert btrfs_block_group::seq_zone to runtime flag · 29cebf80
      David Sterba authored
      [ Upstream commit 961f5b8b ]
      
      In zoned mode the sequential status of zone can be also tracked in the
      runtime flags of block group.
      
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Stable-dep-of: 0657b20c ("btrfs: fix use-after-free of new block group that became unused")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      29cebf80
    • David Sterba's avatar
      btrfs: convert btrfs_block_group::needs_free_space to runtime flag · 94cde941
      David Sterba authored
      [ Upstream commit 0d7764ff ]
      
      We already have flags in block group to track various status bits,
      convert needs_free_space as well and reduce size of btrfs_block_group.
      
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Stable-dep-of: 0657b20c ("btrfs: fix use-after-free of new block group that became unused")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      94cde941
    • Naohiro Aota's avatar
      btrfs: move out now unused BG from the reclaim list · 01eca70e
      Naohiro Aota authored
      [ Upstream commit a9f18971 ]
      
      An unused block group is easy to remove to free up space and should be
      reclaimed fast. Such block group can often already be a target of the
      reclaim process. As we check list_empty(&bg->bg_list), we keep it in the
      reclaim list. That block group is never reclaimed until the file system
      is filled e.g. up to 75%.
      
      Instead, we can move unused block group to the unused list and delete it
      fast.
      
      Fixes: 18bb8bbf ("btrfs: zoned: automatically reclaim zones")
      CC: stable@vger.kernel.org # 5.15+
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      01eca70e
    • Daniel Vetter's avatar
      video/aperture: Only remove sysfb on the default vga pci device · 485ec8f8
      Daniel Vetter authored
      [ Upstream commit 5ae3716c ]
      
      Instead of calling aperture_remove_conflicting_devices() to remove the
      conflicting devices, just call to aperture_detach_devices() to detach
      the device that matches the same PCI BAR / aperture range. Since the
      former is just a wrapper of the latter plus a sysfb_disable() call,
      and now that's done in this function but only for the primary devices.
      
      This fixes a regression introduced by commit ee7a69aa ("fbdev:
      Disable sysfb device registration when removing conflicting FBs"),
      where we remove the sysfb when loading a driver for an unrelated pci
      device, resulting in the user losing their efifb console or similar.
      
      Note that in practice this only is a problem with the nvidia blob,
      because that's the only gpu driver people might install which does not
      come with an fbdev driver of its own. For everyone else the real gpu
      driver will restore a working console.
      
      Also note that in the referenced bug there's confusion that this same
      bug also happens on amdgpu. But that was just another amdgpu specific
      regression, which just happened to happen at roughly the same time and
      with the same user-observable symptoms. That bug is fixed now, see
      https://bugzilla.kernel.org/show_bug.cgi?id=216331#c15
      
      Note that we should not have any such issues on non-pci multi-gpu
      issues, because I could only find two such cases:
      - SoC with some external panel over spi or similar. These panel
        drivers do not use drm_aperture_remove_conflicting_framebuffers(),
        so no problem.
      - vga+mga, which is a direct console driver and entirely bypasses all
        this.
      
      For the above reasons the cc: stable is just notional; this patch
      will need a backport, and that's up to nvidia if they care enough.
      
      v2:
      - Explain a bit better why other multi-gpu that aren't pci shouldn't
        have any issues with making all this fully pci specific.
      
      v3:
      - polish commit message (Javier)
      
      v4:
      - Fix commit message style (i.e., commit 1234 ("..."))
      - fix Daniel's S-o-b address
      
      v5:
      - add back an S-o-b tag with Daniel's Intel address
      
      Fixes: ee7a69aa ("fbdev: Disable sysfb device registration when removing conflicting FBs")
      Tested-by: Aaron Plattner <aplattner@nvidia.com>
      Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=216303#c28
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
      Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
      Cc: Aaron Plattner <aplattner@nvidia.com>
      Cc: Javier Martinez Canillas <javierm@redhat.com>
      Cc: Thomas Zimmermann <tzimmermann@suse.de>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: <stable@vger.kernel.org> # v5.19+ (if someone else does the backport)
      Link: https://patchwork.freedesktop.org/patch/msgid/20230406132109.32050-8-tzimmermann@suse.de
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      485ec8f8
    • Thomas Zimmermann's avatar
      fbdev/hyperv-fb: Do not set struct fb_info.apertures · f83ab817
      Thomas Zimmermann authored
      [ Upstream commit 81d23934 ]
      
      Generic fbdev drivers use the apertures field in struct fb_info to
      control ownership of the framebuffer memory and graphics device. Do
      not set the values in hyperv-fb.
      
      Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
      Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
      Reviewed-by: Michael Kelley <mikelley@microsoft.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20221219160516.23436-9-tzimmermann@suse.de
      Stable-dep-of: 5ae3716c ("video/aperture: Only remove sysfb on the default vga pci device")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ARM: dts: nxp/imx6sll: fix wrong property name in usbphy node · e41170d1
      Xu Yang authored
      [ Upstream commit ee70b908 ]
      
      The usbphy node should use the property name "phy-3p0-supply", not
      "phy-reg_3p0-supply".
      
      Fixes: 9f30b6b1 ("ARM: dts: imx: Add basic dtsi file for imx6sll")
      cc: <stable@vger.kernel.org>
      Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
      Signed-off-by: Shawn Guo <shawnguo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • KVM: arm64: vgic-v4: Make the doorbell request robust w.r.t preemption · 3d2d051b
      Marc Zyngier authored
      [ Upstream commit b321c31c ]
      
      Xiang reports that VMs occasionally fail to boot on GICv4.1 systems when
      running a preemptible kernel, as it is possible that a vCPU is blocked
      without requesting a doorbell interrupt.
      
      The issue is that any preemption that occurs between vgic_v4_put() and
      schedule() on the block path will mark the vPE as nonresident and *not*
      request a doorbell irq. This occurs because when the vcpu thread is
      resumed on its way to block, vcpu_load() will make the vPE resident
      again. Once the vcpu actually blocks, we don't request a doorbell
      anymore, and the vcpu won't be woken up on interrupt delivery.
      
      Fix it by tracking that we're entering WFI, and key the doorbell
      request on that flag. This allows us not to make the vPE resident
      when going through a preempt/schedule cycle, meaning we don't lose
      any state.
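
The flag-keyed doorbell request described above can be modelled with a small user-space sketch. Everything here (the `toy_` names, the three-field struct) is a hypothetical simplification and not the kernel's code; it only illustrates why keying the doorbell on an "in WFI" flag survives a preempt/schedule cycle.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of one vCPU's GICv4 state. */
struct toy_vcpu {
	bool in_wfi;         /* set while the vCPU is on its way to block */
	bool vpe_resident;   /* the vPE is currently resident */
	bool doorbell_armed; /* a doorbell irq would wake the vCPU */
};

/* Put path: make the vPE non-resident; arm the doorbell only in WFI. */
void toy_vgic_v4_put(struct toy_vcpu *v)
{
	if (!v->vpe_resident)
		return;
	v->vpe_resident = false;
	v->doorbell_armed = v->in_wfi;
}

/* Load path: while in WFI, skip residency so the doorbell stays armed. */
void toy_vgic_v4_load(struct toy_vcpu *v)
{
	if (v->in_wfi)
		return;
	v->vpe_resident = true;
	v->doorbell_armed = false;
}

/* Block path: record that we are entering WFI, then put the vPE. */
void toy_enter_wfi(struct toy_vcpu *v)
{
	v->in_wfi = true;
	toy_vgic_v4_put(v);
}

/* Wake-up path: clear the flag, then make the vPE resident again. */
void toy_exit_wfi(struct toy_vcpu *v)
{
	v->in_wfi = false;
	toy_vgic_v4_load(v);
}

/* A preemption on the block path: scheduler put followed by load. */
void toy_preempt_cycle(struct toy_vcpu *v)
{
	toy_vgic_v4_put(v);
	toy_vgic_v4_load(v);
}
```

In this model, calling `toy_preempt_cycle()` after `toy_enter_wfi()` leaves `doorbell_armed` set and `vpe_resident` clear, which is the property the fix is after.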
      
      Cc: stable@vger.kernel.org
      Fixes: 8e01d9a3 ("KVM: arm64: vgic-v4: Move the GICv4 residency flow to be driven by vcpu_load/put")
      Reported-by: Xiang Chen <chenxiang66@hisilicon.com>
      Suggested-by: Zenghui Yu <yuzenghui@huawei.com>
      Tested-by: Xiang Chen <chenxiang66@hisilicon.com>
      Co-developed-by: Oliver Upton <oliver.upton@linux.dev>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Acked-by: Zenghui Yu <yuzenghui@huawei.com>
      Link: https://lore.kernel.org/r/20230713070657.3873244-1-maz@kernel.org
      Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • drm/amd/display: fix access hdcp_workqueue assert · 402f1d86
      Hersen Wu authored
      [ Upstream commit cdff36a0 ]
      
      [Why] HDCP is enabled for ASICs starting with Raven. On older ASICs,
      where HDCP is not enabled, hdcp_workqueue is NULL. Some accesses
      to the HDCP workqueue are not guarded by a pointer check.

      [How] Add an hdcp_workqueue pointer check before accessing the workqueue.
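
As a minimal sketch of the kind of guard described above, assuming hypothetical stand-in types (`toy_display_manager`, `toy_hdcp_workqueue`) rather than the real amdgpu structures:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real driver structures are much larger. */
struct toy_hdcp_workqueue {
	int reset_count;
};

struct toy_display_manager {
	/* NULL on hardware where HDCP is not enabled */
	struct toy_hdcp_workqueue *hdcp_workqueue;
};

/* Returns 1 if the workqueue was touched, 0 if the access was skipped. */
int toy_hdcp_reset_display(struct toy_display_manager *dm)
{
	/* Guard: without this check the next line dereferences NULL. */
	if (!dm->hdcp_workqueue)
		return 0;

	dm->hdcp_workqueue->reset_count++;
	return 1;
}
```

The check sits at the access site, so callers on older hardware need no special casing.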
      
      Reviewed-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
      Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
      Signed-off-by: Hersen Wu <hersenxs.wu@amd.com>
      Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • drm/amd/display: phase3 mst hdcp for multiple displays · 81e6cf44
      hersen wu authored
      [ Upstream commit e8fd3eeb ]
      
      [Why]
      HDCP for multiple displays is enabled within event_property_validate
      and event_property_update by looping over all displays on the MST hub.
      When one of the displays on the MST hub is unplugged or disabled, HDCP
      is disabled for all displays on the hub within hdcp_reset_display,
      which loops over all displays of the MST link. The displays that are
      still active are left with encryption off, and the kernel driver does
      not run HDCP authentication again, so HDCP is not re-enabled
      automatically.

      [How]
      Within is_content_protection_different, check the drm_crtc_state
      changes of all displays on the MST hub and, if needed, trigger
      hdcp_update_display to re-run HDCP authentication.
      
      Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
      Signed-off-by: hersen wu <hersenxs.wu@amd.com>
      Reviewed-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
      Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Stable-dep-of: cdff36a0 ("drm/amd/display: fix access hdcp_workqueue assert")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • drm/amd/display: save restore hdcp state when display is unplugged from mst hub · d90f97cb
      hersen wu authored
      [ Upstream commit 82986fd6 ]
      
      [Why]
      Connector HDCP properties are lost after a display is
      unplugged from an MST hub, because the connector is destroyed by
      dm_dp_mst_connector_destroy. When the display is plugged
      back in, HDCP is no longer marked as desired, so it is not enabled.

      [How]
      Save the HDCP properties into hdcp_work within
      amdgpu_dm_atomic_commit_tail. If the same display is
      plugged back in with the same display index, its HDCP
      properties are retrieved from hdcp_work within
      dm_dp_mst_get_modes.
      
      Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
      Signed-off-by: hersen wu <hersenxs.wu@amd.com>
      Reviewed-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
      Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Stable-dep-of: cdff36a0 ("drm/amd/display: fix access hdcp_workqueue assert")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • igc: read before write to SRRCTL register · 48f0671b
      Song Yoong Siang authored
      [ Upstream commit 3ce29c17 ]
      
      The igc_configure_rx_ring() function is called as part of XDP program
      setup. If the Rx hardware timestamp is enabled prior to XDP program setup,
      the timestamp enablement is overwritten when the buffer size is
      written into the SRRCTL register.

      Thus, this commit reads the register value before writing to the SRRCTL
      register. This commit was tested using the xdp_hw_metadata bpf selftest
      tool. The tool enables the Rx hardware timestamp and then attaches an XDP
      program to the igc driver. It displays the hardware timestamp of UDP
      packets with port number 9092. The test steps and results are detailed below.
      
      Command on DUT:
        sudo ./xdp_hw_metadata <interface name>
      
      Command on Link Partner:
        echo -n skb | nc -u -q1 <destination IPv4 addr> 9092
      
      Result before this patch:
        skb hwtstamp is not found!
      
      Result after this patch:
        found skb hwtstamp = 1677800973.642836757
      
      Optionally, read PHC to confirm the values obtained are almost the same:
      Command:
        sudo ./testptp -d /dev/ptp0 -g
      Result:
        clock time: 1677800973.913598978 or Fri Mar  3 07:49:33 2023
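
The read-before-write idea in this commit is the usual read-modify-write pattern for a register that holds several independent fields. The sketch below models the register as a plain integer; the `TOY_` masks and bit positions are assumptions made for illustration and are not taken from the igc driver or its datasheet.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit layout, chosen for illustration only. */
#define TOY_SRRCTL_BSIZEPKT_MASK 0x0000007Fu /* packet buffer size field */
#define TOY_SRRCTL_TIMESTAMP     0x40000000u /* Rx timestamp enable bit */

/* Blind write: builds the value from scratch, dropping all other bits. */
uint32_t toy_srrctl_blind_write(uint32_t old_reg, uint32_t bsize)
{
	(void)old_reg; /* the previous register contents are ignored */
	return bsize & TOY_SRRCTL_BSIZEPKT_MASK;
}

/* Read-modify-write: only the buffer size field is replaced. */
uint32_t toy_srrctl_rmw(uint32_t old_reg, uint32_t bsize)
{
	uint32_t val = old_reg;                  /* read */

	val &= ~TOY_SRRCTL_BSIZEPKT_MASK;        /* clear the field */
	val |= bsize & TOY_SRRCTL_BSIZEPKT_MASK; /* set the new size */
	return val;                              /* value to write back */
}
```

With the timestamp bit set beforehand, the blind write loses it while the read-modify-write variant keeps it, which matches the before/after results reported above.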
      
      Fixes: fc9df2a0 ("igc: Enable RX via AF_XDP zero-copy")
      Cc: <stable@vger.kernel.org> # 5.14+
      Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Tested-by: Naama Meir <naamax.meir@linux.intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ring-buffer: Do not swap cpu_buffer during resize process · 128c06a3
      Chen Lin authored
      [ Upstream commit 8a96c028 ]
      
      When ring_buffer_swap_cpu is called during the resize process,
      the cpu buffer is swapped in the middle of the resize, leaving it in an
      incorrect state. Continuing to run in that state results in an oops.
      
      This issue can be easily reproduced using the following two scripts:
      /tmp # cat test1.sh
      //#! /bin/sh
      for i in `seq 0 100000`
      do
               echo 2000 > /sys/kernel/debug/tracing/buffer_size_kb
               sleep 0.5
               echo 5000 > /sys/kernel/debug/tracing/buffer_size_kb
               sleep 0.5
      done
      /tmp # cat test2.sh
      //#! /bin/sh
      for i in `seq 0 100000`
      do
              echo irqsoff > /sys/kernel/debug/tracing/current_tracer
              sleep 1
              echo nop > /sys/kernel/debug/tracing/current_tracer
              sleep 1
      done
      /tmp # ./test1.sh &
      /tmp # ./test2.sh &
      
      A typical oops log is shown below; other oops logs are sometimes seen as well.
      
      [  231.711293] WARNING: CPU: 0 PID: 9 at kernel/trace/ring_buffer.c:2026 rb_update_pages+0x378/0x3f8
      [  231.713375] Modules linked in:
      [  231.714735] CPU: 0 PID: 9 Comm: kworker/0:1 Tainted: G        W          6.5.0-rc1-00276-g20edcec23f92 #15
      [  231.716750] Hardware name: linux,dummy-virt (DT)
      [  231.718152] Workqueue: events update_pages_handler
      [  231.719714] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      [  231.721171] pc : rb_update_pages+0x378/0x3f8
      [  231.722212] lr : rb_update_pages+0x25c/0x3f8
      [  231.723248] sp : ffff800082b9bd50
      [  231.724169] x29: ffff800082b9bd50 x28: ffff8000825f7000 x27: 0000000000000000
      [  231.726102] x26: 0000000000000001 x25: fffffffffffff010 x24: 0000000000000ff0
      [  231.728122] x23: ffff0000c3a0b600 x22: ffff0000c3a0b5c0 x21: fffffffffffffe0a
      [  231.730203] x20: ffff0000c3a0b600 x19: ffff0000c0102400 x18: 0000000000000000
      [  231.732329] x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffe7aa8510
      [  231.734212] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000002
      [  231.736291] x11: ffff8000826998a8 x10: ffff800082b9baf0 x9 : ffff800081137558
      [  231.738195] x8 : fffffc00030e82c8 x7 : 0000000000000000 x6 : 0000000000000001
      [  231.740192] x5 : ffff0000ffbafe00 x4 : 0000000000000000 x3 : 0000000000000000
      [  231.742118] x2 : 00000000000006aa x1 : 0000000000000001 x0 : ffff0000c0007208
      [  231.744196] Call trace:
      [  231.744892]  rb_update_pages+0x378/0x3f8
      [  231.745893]  update_pages_handler+0x1c/0x38
      [  231.746893]  process_one_work+0x1f0/0x468
      [  231.747852]  worker_thread+0x54/0x410
      [  231.748737]  kthread+0x124/0x138
      [  231.749549]  ret_from_fork+0x10/0x20
      [  231.750434] ---[ end trace 0000000000000000 ]---
      [  233.720486] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
      [  233.721696] Mem abort info:
      [  233.721935]   ESR = 0x0000000096000004
      [  233.722283]   EC = 0x25: DABT (current EL), IL = 32 bits
      [  233.722596]   SET = 0, FnV = 0
      [  233.722805]   EA = 0, S1PTW = 0
      [  233.723026]   FSC = 0x04: level 0 translation fault
      [  233.723458] Data abort info:
      [  233.723734]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
      [  233.724176]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
      [  233.724589]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
      [  233.725075] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000104943000
      [  233.725592] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
      [  233.726231] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
      [  233.726720] Modules linked in:
      [  233.727007] CPU: 0 PID: 9 Comm: kworker/0:1 Tainted: G        W          6.5.0-rc1-00276-g20edcec23f92 #15
      [  233.727777] Hardware name: linux,dummy-virt (DT)
      [  233.728225] Workqueue: events update_pages_handler
      [  233.728655] pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      [  233.729054] pc : rb_update_pages+0x1a8/0x3f8
      [  233.729334] lr : rb_update_pages+0x154/0x3f8
      [  233.729592] sp : ffff800082b9bd50
      [  233.729792] x29: ffff800082b9bd50 x28: ffff8000825f7000 x27: 0000000000000000
      [  233.730220] x26: 0000000000000000 x25: ffff800082a8b840 x24: ffff0000c0102418
      [  233.730653] x23: 0000000000000000 x22: fffffc000304c880 x21: 0000000000000003
      [  233.731105] x20: 00000000000001f4 x19: ffff0000c0102400 x18: ffff800082fcbc58
      [  233.731727] x17: 0000000000000000 x16: 0000000000000001 x15: 0000000000000001
      [  233.732282] x14: ffff8000825fe0c8 x13: 0000000000000001 x12: 0000000000000000
      [  233.732709] x11: ffff8000826998a8 x10: 0000000000000ae0 x9 : ffff8000801b760c
      [  233.733148] x8 : fefefefefefefeff x7 : 0000000000000018 x6 : ffff0000c03298c0
      [  233.733553] x5 : 0000000000000002 x4 : 0000000000000000 x3 : 0000000000000000
      [  233.733972] x2 : ffff0000c3a0b600 x1 : 0000000000000000 x0 : 0000000000000000
      [  233.734418] Call trace:
      [  233.734593]  rb_update_pages+0x1a8/0x3f8
      [  233.734853]  update_pages_handler+0x1c/0x38
      [  233.735148]  process_one_work+0x1f0/0x468
      [  233.735525]  worker_thread+0x54/0x410
      [  233.735852]  kthread+0x124/0x138
      [  233.736064]  ret_from_fork+0x10/0x20
      [  233.736387] Code: 92400000 910006b5 aa000021 aa0303f7 (f9400060)
      [  233.736959] ---[ end trace 0000000000000000 ]---
      
      After analysis, the sequence leading to the error is as follows [1-5]:
      
      int ring_buffer_resize(struct trace_buffer *buffer, unsigned long size,
      			int cpu_id)
      {
      	for_each_buffer_cpu(buffer, cpu) {
      		cpu_buffer = buffer->buffers[cpu];
      		//1. get cpu_buffer, aka cpu_buffer(A)
      		...
      		...
      		schedule_work_on(cpu,
      		 &cpu_buffer->update_pages_work);
      		//2. 'update_pages_work' is queued on 'cpu'; cpu_buffer(A) is passed to
      		// update_pages_handler, which does the update and then calls
      		// complete(&cpu_buffer->update_done) to wake up the resize process.
      	//---->
      		//3. Just at this moment, ring_buffer_swap_cpu is triggered and
      		//cpu_buffer(A) is swapped with cpu_buffer(B), the max_buffer.
      		//ring_buffer_swap_cpu is called as shown in the 'Call trace' below.
      
      		Call trace:
      		 dump_backtrace+0x0/0x2f8
      		 show_stack+0x18/0x28
      		 dump_stack+0x12c/0x188
      		 ring_buffer_swap_cpu+0x2f8/0x328
      		 update_max_tr_single+0x180/0x210
      		 check_critical_timing+0x2b4/0x2c8
      		 tracer_hardirqs_on+0x1c0/0x200
      		 trace_hardirqs_on+0xec/0x378
      		 el0_svc_common+0x64/0x260
      		 do_el0_svc+0x90/0xf8
      		 el0_svc+0x20/0x30
      		 el0_sync_handler+0xb0/0xb8
      		 el0_sync+0x180/0x1c0
      	//<----
      
      	/* wait for all the updates to complete */
      	for_each_buffer_cpu(buffer, cpu) {
      		cpu_buffer = buffer->buffers[cpu];
      		//4. get cpu_buffer; cpu_buffer(B) is now used in the following steps,
      		//so the state of cpu_buffer(A) and cpu_buffer(B) is inconsistent.
      		//For example, cpu_buffer(A)->update_done is left set to 1, so the
      		//next resize round will not block in 'wait_for_completion'.
      		if (!cpu_buffer->nr_pages_to_update)
      			continue;
      
      		if (cpu_online(cpu))
      			wait_for_completion(&cpu_buffer->update_done);
      		cpu_buffer->nr_pages_to_update = 0;
      	}
      	...
      }
      	//5. The state of cpu_buffer(A) and cpu_buffer(B) is inconsistent;
      	//continuing to run in this state leads to the oops.
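
One way to rule out step 3 is to have the swap path refuse to run while a resize is in flight. The sketch below is a user-space illustration of that idea under stated assumptions: the `toy_` names and the counter-based guard are hypothetical, and the sketch does not model the locking or memory ordering a real kernel fix would need.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical, heavily reduced stand-in for a trace buffer. */
struct toy_buffer {
	atomic_int resizing; /* non-zero while a resize is in flight */
	int cpu_buffer_id;   /* identifies the installed per-CPU buffer */
};

void toy_buffer_init(struct toy_buffer *b, int id)
{
	atomic_init(&b->resizing, 0);
	b->cpu_buffer_id = id;
}

/* Bracket the whole resize, including the wait for completion. */
void toy_resize_begin(struct toy_buffer *b)
{
	atomic_fetch_add(&b->resizing, 1);
}

void toy_resize_end(struct toy_buffer *b)
{
	atomic_fetch_sub(&b->resizing, 1);
}

/* Swap the per-CPU buffers unless either side is being resized. */
int toy_swap_cpu(struct toy_buffer *a, struct toy_buffer *b)
{
	int tmp;

	if (atomic_load(&a->resizing) || atomic_load(&b->resizing))
		return -EBUSY;

	tmp = a->cpu_buffer_id;
	a->cpu_buffer_id = b->cpu_buffer_id;
	b->cpu_buffer_id = tmp;
	return 0;
}
```

With this guard, a swap attempted between `toy_resize_begin()` and `toy_resize_end()` returns `-EBUSY` and leaves both buffers untouched, so the second loop in the resize path still sees cpu_buffer(A).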
      
      Link: https://lore.kernel.org/linux-trace-kernel/202307191558478409990@zte.com.cn
      Signed-off-by: Chen Lin <chen.lin5@zte.com.cn>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>