  1. Aug 23, 2023
    • net: phy: fix IRQ-based wake-on-lan over hibernate / power off · 59f3d919
      Russell King (Oracle) authored
      [ Upstream commit cc941e54 ]
      
      Uwe reports:
      "Most PHYs signal WoL using an interrupt. So disabling interrupts [at
      shutdown] breaks WoL at least on PHYs covered by the marvell driver."
      
      Discussing with Ioana, the problem being solved was:
      "The board in question is a LS1021ATSN which has two AR8031 PHYs that
      share an interrupt line. In case only one of the PHYs is probed and
      there are pending interrupts on the PHY#2 an IRQ storm will happen
      since there is no entity to clear the interrupt from PHY#2's registers.
      PHY#1's driver will get stuck in .handle_interrupt() indefinitely."
      
      Further confirmation that "the two AR8031 PHYs are on the same MDIO
      bus."
      
      With WoL using interrupts to wake the system, in such a case, the
      system will begin booting with an asserted interrupt. Thus, we need to
      cope with an interrupt asserted during boot.
      
      Solve this instead by disabling interrupts during PHY probe. This will
      ensure in Ioana's situation that both PHYs of the same type sharing an
      interrupt line on a common MDIO bus will have their interrupt outputs
      disabled when the driver probes the device, but before we hook in any
      interrupt handlers - thus avoiding the interrupt storm.
      
      A better fix would be for platform firmware to disable the interrupting
      devices at source during boot, before control is handed to the kernel.
      
      Fixes: e2f016cf ("net: phy: add a shutdown procedure")
      Link: 20230804071757.383971-1-u.kleine-koenig@pengutronix.de
      Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: pcs: Add missing put_device call in miic_create · 2361c766
      Xiang Yang authored
      [ Upstream commit 829c6524 ]
      
      The reference to pdev->dev is taken by of_find_device_by_node, so
      it should be released when it is no longer needed.
      
      Fixes: 7dc54d3b ("net: pcs: add Renesas MII converter driver")
      Signed-off-by: Xiang Yang <xiangyang3@huawei.com>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • virtio-net: set queues after driver_ok · 3c8608fb
      Jason Wang authored
      [ Upstream commit 51b81317 ]
      
      Commit 25266128 ("virtio-net: fix race between set queues and
      probe") tries to fix the race between set queues and probe by calling
      _virtnet_set_queues() before DRIVER_OK is set. This violates the virtio
      spec. Fix this by setting queues after virtio_device_ready().
      
      Note that rtnl needs to be held for userspace requests to change the
      number of queues. So we are serialized in this way.
      
      Fixes: 25266128 ("virtio-net: fix race between set queues and probe")
      Reported-by: Dragos Tatulea <dtatulea@nvidia.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • xfrm: don't skip free of empty state in acquire policy · c8ce01aa
      Leon Romanovsky authored
      [ Upstream commit f3ec2b5d ]
      
      In the destruction flow, assigning NULL to xso->dev caused the
      xfrm_dev_state_free() call, which is made in the
      xfrm_state_put(to_put) routine, to be skipped.
      
      Instead of open-coding variants of xfrm_dev_state_delete() and
      xfrm_dev_state_free(), use them directly.
      
      Fixes: f8a70afa ("xfrm: add TX datapath support for IPsec packet offload mode")
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • xfrm: delete offloaded policy · 757eaa5d
      Leon Romanovsky authored
      [ Upstream commit 982c3aca ]
      
      The policy memory was released, but the HW driver data was not. Add
      a call to xfrm_dev_policy_delete() so that drivers have a chance
      to release their resources.
      
      Fixes: 919e43fa ("xfrm: add an interface to offload policy")
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • xfrm: add forgotten nla_policy for XFRMA_MTIMER_THRESH · a9020514
      Lin Ma authored
      [ Upstream commit 5e242470 ]
      
      The previous commit 4e484b3e ("xfrm: rate limit SA mapping change
      message to user space") added one additional attribute named
      XFRMA_MTIMER_THRESH and described its type at compat_policy
      (net/xfrm/xfrm_compat.c).
      
      However, the author forgot to also describe the nla_policy at
      xfrma_policy (net/xfrm/xfrm_user.c). Hence, this supposedly NLA_U32 (4
      bytes) value can be faked as empty (0 bytes) by a malicious user, which
      leads to a 4-byte out-of-bounds read and a heap information leak when
      parsing nlattrs.
      
      To exploit this, a malicious user can spray SLUB objects and then
      leverage this 4-byte OOB read to leak heap data into
      x->mapping_maxage (see xfrm_update_ae_params(...)), and then leak it to
      userspace via copy_to_user_state_extra(...).
      
      The above bug is assigned CVE-2023-3773. To fix it, this commit just
      completes the nla_policy description for XFRMA_MTIMER_THRESH, which
      enforces the length check and avoids such OOB read.
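
As a rough userspace illustration of why a missing policy entry matters, the sketch below models a per-attribute minimum-length table. It is not the kernel's nla_policy machinery: `struct attr`, `min_len_policy`, `attr_ok`, and `attr_get_u32` are hypothetical names invented for this example, and the "at least 4 bytes" rule is an assumption of the model.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a netlink attribute. */
struct attr {
    uint16_t type;
    uint16_t len;                 /* payload length claimed by the sender */
    const unsigned char *data;
};

enum { ATTR_UNSPEC, ATTR_MTIMER_THRESH, ATTR_MAX };

/* Minimum payload length per attribute type; 0 means "no length check". */
static const uint16_t min_len_policy[ATTR_MAX] = {
    [ATTR_MTIMER_THRESH] = sizeof(uint32_t),
};

/* Returns 1 if the attribute satisfies the policy table, 0 otherwise. */
static int attr_ok(const struct attr *a)
{
    if (a->type >= ATTR_MAX)
        return 0;
    return a->len >= min_len_policy[a->type];
}

/* Reads a u32 payload if the policy table accepts it; -1 on rejection. */
static int attr_get_u32(const struct attr *a, uint32_t *out)
{
    if (!attr_ok(a))
        return -1;
    memcpy(out, a->data, sizeof(*out));
    return 0;
}
```

With the table entry present, a zero-length attribute of this type is rejected before any payload bytes are read. Without the entry, the minimum would be 0 and the 4-byte copy would run past the attribute, which is the class of bug the commit describes.
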
      
      Fixes: 4e484b3e ("xfrm: rate limit SA mapping change message to user space")
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • xfrm: add NULL check in xfrm_update_ae_params · 53df4be4
      Lin Ma authored
      [ Upstream commit 00374d9b ]
      
      Normally, x->replay_esn and x->preplay_esn are allocated by
      xfrm_alloc_replay_state_esn(...) in xfrm_state_construct(...), so
      xfrm_update_ae_params(...) can safely update them. However, the current
      implementation of xfrm_new_ae(...) allows a malicious user to trigger
      a NULL pointer dereference and crash the kernel, as shown below.
      
      BUG: kernel NULL pointer dereference, address: 0000000000000000
      PGD 8253067 P4D 8253067 PUD 8e0e067 PMD 0
      Oops: 0002 [#1] PREEMPT SMP KASAN NOPTI
      CPU: 0 PID: 98 Comm: poc.npd Not tainted 6.4.0-rc7-00072-gdad9774deaf1 #8
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.o4
      RIP: 0010:memcpy_orig+0xad/0x140
      Code: e8 4c 89 5f e0 48 8d 7f e0 73 d2 83 c2 20 48 29 d6 48 29 d7 83 fa 10 72 34 4c 8b 06 4c 8b 4e 08 c
      RSP: 0018:ffff888008f57658 EFLAGS: 00000202
      RAX: 0000000000000000 RBX: ffff888008bd0000 RCX: ffffffff8238e571
      RDX: 0000000000000018 RSI: ffff888007f64844 RDI: 0000000000000000
      RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff888008f57818
      R13: ffff888007f64aa4 R14: 0000000000000000 R15: 0000000000000000
      FS:  00000000014013c0(0000) GS:ffff88806d600000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 00000000054d8000 CR4: 00000000000006f0
      Call Trace:
       <TASK>
       ? __die+0x1f/0x70
       ? page_fault_oops+0x1e8/0x500
       ? __pfx_is_prefetch.constprop.0+0x10/0x10
       ? __pfx_page_fault_oops+0x10/0x10
       ? _raw_spin_unlock_irqrestore+0x11/0x40
       ? fixup_exception+0x36/0x460
       ? _raw_spin_unlock_irqrestore+0x11/0x40
       ? exc_page_fault+0x5e/0xc0
       ? asm_exc_page_fault+0x26/0x30
       ? xfrm_update_ae_params+0xd1/0x260
       ? memcpy_orig+0xad/0x140
       ? __pfx__raw_spin_lock_bh+0x10/0x10
       xfrm_update_ae_params+0xe7/0x260
       xfrm_new_ae+0x298/0x4e0
       ? __pfx_xfrm_new_ae+0x10/0x10
       ? __pfx_xfrm_new_ae+0x10/0x10
       xfrm_user_rcv_msg+0x25a/0x410
       ? __pfx_xfrm_user_rcv_msg+0x10/0x10
       ? __alloc_skb+0xcf/0x210
       ? stack_trace_save+0x90/0xd0
       ? filter_irq_stacks+0x1c/0x70
       ? __stack_depot_save+0x39/0x4e0
       ? __kasan_slab_free+0x10a/0x190
       ? kmem_cache_free+0x9c/0x340
       ? netlink_recvmsg+0x23c/0x660
       ? sock_recvmsg+0xeb/0xf0
       ? __sys_recvfrom+0x13c/0x1f0
       ? __x64_sys_recvfrom+0x71/0x90
       ? do_syscall_64+0x3f/0x90
       ? entry_SYSCALL_64_after_hwframe+0x72/0xdc
       ? copyout+0x3e/0x50
       netlink_rcv_skb+0xd6/0x210
       ? __pfx_xfrm_user_rcv_msg+0x10/0x10
       ? __pfx_netlink_rcv_skb+0x10/0x10
       ? __pfx_sock_has_perm+0x10/0x10
       ? mutex_lock+0x8d/0xe0
       ? __pfx_mutex_lock+0x10/0x10
       xfrm_netlink_rcv+0x44/0x50
       netlink_unicast+0x36f/0x4c0
       ? __pfx_netlink_unicast+0x10/0x10
       ? netlink_recvmsg+0x500/0x660
       netlink_sendmsg+0x3b7/0x700
      
      This NULL pointer dereference is assigned CVE-2023-3772. This commit
      adds an additional NULL check in xfrm_update_ae_params to fix it.
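
The kind of guard described above can be sketched in plain C. This is a reduced model, not the kernel code: `struct replay_esn`, `struct state`, and `update_replay` are hypothetical stand-ins, and returning an error (rather than silently skipping the copy) is a choice made only for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct replay_esn { unsigned int seq; unsigned int bmp_len; };
struct state {
    struct replay_esn *replay_esn;    /* may be NULL if never allocated */
    struct replay_esn *preplay_esn;   /* may be NULL if never allocated */
};

/*
 * Copy the user-supplied replay state only when the destination buffers
 * exist. Returns 0 on success (including "nothing to do") and -1 when an
 * update was requested but the state has no buffers to copy into.
 */
static int update_replay(struct state *x, const struct replay_esn *src)
{
    if (!src)
        return 0;
    if (!x->replay_esn || !x->preplay_esn)
        return -1;                    /* the check that prevents memcpy(NULL, ...) */
    memcpy(x->replay_esn, src, sizeof(*src));
    memcpy(x->preplay_esn, src, sizeof(*src));
    return 0;
}
```

The oops above shows RDI (the memcpy destination) as 0, which is the situation the second `if` rules out in this sketch.
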
      
      Fixes: d8647b79 ("xfrm: Add user interface for esn and big anti-replay windows")
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ip_vti: fix potential slab-use-after-free in decode_session6 · 78e397a4
      Zhengchao Shao authored
      [ Upstream commit 6018a266 ]
      
      When the ip_vti device is configured with a qdisc of the sfb type, the cb
      field of the sent skb may be modified during enqueuing. A
      slab-use-after-free may then occur when the ip_vti device sends IPv6 packets.
      As commit f8556919 ("xfrm6: Fix the nexthdr offset in
      _decode_session6.") showed, xfrm_decode_session was originally intended
      only for the receive path. IP6CB(skb)->nhoff is not set during
      transmission. Therefore, set the cb field in the skb to 0 before
      sending packets.
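
A toy model of the idea follows: a scratch area shared between layers must be cleared before a decoder that trusts it runs on the transmit path. Everything here (`struct pkt`, `struct ip6_cb`, `decode_next_header`, `xmit_decode`, the 48-byte size, and the default offset of 6) is an assumption made for illustration and not the kernel's skb layout or API.

```c
#include <stddef.h>
#include <string.h>

#define CB_SIZE 48

/* Hypothetical miniature packet buffer with a scratch "control block". */
struct pkt {
    unsigned char cb[CB_SIZE];    /* scratch area reused by each layer */
    unsigned char data[64];
    size_t len;                   /* number of valid bytes in data */
};

/* Hypothetical per-protocol view of the scratch area. */
struct ip6_cb { unsigned short nhoff; };

/* Decoder that trusts cb: a stale nhoff points outside the packet. */
static int decode_next_header(const struct pkt *p)
{
    struct ip6_cb cb;
    size_t off;

    memcpy(&cb, p->cb, sizeof(cb));
    off = cb.nhoff ? cb.nhoff : 6;    /* 0 means "use the default offset" */
    if (off >= p->len)
        return -1;                    /* unchecked, this would read out of bounds */
    return p->data[off];
}

/* Transmit path: clear whatever an earlier layer left in cb first. */
static int xmit_decode(struct pkt *p)
{
    memset(p->cb, 0, sizeof(p->cb));
    return decode_next_header(p);
}
```

In the model, the bounds check stands in for what KASAN would report. The real decoder has no such guard on this path, which is why the commit zeroes the cb field instead.
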
      
      Fixes: f8556919 ("xfrm6: Fix the nexthdr offset in _decode_session6.")
      Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ip6_vti: fix slab-use-after-free in decode_session6 · c070688b
      Zhengchao Shao authored
      [ Upstream commit 9fd41f1b ]
      
      When the ipv6_vti device is configured with a qdisc of the sfb type, the cb
      field of the sent skb may be modified during enqueuing. A
      slab-use-after-free may then occur when the ipv6_vti device sends IPv6 packets.
      
      The stack information is as follows:
      BUG: KASAN: slab-use-after-free in decode_session6+0x103f/0x1890
      Read of size 1 at addr ffff88802e08edc2 by task swapper/0/0
      CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0-next-20230707-00001-g84e2cad7f979 #410
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
      Call Trace:
      <IRQ>
      dump_stack_lvl+0xd9/0x150
      print_address_description.constprop.0+0x2c/0x3c0
      kasan_report+0x11d/0x130
      decode_session6+0x103f/0x1890
      __xfrm_decode_session+0x54/0xb0
      vti6_tnl_xmit+0x3e6/0x1ee0
      dev_hard_start_xmit+0x187/0x700
      sch_direct_xmit+0x1a3/0xc30
      __qdisc_run+0x510/0x17a0
      __dev_queue_xmit+0x2215/0x3b10
      neigh_connected_output+0x3c2/0x550
      ip6_finish_output2+0x55a/0x1550
      ip6_finish_output+0x6b9/0x1270
      ip6_output+0x1f1/0x540
      ndisc_send_skb+0xa63/0x1890
      ndisc_send_rs+0x132/0x6f0
      addrconf_rs_timer+0x3f1/0x870
      call_timer_fn+0x1a0/0x580
      expire_timers+0x29b/0x4b0
      run_timer_softirq+0x326/0x910
      __do_softirq+0x1d4/0x905
      irq_exit_rcu+0xb7/0x120
      sysvec_apic_timer_interrupt+0x97/0xc0
      </IRQ>
      Allocated by task 9176:
      kasan_save_stack+0x22/0x40
      kasan_set_track+0x25/0x30
      __kasan_slab_alloc+0x7f/0x90
      kmem_cache_alloc_node+0x1cd/0x410
      kmalloc_reserve+0x165/0x270
      __alloc_skb+0x129/0x330
      netlink_sendmsg+0x9b1/0xe30
      sock_sendmsg+0xde/0x190
      ____sys_sendmsg+0x739/0x920
      ___sys_sendmsg+0x110/0x1b0
      __sys_sendmsg+0xf7/0x1c0
      do_syscall_64+0x39/0xb0
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      Freed by task 9176:
      kasan_save_stack+0x22/0x40
      kasan_set_track+0x25/0x30
      kasan_save_free_info+0x2b/0x40
      ____kasan_slab_free+0x160/0x1c0
      slab_free_freelist_hook+0x11b/0x220
      kmem_cache_free+0xf0/0x490
      skb_free_head+0x17f/0x1b0
      skb_release_data+0x59c/0x850
      consume_skb+0xd2/0x170
      netlink_unicast+0x54f/0x7f0
      netlink_sendmsg+0x926/0xe30
      sock_sendmsg+0xde/0x190
      ____sys_sendmsg+0x739/0x920
      ___sys_sendmsg+0x110/0x1b0
      __sys_sendmsg+0xf7/0x1c0
      do_syscall_64+0x39/0xb0
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      The buggy address belongs to the object at ffff88802e08ed00
      which belongs to the cache skbuff_small_head of size 640
      The buggy address is located 194 bytes inside of
      freed 640-byte region [ffff88802e08ed00, ffff88802e08ef80)
      
      As commit f8556919 ("xfrm6: Fix the nexthdr offset in
      _decode_session6.") showed, xfrm_decode_session was originally intended
      only for the receive path. IP6CB(skb)->nhoff is not set during
      transmission. Therefore, set the cb field in the skb to 0 before
      sending packets.
      
      Fixes: f8556919 ("xfrm6: Fix the nexthdr offset in _decode_session6.")
      Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • xfrm: fix slab-use-after-free in decode_session6 · 86f15300
      Zhengchao Shao authored
      [ Upstream commit 53223f2e ]
      
      When the xfrm device is configured with a qdisc of the sfb type, the cb
      field of the sent skb may be modified during enqueuing. A
      slab-use-after-free may then occur when the xfrm device sends IPv6 packets.
      
      The stack information is as follows:
      BUG: KASAN: slab-use-after-free in decode_session6+0x103f/0x1890
      Read of size 1 at addr ffff8881111458ef by task swapper/3/0
      CPU: 3 PID: 0 Comm: swapper/3 Not tainted 6.4.0-next-20230707 #409
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
      Call Trace:
      <IRQ>
      dump_stack_lvl+0xd9/0x150
      print_address_description.constprop.0+0x2c/0x3c0
      kasan_report+0x11d/0x130
      decode_session6+0x103f/0x1890
      __xfrm_decode_session+0x54/0xb0
      xfrmi_xmit+0x173/0x1ca0
      dev_hard_start_xmit+0x187/0x700
      sch_direct_xmit+0x1a3/0xc30
      __qdisc_run+0x510/0x17a0
      __dev_queue_xmit+0x2215/0x3b10
      neigh_connected_output+0x3c2/0x550
      ip6_finish_output2+0x55a/0x1550
      ip6_finish_output+0x6b9/0x1270
      ip6_output+0x1f1/0x540
      ndisc_send_skb+0xa63/0x1890
      ndisc_send_rs+0x132/0x6f0
      addrconf_rs_timer+0x3f1/0x870
      call_timer_fn+0x1a0/0x580
      expire_timers+0x29b/0x4b0
      run_timer_softirq+0x326/0x910
      __do_softirq+0x1d4/0x905
      irq_exit_rcu+0xb7/0x120
      sysvec_apic_timer_interrupt+0x97/0xc0
      </IRQ>
      <TASK>
      asm_sysvec_apic_timer_interrupt+0x1a/0x20
      RIP: 0010:intel_idle_hlt+0x23/0x30
      Code: 1f 84 00 00 00 00 00 f3 0f 1e fa 41 54 41 89 d4 0f 1f 44 00 00 66 90 0f 1f 44 00 00 0f 00 2d c4 9f ab 00 0f 1f 44 00 00 fb f4 <fa> 44 89 e0 41 5c c3 66 0f 1f 44 00 00 f3 0f 1e fa 41 54 41 89 d4
      RSP: 0018:ffffc90000197d78 EFLAGS: 00000246
      RAX: 00000000000a83c3 RBX: ffffe8ffffd09c50 RCX: ffffffff8a22d8e5
      RDX: 0000000000000001 RSI: ffffffff8d3f8080 RDI: ffffe8ffffd09c50
      RBP: ffffffff8d3f8080 R08: 0000000000000001 R09: ffffed1026ba6d9d
      R10: ffff888135d36ceb R11: 0000000000000001 R12: 0000000000000001
      R13: ffffffff8d3f8100 R14: 0000000000000001 R15: 0000000000000000
      cpuidle_enter_state+0xd3/0x6f0
      cpuidle_enter+0x4e/0xa0
      do_idle+0x2fe/0x3c0
      cpu_startup_entry+0x18/0x20
      start_secondary+0x200/0x290
      secondary_startup_64_no_verify+0x167/0x16b
      </TASK>
      Allocated by task 939:
      kasan_save_stack+0x22/0x40
      kasan_set_track+0x25/0x30
      __kasan_slab_alloc+0x7f/0x90
      kmem_cache_alloc_node+0x1cd/0x410
      kmalloc_reserve+0x165/0x270
      __alloc_skb+0x129/0x330
      inet6_ifa_notify+0x118/0x230
      __ipv6_ifa_notify+0x177/0xbe0
      addrconf_dad_completed+0x133/0xe00
      addrconf_dad_work+0x764/0x1390
      process_one_work+0xa32/0x16f0
      worker_thread+0x67d/0x10c0
      kthread+0x344/0x440
      ret_from_fork+0x1f/0x30
      The buggy address belongs to the object at ffff888111145800
      which belongs to the cache skbuff_small_head of size 640
      The buggy address is located 239 bytes inside of
      freed 640-byte region [ffff888111145800, ffff888111145a80)
      
      As commit f8556919 ("xfrm6: Fix the nexthdr offset in
      _decode_session6.") showed, xfrm_decode_session was originally intended
      only for the receive path. IP6CB(skb)->nhoff is not set during
      transmission. Therefore, set the cb field in the skb to 0 before
      sending packets.
      
      Fixes: f8556919 ("xfrm6: Fix the nexthdr offset in _decode_session6.")
      Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • xfrm: Silence warnings triggerable by bad packets · 21a3a70c
      Herbert Xu authored
      [ Upstream commit 57010b8e ]
      
      After the elimination of inner modes, a couple of warnings that
      were previously unreachable can now be triggered by malformed
      inbound packets.
      
      Fix this by:
      
      1. Moving the setting of skb->protocol into the decap functions.
      2. Returning -EINVAL when an unexpected protocol is seen.
      
      Reported-by: Maciej Żenczykowski <maze@google.com>
      Fixes: 5f24f41e ("xfrm: Remove inner/outer modes from input path")
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Reviewed-by: Maciej Żenczykowski <maze@google.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: xfrm: Amend XFRMA_SEC_CTX nla_policy structure · 6d1e6152
      Lin Ma authored
      [ Upstream commit d1e0e61d ]
      
      According to all code that consumes attrs[XFRMA_SEC_CTX], such as
      
      * verify_sec_ctx_len(), which converts it to xfrm_user_sec_ctx *
      * xfrm_state_construct(), which calls security_xfrm_state_alloc, whose prototype
      is int security_xfrm_state_alloc(.., struct xfrm_user_sec_ctx *sec_ctx);
      * copy_from_user_sec_ctx(), which converts it to xfrm_user_sec_ctx *
      ...
      
      the expected parsing result for XFRMA_SEC_CTX appears to be
      struct xfrm_user_sec_ctx, and the current xfrm_sec_ctx is confusing
      and misleading (luckily, the two happen to have the same size, 8 bytes).
      
      This commit amends the policy structure to xfrm_user_sec_ctx to avoid
      ambiguity.
      
      Fixes: cf5cb79f ("[XFRM] netlink: Establish an attribute policy")
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: af_key: fix sadb_x_filter validation · 66e1cd1b
      Lin Ma authored
      [ Upstream commit 75065a89 ]
      
      When running xfrm_state_walk_init(), the xfrm_address_filter being used
      may legitimately have a splen/dplen equal to sizeof(xfrm_address_t)<<3.
      This commit replaces >= with > so that the boundary check is
      correct.
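
The boundary condition can be sketched as follows. This is a standalone model under stated assumptions: `addr_t` is a hypothetical 16-byte stand-in for the address union, and `filter_lens_ok` is an invented helper, not a kernel function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 16-byte address union. */
typedef union {
    uint32_t a4;        /* IPv4 address */
    uint32_t a6[4];     /* IPv6 address */
} addr_t;

/* Number of bits in an address: sizeof(addr_t) << 3. */
#define ADDR_BITS (sizeof(addr_t) << 3)

/* Returns 1 when both prefix lengths fit inside the address, else 0. */
static int filter_lens_ok(unsigned int splen, unsigned int dplen)
{
    /* A full-length prefix (== ADDR_BITS) is valid; only larger is not. */
    if (splen > ADDR_BITS || dplen > ADDR_BITS)
        return 0;
    return 1;
}
```

With `>=` in place of `>`, a full-length prefix would be rejected even though it still fits inside the address, which is the off-by-one the commit corrects.
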
      
      Fixes: 37bd2242 ("af_key: pfkey_dump needs parameter validation")
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: xfrm: Fix xfrm_address_filter OOB read · 5713c7ca
      Lin Ma authored
      [ Upstream commit dfa73c17 ]
      
      We found the following OOB crash:
      
      [   44.211730] ==================================================================
      [   44.212045] BUG: KASAN: slab-out-of-bounds in memcmp+0x8b/0xb0
      [   44.212045] Read of size 8 at addr ffff88800870f320 by task poc.xfrm/97
      [   44.212045]
      [   44.212045] CPU: 0 PID: 97 Comm: poc.xfrm Not tainted 6.4.0-rc7-00072-gdad9774deaf1-dirty #4
      [   44.212045] Call Trace:
      [   44.212045]  <TASK>
      [   44.212045]  dump_stack_lvl+0x37/0x50
      [   44.212045]  print_report+0xcc/0x620
      [   44.212045]  ? __virt_addr_valid+0xf3/0x170
      [   44.212045]  ? memcmp+0x8b/0xb0
      [   44.212045]  kasan_report+0xb2/0xe0
      [   44.212045]  ? memcmp+0x8b/0xb0
      [   44.212045]  kasan_check_range+0x39/0x1c0
      [   44.212045]  memcmp+0x8b/0xb0
      [   44.212045]  xfrm_state_walk+0x21c/0x420
      [   44.212045]  ? __pfx_dump_one_state+0x10/0x10
      [   44.212045]  xfrm_dump_sa+0x1e2/0x290
      [   44.212045]  ? __pfx_xfrm_dump_sa+0x10/0x10
      [   44.212045]  ? __kernel_text_address+0xd/0x40
      [   44.212045]  ? kasan_unpoison+0x27/0x60
      [   44.212045]  ? mutex_lock+0x60/0xe0
      [   44.212045]  ? __pfx_mutex_lock+0x10/0x10
      [   44.212045]  ? kasan_save_stack+0x22/0x50
      [   44.212045]  netlink_dump+0x322/0x6c0
      [   44.212045]  ? __pfx_netlink_dump+0x10/0x10
      [   44.212045]  ? mutex_unlock+0x7f/0xd0
      [   44.212045]  ? __pfx_mutex_unlock+0x10/0x10
      [   44.212045]  __netlink_dump_start+0x353/0x430
      [   44.212045]  xfrm_user_rcv_msg+0x3a4/0x410
      [   44.212045]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
      [   44.212045]  ? __pfx_xfrm_user_rcv_msg+0x10/0x10
      [   44.212045]  ? __pfx_xfrm_dump_sa+0x10/0x10
      [   44.212045]  ? __pfx_xfrm_dump_sa_done+0x10/0x10
      [   44.212045]  ? __stack_depot_save+0x382/0x4e0
      [   44.212045]  ? filter_irq_stacks+0x1c/0x70
      [   44.212045]  ? kasan_save_stack+0x32/0x50
      [   44.212045]  ? kasan_save_stack+0x22/0x50
      [   44.212045]  ? kasan_set_track+0x25/0x30
      [   44.212045]  ? __kasan_slab_alloc+0x59/0x70
      [   44.212045]  ? kmem_cache_alloc_node+0xf7/0x260
      [   44.212045]  ? kmalloc_reserve+0xab/0x120
      [   44.212045]  ? __alloc_skb+0xcf/0x210
      [   44.212045]  ? netlink_sendmsg+0x509/0x700
      [   44.212045]  ? sock_sendmsg+0xde/0xe0
      [   44.212045]  ? __sys_sendto+0x18d/0x230
      [   44.212045]  ? __x64_sys_sendto+0x71/0x90
      [   44.212045]  ? do_syscall_64+0x3f/0x90
      [   44.212045]  ? entry_SYSCALL_64_after_hwframe+0x72/0xdc
      [   44.212045]  ? netlink_sendmsg+0x509/0x700
      [   44.212045]  ? sock_sendmsg+0xde/0xe0
      [   44.212045]  ? __sys_sendto+0x18d/0x230
      [   44.212045]  ? __x64_sys_sendto+0x71/0x90
      [   44.212045]  ? do_syscall_64+0x3f/0x90
      [   44.212045]  ? entry_SYSCALL_64_after_hwframe+0x72/0xdc
      [   44.212045]  ? kasan_save_stack+0x22/0x50
      [   44.212045]  ? kasan_set_track+0x25/0x30
      [   44.212045]  ? kasan_save_free_info+0x2e/0x50
      [   44.212045]  ? __kasan_slab_free+0x10a/0x190
      [   44.212045]  ? kmem_cache_free+0x9c/0x340
      [   44.212045]  ? netlink_recvmsg+0x23c/0x660
      [   44.212045]  ? sock_recvmsg+0xeb/0xf0
      [   44.212045]  ? __sys_recvfrom+0x13c/0x1f0
      [   44.212045]  ? __x64_sys_recvfrom+0x71/0x90
      [   44.212045]  ? do_syscall_64+0x3f/0x90
      [   44.212045]  ? entry_SYSCALL_64_after_hwframe+0x72/0xdc
      [   44.212045]  ? copyout+0x3e/0x50
      [   44.212045]  netlink_rcv_skb+0xd6/0x210
      [   44.212045]  ? __pfx_xfrm_user_rcv_msg+0x10/0x10
      [   44.212045]  ? __pfx_netlink_rcv_skb+0x10/0x10
      [   44.212045]  ? __pfx_sock_has_perm+0x10/0x10
      [   44.212045]  ? mutex_lock+0x8d/0xe0
      [   44.212045]  ? __pfx_mutex_lock+0x10/0x10
      [   44.212045]  xfrm_netlink_rcv+0x44/0x50
      [   44.212045]  netlink_unicast+0x36f/0x4c0
      [   44.212045]  ? __pfx_netlink_unicast+0x10/0x10
      [   44.212045]  ? netlink_recvmsg+0x500/0x660
      [   44.212045]  netlink_sendmsg+0x3b7/0x700
      [   44.212045]  ? __pfx_netlink_sendmsg+0x10/0x10
      [   44.212045]  ? __pfx_netlink_sendmsg+0x10/0x10
      [   44.212045]  sock_sendmsg+0xde/0xe0
      [   44.212045]  __sys_sendto+0x18d/0x230
      [   44.212045]  ? __pfx___sys_sendto+0x10/0x10
      [   44.212045]  ? rcu_core+0x44a/0xe10
      [   44.212045]  ? __rseq_handle_notify_resume+0x45b/0x740
      [   44.212045]  ? _raw_spin_lock_irq+0x81/0xe0
      [   44.212045]  ? __pfx___rseq_handle_notify_resume+0x10/0x10
      [   44.212045]  ? __pfx_restore_fpregs_from_fpstate+0x10/0x10
      [   44.212045]  ? __pfx_blkcg_maybe_throttle_current+0x10/0x10
      [   44.212045]  ? __pfx_task_work_run+0x10/0x10
      [   44.212045]  __x64_sys_sendto+0x71/0x90
      [   44.212045]  do_syscall_64+0x3f/0x90
      [   44.212045]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
      [   44.212045] RIP: 0033:0x44b7da
      [   44.212045] RSP: 002b:00007ffdc8838548 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      [   44.212045] RAX: ffffffffffffffda RBX: 00007ffdc8839978 RCX: 000000000044b7da
      [   44.212045] RDX: 0000000000000038 RSI: 00007ffdc8838770 RDI: 0000000000000003
      [   44.212045] RBP: 00007ffdc88385b0 R08: 00007ffdc883858c R09: 000000000000000c
      [   44.212045] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
      [   44.212045] R13: 00007ffdc8839968 R14: 00000000004c37d0 R15: 0000000000000001
      [   44.212045]  </TASK>
      [   44.212045]
      [   44.212045] Allocated by task 97:
      [   44.212045]  kasan_save_stack+0x22/0x50
      [   44.212045]  kasan_set_track+0x25/0x30
      [   44.212045]  __kasan_kmalloc+0x7f/0x90
      [   44.212045]  __kmalloc_node_track_caller+0x5b/0x140
      [   44.212045]  kmemdup+0x21/0x50
      [   44.212045]  xfrm_dump_sa+0x17d/0x290
      [   44.212045]  netlink_dump+0x322/0x6c0
      [   44.212045]  __netlink_dump_start+0x353/0x430
      [   44.212045]  xfrm_user_rcv_msg+0x3a4/0x410
      [   44.212045]  netlink_rcv_skb+0xd6/0x210
      [   44.212045]  xfrm_netlink_rcv+0x44/0x50
      [   44.212045]  netlink_unicast+0x36f/0x4c0
      [   44.212045]  netlink_sendmsg+0x3b7/0x700
      [   44.212045]  sock_sendmsg+0xde/0xe0
      [   44.212045]  __sys_sendto+0x18d/0x230
      [   44.212045]  __x64_sys_sendto+0x71/0x90
      [   44.212045]  do_syscall_64+0x3f/0x90
      [   44.212045]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
      [   44.212045]
      [   44.212045] The buggy address belongs to the object at ffff88800870f300
      [   44.212045]  which belongs to the cache kmalloc-64 of size 64
      [   44.212045] The buggy address is located 32 bytes inside of
      [   44.212045]  allocated 36-byte region [ffff88800870f300, ffff88800870f324)
      [   44.212045]
      [   44.212045] The buggy address belongs to the physical page:
      [   44.212045] page:00000000e4de16ee refcount:1 mapcount:0 mapping:000000000 ...
      [   44.212045] flags: 0x100000000000200(slab|node=0|zone=1)
      [   44.212045] page_type: 0xffffffff()
      [   44.212045] raw: 0100000000000200 ffff888004c41640 dead000000000122 0000000000000000
      [   44.212045] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
      [   44.212045] page dumped because: kasan: bad access detected
      [   44.212045]
      [   44.212045] Memory state around the buggy address:
      [   44.212045]  ffff88800870f200: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
      [   44.212045]  ffff88800870f280: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
      [   44.212045] >ffff88800870f300: 00 00 00 00 04 fc fc fc fc fc fc fc fc fc fc fc
      [   44.212045]                                ^
      [   44.212045]  ffff88800870f380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [   44.212045]  ffff88800870f400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [   44.212045] ==================================================================
      
      Investigating the code, we find that the root cause of this OOB is the lack
      of checks in xfrm_dump_sa(). The buggy code allows a malicious user to pass
      arbitrary values of filter->splen/dplen. Hence, with crafted xfrm states,
      the attacker can achieve an 8-byte heap OOB read, which causes an info leak.
      
        if (attrs[XFRMA_ADDRESS_FILTER]) {
          filter = kmemdup(nla_data(attrs[XFRMA_ADDRESS_FILTER]),
              sizeof(*filter), GFP_KERNEL);
          if (filter == NULL)
            return -ENOMEM;
          // NO MORE CHECKS HERE !!!
        }
      
      This patch fixes the OOB by adding necessary boundary checks, just like
      the code in pfkey_dump() function.
      
      Fixes: d3623099 ("ipsec: add support of limited SA dump")
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • x86/srso: Correct the mitigation status when SMT is disabled · e854497f
      Borislav Petkov (AMD) authored
      commit 6405b72e upstream.
      
      Specify how SRSO is mitigated when SMT is disabled. Also, correct the
      SMT check for that.
      
      Fixes: e9fbc47b ("x86/srso: Disable the mitigation on unaffected configurations")
      Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
      Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
      Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
      Link: https://lore.kernel.org/r/20230814200813.p5czl47zssuej7nv@treble
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Petr Pavlu's avatar
      x86/retpoline,kprobes: Skip optprobe check for indirect jumps with retpolines and IBT · dc4d07dd
      Petr Pavlu authored
      commit 833fd800 upstream.
      
      The kprobes optimization check can_optimize() calls
      insn_is_indirect_jump() to detect indirect jump instructions in
      a target function. If any is found, creating an optprobe is disallowed
      in the function because the jump could be from a jump table and could
      potentially land in the middle of the target optprobe.
      
      With retpolines, insn_is_indirect_jump() additionally looks for calls to
      indirect thunks which the compiler potentially used to replace original
      jumps. This extra check is however unnecessary because jump tables are
      disabled when the kernel is built with retpolines. The same is currently
      the case with IBT.
      
      Based on this observation, remove the logic to look for calls to
      indirect thunks and skip the check for indirect jumps altogether if the
      kernel is built with retpolines or IBT. Subsequently, remove the symbols
      __indirect_thunk_start and __indirect_thunk_end, which are no longer
      needed.

      Dropping this logic indirectly fixes a problem where the range
      [__indirect_thunk_start, __indirect_thunk_end] wrongly included the
      return thunk as well. As a result, machines which used the return thunk
      as a mitigation and didn't have it patched by any alternative were not
      able to use optprobes in any regular function.
      
      Fixes: 0b53c374 ("x86/retpoline: Use -mfunction-return")
      Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
      Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Link: https://lore.kernel.org/r/20230711091952.27944-3-petr.pavlu@suse.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Petr Pavlu's avatar
      x86/retpoline,kprobes: Fix position of thunk sections with CONFIG_LTO_CLANG · aadb82bb
      Petr Pavlu authored
      commit 79cd2a11 upstream.
      
      The linker script arch/x86/kernel/vmlinux.lds.S matches the thunk
      sections ".text.__x86.*" from arch/x86/lib/retpoline.S as follows:
      
        .text {
          [...]
          TEXT_TEXT
          [...]
          __indirect_thunk_start = .;
          *(.text.__x86.*)
          __indirect_thunk_end = .;
          [...]
        }
      
      Macro TEXT_TEXT references TEXT_MAIN which normally expands to only
      ".text". However, with CONFIG_LTO_CLANG, TEXT_MAIN becomes
      ".text .text.[0-9a-zA-Z_]*" which also wrongly matches the thunk
      sections. The output layout is then different from what is expected.
      For instance, the currently defined range [__indirect_thunk_start,
      __indirect_thunk_end] becomes empty.
      
      Prevent the problem by using ".." as the first separator, for example,
      ".text..__x86.indirect_thunk". This pattern is utilized by other
      explicit section names which start with one of the standard prefixes,
      such as ".text" or ".data", and that need to be individually selected in
      the linker script.
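
      The effect of the separator can be checked with a small userspace
      sketch. It assumes that the linker's section wildcards behave like
      shell globs and approximates them with POSIX fnmatch(); the helper name
      captured_by_text_main() is made up for this illustration:

```c
#include <fnmatch.h>
#include <stdbool.h>

/* Pattern that TEXT_MAIN gains under CONFIG_LTO_CLANG, as quoted above. */
static const char *const lto_text_glob = ".text.[0-9a-zA-Z_]*";

/* True when a section name would be swallowed by the TEXT_MAIN glob.
 * fnmatch() is used here only as an approximation of linker matching. */
static bool captured_by_text_main(const char *section)
{
    return fnmatch(lto_text_glob, section, 0) == 0;
}
```

      With this approximation, ".text.__x86.indirect_thunk" is captured
      because "_" is in the character class, while
      ".text..__x86.indirect_thunk" is not, because the character after
      ".text." is a second dot.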
      
        [ nathan: Fix conflicts with SRSO and fold in a fix for the issue
          brought up by Andrew Cooper in post-review:
          https://lore.kernel.org/20230803230323.1478869-1-andrew.cooper3@citrix.com ]
      
      Fixes: dc5723b0 ("kbuild: add support for Clang LTO")
      Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
      Link: https://lore.kernel.org/r/20230711091952.27944-2-petr.pavlu@suse.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Borislav Petkov (AMD)'s avatar
      x86/srso: Disable the mitigation on unaffected configurations · 51fc0a88
      Borislav Petkov (AMD) authored
      commit e9fbc47b upstream.
      
      Skip the srso cmd line parsing which is not needed on Zen1/2 with SMT
      disabled and with the proper microcode applied (latter should be the
      case anyway) as those are not affected.
      
      Fixes: 5a15d834 ("x86/srso: Tie SBPB bit setting to microcode patch detection")
      Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
      Link: https://lore.kernel.org/r/20230813104517.3346-1-bp@alien8.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Borislav Petkov (AMD)'s avatar
      x86/CPU/AMD: Fix the DIV(0) initial fix attempt · 1251b96d
      Borislav Petkov (AMD) authored
      commit f58d6fbc upstream.
      
      Initially, it was thought that doing an innocuous division in the #DE
      handler would take care to prevent any leaking of old data from the
      divider but by the time the fault is raised, the speculation has already
      advanced too far and such data could already have been used by younger
      operations.
      
      Therefore, do the innocuous division on every exit to userspace so that
      userspace doesn't see any potentially old data from integer divisions in
      kernel space.
      
      Do the same before VMRUN, to protect host data from leaking into the
      guest too.
      
      Fixes: 77245f1c ("x86/CPU/AMD: Do not leak quotient data after a division by 0")
      Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: <stable@kernel.org>
      Link: https://lore.kernel.org/r/20230811213824.10025-1-bp@alien8.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Sean Christopherson's avatar
      x86/retpoline: Don't clobber RFLAGS during srso_safe_ret() · 48a558fb
      Sean Christopherson authored
      commit ba5ca5e5 upstream.
      
      Use LEA instead of ADD when adjusting %rsp in srso_safe_ret{,_alias}()
      so as to avoid clobbering flags.  Drop one of the INT3 instructions to
      account for the LEA consuming one more byte than the ADD.
      
      KVM's emulator makes indirect calls into a jump table of sorts, where
      the destination of each call is a small blob of code that performs fast
      emulation by executing the target instruction with fixed operands.
      
      E.g. to emulate ADC, fastop() invokes adcb_al_dl():
      
        adcb_al_dl:
          <+0>:  adc    %dl,%al
          <+2>:  jmp    <__x86_return_thunk>
      
      A major motivation for doing fast emulation is to leverage the CPU to
      handle consumption and manipulation of arithmetic flags, i.e. RFLAGS is
      both an input and output to the target of the call.  fastop() collects
      the RFLAGS result by pushing RFLAGS onto the stack and popping them back
      into a variable (held in %rdi in this case):
      
        asm("push %[flags]; popf; " CALL_NOSPEC " ; pushf; pop %[flags]\n"
      
        <+71>: mov    0xc0(%r8),%rdx
        <+78>: mov    0x100(%r8),%rcx
        <+85>: push   %rdi
        <+86>: popf
        <+87>: call   *%rsi
        <+89>: nop
        <+90>: nop
        <+91>: nop
        <+92>: pushf
        <+93>: pop    %rdi
      
      and then propagating the arithmetic flags into the vCPU's emulator state:
      
        ctxt->eflags = (ctxt->eflags & ~EFLAGS_MASK) | (flags & EFLAGS_MASK);
      
        <+64>:  and    $0xfffffffffffff72a,%r9
        <+94>:  and    $0x8d5,%edi
        <+109>: or     %rdi,%r9
        <+122>: mov    %r9,0x10(%r8)
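
      For reference, the 0x8d5 constant in the disassembly equals the OR of
      the six arithmetic flag bits (CF, PF, AF, ZF, SF, OF). A minimal sketch
      of the same merge follows; the macro and function names are
      hypothetical and chosen only for this illustration:

```c
#include <stdint.h>

/* Architectural bit positions of the arithmetic flags in RFLAGS. */
#define FLAG_CF (1u << 0)
#define FLAG_PF (1u << 2)
#define FLAG_AF (1u << 4)
#define FLAG_ZF (1u << 6)
#define FLAG_SF (1u << 7)
#define FLAG_OF (1u << 11)

/* Combined mask of the six arithmetic flags. */
#define ARITH_MASK (FLAG_CF | FLAG_PF | FLAG_AF | FLAG_ZF | FLAG_SF | FLAG_OF)

/* Keep the non-arithmetic bits of eflags and take the arithmetic
 * bits from flags, mirroring the C statement quoted above. */
static uint64_t merge_arith_flags(uint64_t eflags, uint64_t flags)
{
    return (eflags & ~(uint64_t)ARITH_MASK) | (flags & ARITH_MASK);
}
```

      Any instruction that rewrites these bits between the emulated
      instruction and the pushf therefore changes the merged result.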
      
      The failures can be most easily reproduced by running the "emulator"
      test in KVM-Unit-Tests.
      
      If you're feeling a bit of deja vu, see commit b63f20a7
      ("x86/retpoline: Don't clobber RFLAGS during CALL_NOSPEC on i386").
      
      In addition, this breaks booting of a clang-compiled guest on
      a gcc-compiled host where the host contains the %rsp-modifying SRSO
      mitigations.
      
        [ bp: Massage commit message, extend, remove addresses. ]
      
      Fixes: fb3bd914 ("x86/srso: Add a Speculative RAS Overflow mitigation")
      Closes: https://lore.kernel.org/all/de474347-122d-54cd-eabf-9dcc95ab9eae@amd.com
      Reported-by: Srikanth Aithal <sraithal@amd.com>
      Reported-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
      Tested-by: Nathan Chancellor <nathan@kernel.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/20230810013334.GA5354@dev-arch.thelio-3990X/
      Link: https://lore.kernel.org/r/20230811155255.250835-1-seanjc@google.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Peter Zijlstra's avatar
      x86/static_call: Fix __static_call_fixup() · 92588f22
      Peter Zijlstra authored
      commit 54097309 upstream.
      
      Christian reported spurious module load crashes after some of Song's
      module memory layout patches.
      
      Turns out that if the very last instruction on the very last page of the
      module is a 'JMP __x86_return_thunk' then __static_call_fixup() will
      trip a fault and die.
      
      And while the module rework made this slightly more likely to happen,
      it's always been possible.
      
      Fixes: ee88d363 ("x86,static_call: Use alternative RET encoding")
      Reported-by: Christian Bricart <christian@bricart.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
      Link: https://lkml.kernel.org/r/20230816104419.GA982867@hirez.programming.kicks-ass.net
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Peter Zijlstra's avatar
      objtool/x86: Fixup frame-pointer vs rethunk · 37e6d850
      Peter Zijlstra authored
      commit dbf46008 upstream.
      
      For stack-validation of a frame-pointer build, objtool validates that
      every CALL instruction is preceded by a frame-setup. The new SRSO
      return thunks violate this with their RSB stuffing trickery.
      
      Extend the __fentry__ exception to also cover the embedded_insn case
      used for this. This cures:
      
        vmlinux.o: warning: objtool: srso_untrain_ret+0xd: call without frame pointer save/setup
      
      Fixes: 4ae68b26 ("objtool/x86: Fix SRSO mess")
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
      Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
      Link: https://lore.kernel.org/r/20230816115921.GH980931@hirez.programming.kicks-ass.net
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Borislav Petkov (AMD)'s avatar
      x86/srso: Explain the untraining sequences a bit more · c70e2efa
      Borislav Petkov (AMD) authored
      commit 9dbd23e4 upstream.
      
      The goal is to eventually have proper documentation about all this.
      
      Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
      Link: https://lore.kernel.org/r/20230814164447.GFZNpZ/64H4lENIe94@fat_crate.local
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Peter Zijlstra's avatar
      x86/cpu/kvm: Provide UNTRAIN_RET_VM · 04103096
      Peter Zijlstra authored
      commit 864bcaa3 upstream.
      
      Similar to how it doesn't make sense to have UNTRAIN_RET have two
      untrain calls, it also doesn't make sense for VMEXIT to have an extra
      IBPB call.
      
      This cures VMEXIT doing potentially unret+IBPB or double IBPB.
      Also, the (SEV) VMEXIT case seems to have been overlooked.
      
      Redefine the meaning of the synthetic IBPB flags to:
      
       - ENTRY_IBPB     -- issue IBPB on entry  (was: entry + VMEXIT)
       - IBPB_ON_VMEXIT -- issue IBPB on VMEXIT
      
      And have 'retbleed=ibpb' set *BOTH* feature flags to ensure it retains
      the previous behaviour and issues IBPB on entry+VMEXIT.
      
      The new 'srso=ibpb_vmexit' option only sets IBPB_ON_VMEXIT.
      
      Create UNTRAIN_RET_VM specifically for the VMEXIT case, and have that
      check IBPB_ON_VMEXIT.
      
      All this avoids the VMEXIT case having to check both ENTRY_IBPB
      and IBPB_ON_VMEXIT and simplifies the alternatives.
      
      Fixes: fb3bd914 ("x86/srso: Add a Speculative RAS Overflow mitigation")
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
      Link: https://lore.kernel.org/r/20230814121149.109557833@infradead.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Peter Zijlstra's avatar
      x86/cpu: Cleanup the untrain mess · 9588fd88
      Peter Zijlstra authored
      commit e7c25c44 upstream.
      
      Since there can only be one active return_thunk, there only needs to be
      one (matching) untrain_ret. It fundamentally doesn't make sense to
      allow multiple untrain_ret at the same time.
      
      Fold all the 3 different untrain methods into a single (temporary)
      helper stub.
      
      Fixes: fb3bd914 ("x86/srso: Add a Speculative RAS Overflow mitigation")
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
      Link: https://lore.kernel.org/r/20230814121149.042774962@infradead.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Peter Zijlstra's avatar
      x86/cpu: Rename srso_(.*)_alias to srso_alias_\1 · ee621ddd
      Peter Zijlstra authored
      commit 42be649d upstream.
      
      For a more consistent namespace.
      
        [ bp: Fixup names in the doc too. ]
      
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
      Link: https://lore.kernel.org/r/20230814121148.976236447@infradead.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Peter Zijlstra's avatar
      x86/cpu: Rename original retbleed methods · 5c510151
      Peter Zijlstra authored
      commit d025b7ba upstream.
      
      Rename the original retbleed return thunk and untrain_ret to
      retbleed_return_thunk() and retbleed_untrain_ret().
      
      No functional changes.
      
      Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
      Link: https://lore.kernel.org/r/20230814121148.909378169@infradead.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Peter Zijlstra's avatar
      x86/cpu: Clean up SRSO return thunk mess · 4f0d18c2
      Peter Zijlstra authored
      commit d43490d0 upstream.
      
      Use the existing configurable return thunk. There is absolutely no
      justification for having created this __x86_return_thunk alternative.
      
      To clarify, the whole thing looks like:
      
      Zen3/4 does:
      
        srso_alias_untrain_ret:
            nop2
            lfence
            jmp srso_alias_return_thunk
            int3

        srso_alias_safe_ret: // aliases srso_alias_untrain_ret just so
            add $8, %rsp
            ret
            int3

        srso_alias_return_thunk:
            call srso_alias_safe_ret
            ud2

      While Zen1/2 does:

        srso_untrain_ret:
            movabs $foo, %rax
            lfence
            call srso_safe_ret           (jmp srso_return_thunk ?)
            int3

        srso_safe_ret: // embedded in movabs instruction
            add $8,%rsp
            ret
            int3

        srso_return_thunk:
            call srso_safe_ret
            ud2

      While retbleed does:

        zen_untrain_ret:
            test $0xcc, %bl
            lfence
            jmp zen_return_thunk
            int3

        zen_return_thunk: // embedded in the test instruction
            ret
            int3
      
      Where Zen1/2 flush the BTB entry using the instruction decoder trick
      (test,movabs), Zen3/4 use BTB aliasing. SRSO adds a return sequence
      (srso_safe_ret()) which forces the function return instruction to
      speculate into a trap (UD2).  This RET will then mispredict and
      execution will continue at the return site read from the top of the
      stack.

      Pick one of three options at boot (every function can only ever return
      once).
      
        [ bp: Fixup commit message uarch details and add them in a comment in
          the code too. Add a comment about the srso_select_mitigation()
          dependency on retbleed_select_mitigation(). Add moar ifdeffery for
          32-bit builds. Add a dummy srso_untrain_ret_alias() definition for
          32-bit alternatives needing the symbol. ]
      
      Fixes: fb3bd914 ("x86/srso: Add a Speculative RAS Overflow mitigation")
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
      Link: https://lore.kernel.org/r/20230814121148.842775684@infradead.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Peter Zijlstra's avatar
      x86/alternative: Make custom return thunk unconditional · 06bcb3da
      Peter Zijlstra authored
      commit 095b8303 upstream.
      
      There is infrastructure to rewrite return thunks to point to any
      random thunk one desires. Unwrap that from CALL_THUNKS, which up to
      now was its sole user.
      
        [ bp: Make the thunks visible on 32-bit and add ifdeffery for the
          32-bit builds. ]
      
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
      Link: https://lore.kernel.org/r/20230814121148.775293785@infradead.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Peter Zijlstra's avatar
      objtool/x86: Fix SRSO mess · 2d4d8761
      Peter Zijlstra authored
      commit 4ae68b26 upstream.
      
      Objtool --rethunk does two things:
      
       - it collects all (tail) calls of __x86_return_thunk and places them
         into .return_sites. These are typically compiler generated, but
         RET also emits the same.
      
       - it fudges the validation of the __x86_return_thunk symbol; because
         this symbol is inside another instruction, it can't actually find
         the instruction pointed to by the symbol offset and gets upset.
      
      Because these two things pertained to the same symbol, there was no
      pressing need to separate them.

      However, alas, along came SRSO, and more crazy things to deal with
      appeared.
      
      The SRSO patch itself added the following symbol names to identify as
      rethunk:
      
        'srso_untrain_ret', 'srso_safe_ret' and '__ret'
      
      Where '__ret' is the old retbleed return thunk, 'srso_safe_ret' is a
      new similarly embedded return thunk, and 'srso_untrain_ret' is
      completely unrelated to anything the above does (and was only included
      because of that INT3 vs UD2 issue fixed previously).
      
      Clear things up by adding a second category for the embedded instruction
      thing.
      
      Fixes: fb3bd914 ("x86/srso: Add a Speculative RAS Overflow mitigation")
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
      Link: https://lore.kernel.org/r/20230814121148.704502245@infradead.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Peter Zijlstra's avatar
      x86/cpu: Fix up srso_safe_ret() and __x86_return_thunk() · 1e7b3334
      Peter Zijlstra authored
      commit af023ef3 upstream.
      
        vmlinux.o: warning: objtool: srso_untrain_ret() falls through to next function __x86_return_skl()
        vmlinux.o: warning: objtool: __x86_return_thunk() falls through to next function __x86_return_skl()
      
      This is because these functions (can) end with CALL, which objtool
      does not consider a terminating instruction. Therefore, replace the
      INT3 instruction (which is a non-fatal trap) with UD2 (which is a
      fatal trap).
      
      This indicates execution will not continue past this point.
      
      Fixes: fb3bd914 ("x86/srso: Add a Speculative RAS Overflow mitigation")
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
      Link: https://lore.kernel.org/r/20230814121148.637802730@infradead.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Peter Zijlstra's avatar
      x86/cpu: Fix __x86_return_thunk symbol type · 7047af22
      Peter Zijlstra authored
      commit 77f67119 upstream.
      
      Commit
      
        fb3bd914 ("x86/srso: Add a Speculative RAS Overflow mitigation")
      
      reimplemented __x86_return_thunk with a mix of SYM_FUNC_START and
      SYM_CODE_END; this is not a sane combination.
      
      Since nothing should ever actually 'CALL' this, make it consistently
      CODE.
      
      Fixes: fb3bd914 ("x86/srso: Add a Speculative RAS Overflow mitigation")
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
      Link: https://lore.kernel.org/r/20230814121148.571027074@infradead.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Tam Nguyen's avatar
      i2c: designware: Handle invalid SMBus block data response length value · 385f438b
      Tam Nguyen authored
      commit 69f035c4 upstream.
      
      In the I2C_FUNC_SMBUS_BLOCK_DATA case, the invalid length byte value
      (outside of 1-32) of the SMBus block data response from the Slave device
      is not correctly handled by the I2C Designware driver.
      
      In case IC_EMPTYFIFO_HOLD_MASTER_EN==1, which cannot be detected
      from the registers, the Master can be disabled only if the STOP bit
      is set. Without STOP bit set, the Master remains active, holding the bus
      until receiving a block data response length. This hangs the bus and
      is unrecoverable.
      
      Avoid this by issuing another dump read to reach the stop condition when
      an invalid length byte is received.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Tam Nguyen <tamnguyenchi@os.amperecomputing.com>
      Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
      Link: https://lore.kernel.org/r/20230726080001.337353-3-tamnguyenchi@os.amperecomputing.com
      Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
      Signed-off-by: Wolfram Sang <wsa@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Quan Nguyen's avatar
      i2c: designware: Correct length byte validation logic · b2ef640f
      Quan Nguyen authored
      commit 49d4db39 upstream.
      
      Commit 0daede80 ("i2c: designware: Convert driver to using regmap API")
      changed the logic to validate the whole 32-bit return value of the
      DW_IC_DATA_CMD register instead of the 8-bit LSB, without giving a reason.

      Later, commit f53f15ba ("i2c: designware: Get right data length")
      introduced a partial fix, but it is not enough because "tmp > 0" still
      tests tmp as a 32-bit value and is wrong when IC_DATA_CMD[11] is set.
      
      Revert the logic to just before commit 0daede80
      ("i2c: designware: Convert driver to using regmap API").
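
      A minimal userspace sketch of the difference, not the driver code
      itself: the helper names and the SMBUS_BLOCK_MAX macro are assumptions
      made for this illustration, and the register value is modelled as a
      plain 32-bit integer whose low byte holds the length:

```c
#include <stdbool.h>
#include <stdint.h>

#define SMBUS_BLOCK_MAX 32u   /* largest valid SMBus block length */

/* Validate only the low 8 bits, which carry the length byte. */
static bool block_len_valid(uint32_t data_cmd)
{
    uint8_t len = (uint8_t)(data_cmd & 0xffu);

    return len >= 1 && len <= SMBUS_BLOCK_MAX;
}

/* Validate the whole 32-bit value: upper status bits such as
 * bit 11 leak into the comparison and distort the result. */
static bool block_len_valid_32bit(uint32_t data_cmd)
{
    return data_cmd > 0 && data_cmd <= SMBUS_BLOCK_MAX;
}
```

      For a value of 0x805 (bit 11 set, length byte 5) the 8-bit check accepts
      the length while the 32-bit check rejects it.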
      
      Fixes: f53f15ba ("i2c: designware: Get right data length")
      Fixes: 0daede80 ("i2c: designware: Convert driver to using regmap API")
      Cc: stable@vger.kernel.org
      Signed-off-by: Tam Nguyen <tamnguyenchi@os.amperecomputing.com>
      Signed-off-by: Quan Nguyen <quan@os.amperecomputing.com>
      Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
      Link: https://lore.kernel.org/r/20230726080001.337353-2-tamnguyenchi@os.amperecomputing.com
      Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
      Signed-off-by: Wolfram Sang <wsa@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Chris Mason's avatar
      btrfs: only subtract from len_to_oe_boundary when it is tracking an extent · c4b460b5
      Chris Mason authored
      commit 09c3717c upstream.
      
      bio_ctrl->len_to_oe_boundary is used to make sure we stay inside a zone
      as we submit bios for writes.  Every time we add a page to the bio, we
      decrement those bytes from len_to_oe_boundary, and then we submit the
      bio if we happen to hit zero.
      
      Most of the time, len_to_oe_boundary gets set to U32_MAX.
      submit_extent_page() adds pages into our bio, and the size of the bio
      ends up limited by:
      
      - Are we contiguous on disk?
      - Does bio_add_page() allow us to stuff more in?
      - Is len_to_oe_boundary > 0?
      
      The len_to_oe_boundary math starts with U32_MAX, which isn't page or
      sector aligned, and subtracts from it until it hits zero.  In the
      non-zoned case, the last IO we submit before we hit zero is going to be
      unaligned, triggering BUGs.
      
      This is hard to trigger because bio_add_page() isn't going to make a bio
      of U32_MAX size unless you give it a perfect set of pages and fully
      contiguous extents on disk.  We can hit it pretty reliably while making
      large swapfiles during provisioning because the machine is freshly
      booted, mostly idle, and the disk is freshly formatted.  It's also
      possible to trigger with reads when read_ahead_kb is set to 4GB.
      
      The code has been cleaned up and shifted around a few times, but this
      flaw has been lurking since the counter was added.  I think the commit
      24e6c808 ("btrfs: simplify main loop in submit_extent_page") ended
      up exposing the bug.
      
      The fix used here is to skip doing math on len_to_oe_boundary unless
      we've changed it from the default U32_MAX value.  bio_add_page() is the
      real limit we want, and there's no reason to do extra math when the
      block layer is doing it for us.
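
      The idea can be sketched in plain C as follows. This is an illustration
      under assumptions, not the btrfs code: account_added() and NOT_TRACKING
      are hypothetical names, with NOT_TRACKING standing in for the default
      U32_MAX value:

```c
#include <stdint.h>

/* Stand-in for the default U32_MAX value, meaning "no extent tracked". */
#define NOT_TRACKING UINT32_MAX

/* Subtract the bytes just added to the bio from the boundary counter,
 * but only when the counter is tracking a real extent. */
static uint32_t account_added(uint32_t len_to_oe_boundary, uint32_t added)
{
    if (len_to_oe_boundary == NOT_TRACKING)
        return len_to_oe_boundary;
    return len_to_oe_boundary - added;
}
```

      Note that U32_MAX modulo 4096 is 4095, so a counter that starts at the
      default value and is decremented in 4 KiB steps can never land exactly
      on zero at a page boundary.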
      
      Sample reproducer, note you'll need to change the path to the bdi and
      device:
      
        SUBVOL=/btrfs/swapvol
        SWAPFILE=$SUBVOL/swapfile
        SZMB=8192
      
        mkfs.btrfs -f /dev/vdb
        mount /dev/vdb /btrfs
      
        btrfs subvol create $SUBVOL
        chattr +C $SUBVOL
        dd if=/dev/zero of=$SWAPFILE bs=1M count=$SZMB
        sync
      
        echo 4 > /proc/sys/vm/drop_caches
      
        echo 4194304 > /sys/class/bdi/btrfs-2/read_ahead_kb
      
        while true; do
      	  echo 1 > /proc/sys/vm/drop_caches
      	  echo 1 > /proc/sys/vm/drop_caches
      	  dd of=/dev/zero if=$SWAPFILE bs=4096M count=2 iflag=fullblock
        done
      
      Fixes: 24e6c808 ("btrfs: simplify main loop in submit_extent_page")
      CC: stable@vger.kernel.org # 6.4+
      Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Anand Jain's avatar
      btrfs: fix replace/scrub failure with metadata_uuid · 8add2a96
      Anand Jain authored
      commit b471965f upstream.
      
      Fstests with POST_MKFS_CMD="btrfstune -m" (as in the mailing list)
      reported a few of the test cases failing.
      
      The failure scenario can be summarized and simplified as follows:
      
        $ mkfs.btrfs -fq -draid1 -mraid1 /dev/sdb1 /dev/sdb2 :0
        $ btrfstune -m /dev/sdb1 :0
        $ wipefs -a /dev/sdb1 :0
        $ mount -o degraded /dev/sdb2 /btrfs :0
        $ btrfs replace start -B -f -r 1 /dev/sdb1 /btrfs :1
          STDERR:
          ERROR: ioctl(DEV_REPLACE_START) failed on "/btrfs": Input/output error
      
        [11290.583502] BTRFS warning (device sdb2): tree block 22036480 mirror 2 has bad fsid, has 99835c32-49f0-4668-9e66-dc277a96b4a6 want da40350c-33ac-4872-92a8-4948ed8c04d0
        [11290.586580] BTRFS error (device sdb2): unable to fix up (regular) error at logical 22020096 on dev /dev/sdb8 physical 1048576
      
      As above, the replace is failing because we are verifying the header with
      fs_devices::fsid instead of fs_devices::metadata_uuid, despite the
      metadata_uuid actually being present.
      
      To fix this, use fs_devices::metadata_uuid. We copy fsid into
      fs_devices::metadata_uuid if there is no metadata_uuid, so it's fine.
      
      Fixes: a3ddbaeb ("btrfs: scrub: introduce a helper to verify one metadata block")
      CC: stable@vger.kernel.org # 6.4+
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • xiaoshoukui's avatar
      btrfs: fix BUG_ON condition in btrfs_cancel_balance · ae81329f
      xiaoshoukui authored
      commit 29eefa6d upstream.
      
      Pausing and canceling balance can race to interrupt the balance, leading
      to a BUG_ON panic in btrfs_cancel_balance. The BUG_ON condition in
      btrfs_cancel_balance does not take this race scenario into account.

      However, the race condition has no other side effects. We can fix that.

      Reproducing it gives a panic trace like this:
      
        kernel BUG at fs/btrfs/volumes.c:4618!
        RIP: 0010:btrfs_cancel_balance+0x5cf/0x6a0
        Call Trace:
         <TASK>
         ? do_nanosleep+0x60/0x120
         ? hrtimer_nanosleep+0xb7/0x1a0
         ? sched_core_clone_cookie+0x70/0x70
         btrfs_ioctl_balance_ctl+0x55/0x70
         btrfs_ioctl+0xa46/0xd20
         __x64_sys_ioctl+0x7d/0xa0
         do_syscall_64+0x38/0x80
         entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
        Race scenario as follows:
        > mutex_unlock(&fs_info->balance_mutex);
        > --------------------
        > .......issue pause and cancel req in another thread
        > --------------------
        > ret = __btrfs_balance(fs_info);
        >
        > mutex_lock(&fs_info->balance_mutex);
        > if (ret == -ECANCELED && atomic_read(&fs_info->balance_pause_req)) {
        >         btrfs_info(fs_info, "balance: paused");
        >         btrfs_exclop_balance(fs_info, BTRFS_EXCLOP_BALANCE_PAUSED);
        > }
      
      CC: stable@vger.kernel.org # 4.19+
      Signed-off-by: xiaoshoukui <xiaoshoukui@ruijie.com.cn>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: fix incorrect splitting in btrfs_drop_extent_map_range · b43a4c99
      Josef Bacik authored
      commit c962098c upstream.
      
      In production we were seeing a variety of WARN_ON()'s in the extent_map
      code, specifically in btrfs_drop_extent_map_range() when we have to call
      add_extent_mapping() for our second split.
      
      Consider the following extent map layout
      
      	PINNED
      	[0, 16K)  [32K, 48K)
      
      and then we call btrfs_drop_extent_map_range for [0, 36K), with
      skip_pinned == true.  The initial loop will have
      
      	start = 0
      	end = 36K
      	len = 36K
      
      we will find the [0, 16K) extent, but since it is pinned we will skip
      it, which runs this code
      
      	start = em_end;
      	if (end != (u64)-1)
      		len = start + len - em_end;
      
      em_end here is 16K, so now the values are
      
      	start = 16K
      	len = 16K + 36K - 16K = 36K
      
      len should instead be 20K.  This is a problem when we find the next
      extent at [32K, 48K): we need to split this extent to leave [36K, 48K),
      but the code for the split looks like this
      
      	split->start = start + len;
      	split->len = em_end - (start + len);
      
      In this case we have
      
      	em_end = 48K
      	split->start = 16K + 36K       // this should be 16K + 20K
      	split->len = 48K - (16K + 36K) // this overflows as 16K + 36K is 52K
      
      and now we have an invalid extent_map in the tree that potentially
      overlaps other entries in the extent map.  Even in the non-overlapping
      case we will have split->start set improperly, which will cause problems
      with any block related calculations.
      
      We don't actually need len in this loop, we can simply use end as our
      end point, and only adjust start up when we find a pinned extent we need
      to skip.
      
      Adjust the logic to do this, which keeps us from inserting an invalid
      extent map.
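
      The arithmetic above can be replayed in a small userspace sketch. The
      toy_ names are hypothetical and only mirror the snippets quoted in this
      message. toy_new_split() is an assumption about what "use end as our
      end point" looks like, not the actual patch.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SZ_1K 1024ULL

struct toy_split {
	uint64_t start;
	uint64_t len;
};

/*
 * Old bookkeeping after skipping a pinned extent: start has already been
 * moved to em_end when len is recomputed, so len comes back unchanged.
 */
uint64_t toy_old_len_after_skip(uint64_t start, uint64_t len, uint64_t em_end)
{
	start = em_end;
	return start + len - em_end;
}

/* Old split: derives the remaining tail from start + len. */
struct toy_split toy_old_split(uint64_t start, uint64_t len, uint64_t em_end)
{
	struct toy_split s;

	s.start = start + len;
	s.len = em_end - (start + len); /* wraps when start + len > em_end */
	return s;
}

/* Assumed shape of the new approach: derive the tail from end alone. */
struct toy_split toy_new_split(uint64_t end, uint64_t em_end)
{
	struct toy_split s;

	assert(end <= em_end);
	s.start = end;
	s.len = em_end - end;
	return s;
}
```

      For the layout in this message, toy_old_len_after_skip(0, 36K, 16K)
      still returns 36K, and toy_old_split(16K, 36K, 48K) yields a start of
      52K and a wrapped length. toy_new_split(36K, 48K) yields the intended
      [36K, 48K) tail with a length of 12K.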
      
      We only skip_pinned in the relocation case, so this is relatively rare,
      except in the case where you are running relocation a lot, which can
      happen with auto relocation on.
      
      Fixes: 55ef6899 ("Btrfs: Fix btrfs_drop_extent_cache for skip pinned case")
      CC: stable@vger.kernel.org # 4.14+
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: fix infinite directory reads · 5441532f
      Filipe Manana authored
      commit 9b378f6a upstream.
      
      The readdir implementation currently always processes up to the last index
      it finds. This, however, can result in an infinite loop if the directory has
      a large number of entries such that they won't all fit in the given buffer
      passed to the readdir callback, that is, dir_emit() returns a non-zero
      value. In that case readdir() will be called again, and if in the meantime
      new directory entries were added and we still can't put all the remaining
      entries in the buffer, we keep repeating this over and over.
      
      The following C program and test script reproduce the problem:
      
        $ cat /mnt/readdir_prog.c
        #include <sys/types.h>
        #include <dirent.h>
        #include <stdio.h>
      
        int main(int argc, char *argv[])
        {
          DIR *dir = opendir(".");
          struct dirent *dd;
      
          while ((dd = readdir(dir))) {
            printf("%s\n", dd->d_name);
            rename(dd->d_name, "TEMPFILE");
            rename("TEMPFILE", dd->d_name);
          }
          closedir(dir);
        }
      
        $ gcc -o /mnt/readdir_prog /mnt/readdir_prog.c
      
        $ cat test.sh
        #!/bin/bash
      
        DEV=/dev/sdi
        MNT=/mnt/sdi
      
        mkfs.btrfs -f $DEV &> /dev/null
        #mkfs.xfs -f $DEV &> /dev/null
        #mkfs.ext4 -F $DEV &> /dev/null
      
        mount $DEV $MNT
      
        mkdir $MNT/testdir
        for ((i = 1; i <= 2000; i++)); do
            echo -n > $MNT/testdir/file_$i
        done
      
        cd $MNT/testdir
        /mnt/readdir_prog
      
        cd /mnt
      
        umount $MNT
      
      This behaviour is surprising to applications and it's unlike ext4, xfs,
      tmpfs, vfat and other filesystems, which always finish. In this case where
      new entries were added due to renames, some file names may be reported
      more than once, but this varies according to each filesystem - for example
      ext4 never reported the same file more than once while xfs reports the
      first 13 file names twice.
      
      So change our readdir implementation to track the last index number when
      opendir() is called and then make readdir() never process beyond that
      index number. This gives the same behaviour as ext4.
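
      The idea of bounding readdir() by the last index seen at opendir() time
      can be sketched with a toy in-memory directory. Everything below is a
      hypothetical userspace model, not the btrfs implementation. It assumes
      that a rename re-inserts the entry under a new, higher index, which is
      how renames end up adding "new" entries in the scenario above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ENTRIES 64

struct toy_dir {
	uint64_t index[TOY_MAX_ENTRIES]; /* live entry indexes, ascending */
	size_t count;
	uint64_t next_index;             /* index the next new entry gets */
};

struct toy_dir_handle {
	struct toy_dir *dir;
	uint64_t pos;        /* lowest index not yet returned */
	uint64_t last_index; /* highest index that existed at open time */
};

void toy_dir_init(struct toy_dir *d)
{
	d->count = 0;
	d->next_index = 1;
}

void toy_dir_add(struct toy_dir *d)
{
	assert(d->count < TOY_MAX_ENTRIES);
	d->index[d->count++] = d->next_index++;
}

/* Model a rename as: drop the entry, re-add it under a fresh, higher index. */
void toy_dir_rename(struct toy_dir *d, uint64_t idx)
{
	for (size_t i = 0; i < d->count; i++) {
		if (d->index[i] != idx)
			continue;
		for (size_t j = i; j + 1 < d->count; j++)
			d->index[j] = d->index[j + 1];
		d->index[d->count - 1] = d->next_index++;
		return;
	}
}

void toy_opendir(struct toy_dir_handle *h, struct toy_dir *d)
{
	h->dir = d;
	h->pos = 0;
	h->last_index = d->next_index - 1; /* snapshot taken once, at open */
}

/* Returns 1 and stores the entry's index in *out, or 0 at end of directory. */
int toy_readdir(struct toy_dir_handle *h, uint64_t *out)
{
	for (size_t i = 0; i < h->dir->count; i++) {
		uint64_t idx = h->dir->index[i];

		if (idx < h->pos)
			continue;
		if (idx > h->last_index)
			break; /* added after open: never reported */
		h->pos = idx + 1;
		*out = idx;
		return 1;
	}
	return 0;
}

/* Read every entry, renaming each one as it is returned; count the results. */
size_t toy_scan_with_renames(size_t nfiles)
{
	struct toy_dir d;
	struct toy_dir_handle h;
	uint64_t idx;
	size_t seen = 0;

	toy_dir_init(&d);
	for (size_t i = 0; i < nfiles; i++)
		toy_dir_add(&d);
	toy_opendir(&h, &d);
	while (toy_readdir(&h, &idx)) {
		toy_dir_rename(&d, idx);
		seen++;
	}
	return seen;
}
```

      In this model toy_scan_with_renames(n) terminates after returning each
      of the n original entries once. Without the last_index check in
      toy_readdir(), the same loop would keep finding the entries re-added by
      each rename and never reach the end.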
      
      Reported-by: Rob Landley <rob@landley.net>
      Link: https://lore.kernel.org/linux-btrfs/2c8c55ec-04c6-e0dc-9c5c-8c7924778c35@landley.net/
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=217681
      CC: stable@vger.kernel.org # 6.4+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tty: serial: fsl_lpuart: Clear the error flags by writing 1 for lpuart32 platforms · c5be9bc0
      Sherry Sun authored
      commit 28206984 upstream.
      
      Do not read the data register to clear the error flags on lpuart32
      platforms: the additional read may cause a receive FIFO underflow,
      since the DMA has already read the data register.
      All lpuart32 platforms support writing 1 to clear those error bits, so
      use that method to clear the error flags instead.
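
      Write-1-to-clear semantics, and why they avoid the extra data read, can
      be illustrated with a toy device model. The bit positions, struct and
      function names below are hypothetical and do not match the real
      lpuart32 register map or the driver's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit layout; the real lpuart32 register map differs. */
#define TOY_STAT_PE   (1u << 0) /* parity error */
#define TOY_STAT_FE   (1u << 1) /* framing error */
#define TOY_STAT_NF   (1u << 2) /* noise */
#define TOY_STAT_OR   (1u << 3) /* overrun */
#define TOY_STAT_ERRS (TOY_STAT_PE | TOY_STAT_FE | TOY_STAT_NF | TOY_STAT_OR)

struct toy_uart {
	uint32_t stat;           /* status register */
	unsigned int fifo_level; /* bytes left in the receive FIFO */
	int underflow;           /* set when an empty FIFO is read */
};

/* Device model: every data-register read pops one byte from the FIFO. */
void toy_read_data(struct toy_uart *u)
{
	assert(u);
	if (u->fifo_level == 0)
		u->underflow = 1;
	else
		u->fifo_level--;
}

/* Device model: writing 1 to an error bit clears it; other bits ignore it. */
void toy_write_stat(struct toy_uart *u, uint32_t val)
{
	assert(u);
	u->stat &= ~(val & TOY_STAT_ERRS);
}

/* Driver side: clear whichever error flags are set, without touching data. */
void toy_clear_errors(struct toy_uart *u)
{
	assert(u);
	toy_write_stat(u, u->stat & TOY_STAT_ERRS);
}
```

      In this model, toy_clear_errors() on a device whose FIFO was already
      drained (as after a DMA transfer) clears the error bits and leaves the
      underflow flag unset, whereas a toy_read_data() call in the same state
      sets it.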
      
      Fixes: 42b68768 ("serial: fsl_lpuart: DMA support for 32-bit variant")
      Cc: stable <stable@kernel.org>
      Signed-off-by: Sherry Sun <sherry.sun@nxp.com>
      Link: https://lore.kernel.org/r/20230801022304.24251-1-sherry.sun@nxp.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>