Skip to content
  1. Apr 10, 2024
    • Aleksandr Loktionov's avatar
      i40e: fix vf may be used uninitialized in this function warning · 0dcf573f
      Aleksandr Loktionov authored
      commit f37c4eac upstream.
      
      To fix the regression introduced by commit 52424f97, which causes
      servers hang in very hard to reproduce conditions with resets races.
      Using two sources for the information is the root cause.
      In this function before the fix bumping v didn't mean bumping vf
      pointer. But the code used this variables interchangeably, so stale vf
      could point to different/not intended vf.
      
      Remove redundant "v" variable and iterate via single VF pointer across
      whole function instead to guarantee VF pointer validity.
      
      Fixes: 52424f97
      
       ("i40e: Fix VF hang when reset is triggered on another VF")
      Signed-off-by: default avatarAleksandr Loktionov <aleksandr.loktionov@intel.com>
      Reviewed-by: default avatarArkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
      Reviewed-by: default avatarPrzemek Kitszel <przemyslaw.kitszel@intel.com>
      Reviewed-by: default avatarPaul Menzel <pmenzel@molgen.mpg.de>
      Tested-by: default avatarRafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0dcf573f
    • Aleksandr Loktionov's avatar
      i40e: fix i40e_count_filters() to count only active/new filters · 89e29416
      Aleksandr Loktionov authored
      commit eb58c598 upstream.
      
      The bug usually affects untrusted VFs, because they are limited to 18 MACs,
      it affects them badly, not letting to create MAC all filters.
      Not stable to reproduce, it happens when VF user creates MAC filters
      when other MACVLAN operations are happened in parallel.
      But consequence is that VF can't receive desired traffic.
      
      Fix counter to be bumped only for new or active filters.
      
      Fixes: 621650ca
      
       ("i40e: Refactoring VF MAC filters counting to make more reliable")
      Signed-off-by: default avatarAleksandr Loktionov <aleksandr.loktionov@intel.com>
      Reviewed-by: default avatarArkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
      Reviewed-by: default avatarPaul Menzel <pmenzel@molgen.mpg.de>
      Tested-by: default avatarRafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      89e29416
    • Aleksandr Mishin's avatar
      octeontx2-af: Add array index check · 76c39cf8
      Aleksandr Mishin authored
      commit ef15ddee upstream.
      
      In rvu_map_cgx_lmac_pf() the 'iter', which is used as an array index, can reach
      value (up to 14) that exceed the size (MAX_LMAC_COUNT = 8) of the array.
      Fix this bug by adding 'iter' value check.
      
      Found by Linux Verification Center (linuxtesting.org) with SVACE.
      
      Fixes: 91c6945e
      
       ("octeontx2-af: cn10k: Add RPM MAC support")
      Signed-off-by: default avatarAleksandr Mishin <amishin@t-argos.ru>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      76c39cf8
    • Su Hui's avatar
      octeontx2-pf: check negative error code in otx2_open() · 43b69da2
      Su Hui authored
      commit e709acbd upstream.
      
      otx2_rxtx_enable() return negative error code such as -EIO,
      check -EIO rather than EIO to fix this problem.
      
      Fixes: c9262522
      
       ("octeontx2-pf: Disable packet I/O for graceful exit")
      Signed-off-by: default avatarSu Hui <suhui@nfschina.com>
      Reviewed-by: default avatarSubbaraya Sundeep <sbhatta@marvell.com>
      Reviewed-by: default avatarSimon Horman <horms@kernel.org>
      Reviewed-by: default avatarKalesh AP <kalesh-anakkur.purayil@broadcom.com>
      Link: https://lore.kernel.org/r/20240328020620.4054692-1-suhui@nfschina.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      43b69da2
    • Hariprasad Kelam's avatar
      octeontx2-af: Fix issue with loading coalesced KPU profiles · b08b0c7a
      Hariprasad Kelam authored
      commit 0ba80d96 upstream.
      
      The current implementation for loading coalesced KPU profiles has
      a limitation.  The "offset" field, which is used to locate profiles
      within the profile is restricted to a u16.
      
      This restricts the number of profiles that can be loaded. This patch
      addresses this limitation by increasing the size of the "offset" field.
      
      Fixes: 11c730bf
      
       ("octeontx2-af: support for coalescing KPU profiles")
      Signed-off-by: default avatarHariprasad Kelam <hkelam@marvell.com>
      Reviewed-by: default avatarKalesh AP <kalesh-anakkur.purayil@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b08b0c7a
    • Antoine Tenart's avatar
      udp: prevent local UDP tunnel packets from being GROed · 03b6f369
      Antoine Tenart authored
      commit 64235eab upstream.
      
      GRO has a fundamental issue with UDP tunnel packets as it can't detect
      those in a foolproof way and GRO could happen before they reach the
      tunnel endpoint. Previous commits have fixed issues when UDP tunnel
      packets come from a remote host, but if those packets are issued locally
      they could run into checksum issues.
      
      If the inner packet has a partial checksum the information will be lost
      in the GRO logic, either in udp4/6_gro_complete or in
      udp_gro_complete_segment and packets will have an invalid checksum when
      leaving the host.
      
      Prevent local UDP tunnel packets from ever being GROed at the outer UDP
      level.
      
      Due to skb->encapsulation being wrongly used in some drivers this is
      actually only preventing UDP tunnel packets with a partial checksum to
      be GROed (see iptunnel_handle_offloads) but those were also the packets
      triggering issues so in practice this should be sufficient.
      
      Fixes: 9fd1ff5d ("udp: Support UDP fraglist GRO/GSO.")
      Fixes: 36707061
      
       ("udp: allow forwarding of plain (non-fraglisted) UDP GRO packets")
      Suggested-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarAntoine Tenart <atenart@kernel.org>
      Reviewed-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      03b6f369
    • Antoine Tenart's avatar
      udp: do not transition UDP GRO fraglist partial checksums to unnecessary · 2a1b61d0
      Antoine Tenart authored
      commit f0b8c303 upstream.
      
      UDP GRO validates checksums and in udp4/6_gro_complete fraglist packets
      are converted to CHECKSUM_UNNECESSARY to avoid later checks. However
      this is an issue for CHECKSUM_PARTIAL packets as they can be looped in
      an egress path and then their partial checksums are not fixed.
      
      Different issues can be observed, from invalid checksum on packets to
      traces like:
      
        gen01: hw csum failure
        skb len=3008 headroom=160 headlen=1376 tailroom=0
        mac=(106,14) net=(120,40) trans=160
        shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
        csum(0xffff232e ip_summed=2 complete_sw=0 valid=0 level=0)
        hash(0x77e3d716 sw=1 l4=1) proto=0x86dd pkttype=0 iif=12
        ...
      
      Fix this by only converting CHECKSUM_NONE packets to
      CHECKSUM_UNNECESSARY by reusing __skb_incr_checksum_unnecessary. All
      other checksum types are kept as-is, including CHECKSUM_COMPLETE as
      fraglist packets being segmented back would have their skb->csum valid.
      
      Fixes: 9fd1ff5d
      
       ("udp: Support UDP fraglist GRO/GSO.")
      Signed-off-by: default avatarAntoine Tenart <atenart@kernel.org>
      Reviewed-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2a1b61d0
    • Antoine Tenart's avatar
      udp: do not accept non-tunnel GSO skbs landing in a tunnel · 3001e7aa
      Antoine Tenart authored
      commit 3d010c80 upstream.
      
      When rx-udp-gro-forwarding is enabled UDP packets might be GROed when
      being forwarded. If such packets might land in a tunnel this can cause
      various issues and udp_gro_receive makes sure this isn't the case by
      looking for a matching socket. This is performed in
      udp4/6_gro_lookup_skb but only in the current netns. This is an issue
      with tunneled packets when the endpoint is in another netns. In such
      cases the packets will be GROed at the UDP level, which leads to various
      issues later on. The same thing can happen with rx-gro-list.
      
      We saw this with geneve packets being GROed at the UDP level. In such
      case gso_size is set; later the packet goes through the geneve rx path,
      the geneve header is pulled, the offset are adjusted and frag_list skbs
      are not adjusted with regard to geneve. When those skbs hit
      skb_fragment, it will misbehave. Different outcomes are possible
      depending on what the GROed skbs look like; from corrupted packets to
      kernel crashes.
      
      One example is a BUG_ON[1] triggered in skb_segment while processing the
      frag_list. Because gso_size is wrong (geneve header was pulled)
      skb_segment thinks there is "geneve header size" of data in frag_list,
      although it's in fact the next packet. The BUG_ON itself has nothing to
      do with the issue. This is only one of the potential issues.
      
      Looking up for a matching socket in udp_gro_receive is fragile: the
      lookup could be extended to all netns (not speaking about performances)
      but nothing prevents those packets from being modified in between and we
      could still not find a matching socket. It's OK to keep the current
      logic there as it should cover most cases but we also need to make sure
      we handle tunnel packets being GROed too early.
      
      This is done by extending the checks in udp_unexpected_gso: GSO packets
      lacking the SKB_GSO_UDP_TUNNEL/_CSUM bits and landing in a tunnel must
      be segmented.
      
      [1] kernel BUG at net/core/skbuff.c:4408!
          RIP: 0010:skb_segment+0xd2a/0xf70
          __udp_gso_segment+0xaa/0x560
      
      Fixes: 9fd1ff5d ("udp: Support UDP fraglist GRO/GSO.")
      Fixes: 36707061
      
       ("udp: allow forwarding of plain (non-fraglisted) UDP GRO packets")
      Signed-off-by: default avatarAntoine Tenart <atenart@kernel.org>
      Reviewed-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3001e7aa
    • Atlas Yu's avatar
      r8169: skip DASH fw status checks when DASH is disabled · a5eae74f
      Atlas Yu authored
      commit 5e864d90 upstream.
      
      On devices that support DASH, the current code in the "rtl_loop_wait" function
      raises false alarms when DASH is disabled. This occurs because the function
      attempts to wait for the DASH firmware to be ready, even though it's not
      relevant in this case.
      
      r8169 0000:0c:00.0 eth0: RTL8168ep/8111ep, 38:7c:76:49:08:d9, XID 502, IRQ 86
      r8169 0000:0c:00.0 eth0: jumbo features [frames: 9194 bytes, tx checksumming: ko]
      r8169 0000:0c:00.0 eth0: DASH disabled
      ...
      r8169 0000:0c:00.0 eth0: rtl_ep_ocp_read_cond == 0 (loop: 30, delay: 10000).
      
      This patch modifies the driver start/stop functions to skip checking the DASH
      firmware status when DASH is explicitly disabled. This prevents unnecessary
      delays and false alarms.
      
      The patch has been tested on several ThinkStation P8/PX workstations.
      
      Fixes: 0ab0c45d
      
       ("r8169: add handling DASH when DASH is disabled")
      Signed-off-by: default avatarAtlas Yu <atlas.yu@canonical.com>
      Reviewed-by: default avatarHeiner Kallweit <hkallweit1@gmail.com>
      Link: https://lore.kernel.org/r/20240328055152.18443-1-atlas.yu@canonical.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a5eae74f
    • David Thompson's avatar
      mlxbf_gige: stop interface during shutdown · 36a1cb03
      David Thompson authored
      commit 09ba28e1 upstream.
      
      The mlxbf_gige driver intermittantly encounters a NULL pointer
      exception while the system is shutting down via "reboot" command.
      The mlxbf_driver will experience an exception right after executing
      its shutdown() method.  One example of this exception is:
      
      Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070
      Mem abort info:
        ESR = 0x0000000096000004
        EC = 0x25: DABT (current EL), IL = 32 bits
        SET = 0, FnV = 0
        EA = 0, S1PTW = 0
        FSC = 0x04: level 0 translation fault
      Data abort info:
        ISV = 0, ISS = 0x00000004
        CM = 0, WnR = 0
      user pgtable: 4k pages, 48-bit VAs, pgdp=000000011d373000
      [0000000000000070] pgd=0000000000000000, p4d=0000000000000000
      Internal error: Oops: 96000004 [#1] SMP
      CPU: 0 PID: 13 Comm: ksoftirqd/0 Tainted: G S         OE     5.15.0-bf.6.gef6992a #1
      Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS 4.0.2.12669 Apr 21 2023
      pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      pc : mlxbf_gige_handle_tx_complete+0xc8/0x170 [mlxbf_gige]
      lr : mlxbf_gige_poll+0x54/0x160 [mlxbf_gige]
      sp : ffff8000080d3c10
      x29: ffff8000080d3c10 x28: ffffcce72cbb7000 x27: ffff8000080d3d58
      x26: ffff0000814e7340 x25: ffff331cd1a05000 x24: ffffcce72c4ea008
      x23: ffff0000814e4b40 x22: ffff0000814e4d10 x21: ffff0000814e4128
      x20: 0000000000000000 x19: ffff0000814e4a80 x18: ffffffffffffffff
      x17: 000000000000001c x16: ffffcce72b4553f4 x15: ffff80008805b8a7
      x14: 0000000000000000 x13: 0000000000000030 x12: 0101010101010101
      x11: 7f7f7f7f7f7f7f7f x10: c2ac898b17576267 x9 : ffffcce720fa5404
      x8 : ffff000080812138 x7 : 0000000000002e9a x6 : 0000000000000080
      x5 : ffff00008de3b000 x4 : 0000000000000000 x3 : 0000000000000001
      x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
      Call trace:
       mlxbf_gige_handle_tx_complete+0xc8/0x170 [mlxbf_gige]
       mlxbf_gige_poll+0x54/0x160 [mlxbf_gige]
       __napi_poll+0x40/0x1c8
       net_rx_action+0x314/0x3a0
       __do_softirq+0x128/0x334
       run_ksoftirqd+0x54/0x6c
       smpboot_thread_fn+0x14c/0x190
       kthread+0x10c/0x110
       ret_from_fork+0x10/0x20
      Code: 8b070000 f9000ea0 f95056c0 f86178a1 (b9407002)
      ---[ end trace 7cc3941aa0d8e6a4 ]---
      Kernel panic - not syncing: Oops: Fatal exception in interrupt
      Kernel Offset: 0x4ce722520000 from 0xffff800008000000
      PHYS_OFFSET: 0x80000000
      CPU features: 0x000005c1,a3330e5a
      Memory Limit: none
      ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
      
      During system shutdown, the mlxbf_gige driver's shutdown() is always executed.
      However, the driver's stop() method will only execute if networking interface
      configuration logic within the Linux distribution has been setup to do so.
      
      If shutdown() executes but stop() does not execute, NAPI remains enabled
      and this can lead to an exception if NAPI is scheduled while the hardware
      interface has only been partially deinitialized.
      
      The networking interface managed by the mlxbf_gige driver must be properly
      stopped during system shutdown so that IFF_UP is cleared, the hardware
      interface is put into a clean state, and NAPI is fully deinitialized.
      
      Fixes: f92e1869
      
       ("Add Mellanox BlueField Gigabit Ethernet driver")
      Signed-off-by: default avatarDavid Thompson <davthompson@nvidia.com>
      Link: https://lore.kernel.org/r/20240325210929.25362-1-davthompson@nvidia.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      36a1cb03
    • Kuniyuki Iwashima's avatar
      ipv6: Fix infinite recursion in fib6_dump_done(). · f2dd75e5
      Kuniyuki Iwashima authored
      commit d21d4060 upstream.
      
      syzkaller reported infinite recursive calls of fib6_dump_done() during
      netlink socket destruction.  [1]
      
      From the log, syzkaller sent an AF_UNSPEC RTM_GETROUTE message, and then
      the response was generated.  The following recvmmsg() resumed the dump
      for IPv6, but the first call of inet6_dump_fib() failed at kzalloc() due
      to the fault injection.  [0]
      
        12:01:34 executing program 3:
        r0 = socket$nl_route(0x10, 0x3, 0x0)
        sendmsg$nl_route(r0, ... snip ...)
        recvmmsg(r0, ... snip ...) (fail_nth: 8)
      
      Here, fib6_dump_done() was set to nlk_sk(sk)->cb.done, and the next call
      of inet6_dump_fib() set it to nlk_sk(sk)->cb.args[3].  syzkaller stopped
      receiving the response halfway through, and finally netlink_sock_destruct()
      called nlk_sk(sk)->cb.done().
      
      fib6_dump_done() calls fib6_dump_end() and nlk_sk(sk)->cb.done() if it
      is still not NULL.  fib6_dump_end() rewrites nlk_sk(sk)->cb.done() by
      nlk_sk(sk)->cb.args[3], but it has the same function, not NULL, calling
      itself recursively and hitting the stack guard page.
      
      To avoid the issue, let's set the destructor after kzalloc().
      
      [0]:
      FAULT_INJECTION: forcing a failure.
      name failslab, interval 1, probability 0, space 0, times 0
      CPU: 1 PID: 432110 Comm: syz-executor.3 Not tainted 6.8.0-12821-g537c2e91d354-dirty #11
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      Call Trace:
       <TASK>
       dump_stack_lvl (lib/dump_stack.c:117)
       should_fail_ex (lib/fault-inject.c:52 lib/fault-inject.c:153)
       should_failslab (mm/slub.c:3733)
       kmalloc_trace (mm/slub.c:3748 mm/slub.c:3827 mm/slub.c:3992)
       inet6_dump_fib (./include/linux/slab.h:628 ./include/linux/slab.h:749 net/ipv6/ip6_fib.c:662)
       rtnl_dump_all (net/core/rtnetlink.c:4029)
       netlink_dump (net/netlink/af_netlink.c:2269)
       netlink_recvmsg (net/netlink/af_netlink.c:1988)
       ____sys_recvmsg (net/socket.c:1046 net/socket.c:2801)
       ___sys_recvmsg (net/socket.c:2846)
       do_recvmmsg (net/socket.c:2943)
       __x64_sys_recvmmsg (net/socket.c:3041 net/socket.c:3034 net/socket.c:3034)
      
      [1]:
      BUG: TASK stack guard page was hit at 00000000f2fa9af1 (stack is 00000000b7912430..000000009a436beb)
      stack guard page: 0000 [#1] PREEMPT SMP KASAN
      CPU: 1 PID: 223719 Comm: kworker/1:3 Not tainted 6.8.0-12821-g537c2e91d354-dirty #11
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      Workqueue: events netlink_sock_destruct_work
      RIP: 0010:fib6_dump_done (net/ipv6/ip6_fib.c:570)
      Code: 3c 24 e8 f3 e9 51 fd e9 28 fd ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 41 57 41 56 41 55 41 54 55 48 89 fd <53> 48 8d 5d 60 e8 b6 4d 07 fd 48 89 da 48 b8 00 00 00 00 00 fc ff
      RSP: 0018:ffffc9000d980000 EFLAGS: 00010293
      RAX: 0000000000000000 RBX: ffffffff84405990 RCX: ffffffff844059d3
      RDX: ffff8881028e0000 RSI: ffffffff84405ac2 RDI: ffff88810c02f358
      RBP: ffff88810c02f358 R08: 0000000000000007 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000224 R12: 0000000000000000
      R13: ffff888007c82c78 R14: ffff888007c82c68 R15: ffff888007c82c68
      FS:  0000000000000000(0000) GS:ffff88811b100000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffffc9000d97fff8 CR3: 0000000102309002 CR4: 0000000000770ef0
      PKRU: 55555554
      Call Trace:
       <#DF>
       </#DF>
       <TASK>
       fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1))
       fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1))
       ...
       fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1))
       fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1))
       netlink_sock_destruct (net/netlink/af_netlink.c:401)
       __sk_destruct (net/core/sock.c:2177 (discriminator 2))
       sk_destruct (net/core/sock.c:2224)
       __sk_free (net/core/sock.c:2235)
       sk_free (net/core/sock.c:2246)
       process_one_work (kernel/workqueue.c:3259)
       worker_thread (kernel/workqueue.c:3329 kernel/workqueue.c:3416)
       kthread (kernel/kthread.c:388)
       ret_from_fork (arch/x86/kernel/process.c:153)
       ret_from_fork_asm (arch/x86/entry/entry_64.S:256)
      Modules linked in:
      
      Fixes: 1da177e4
      
       ("Linux-2.6.12-rc2")
      Reported-by: default avatarsyzkaller <syzkaller@googlegroups.com>
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20240401211003.25274-1-kuniyu@amazon.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f2dd75e5
    • Duoming Zhou's avatar
      ax25: fix use-after-free bugs caused by ax25_ds_del_timer · 74204bf9
      Duoming Zhou authored
      commit fd819ad3 upstream.
      
      When the ax25 device is detaching, the ax25_dev_device_down()
      calls ax25_ds_del_timer() to cleanup the slave_timer. When
      the timer handler is running, the ax25_ds_del_timer() that
      calls del_timer() in it will return directly. As a result,
      the use-after-free bugs could happen, one of the scenarios
      is shown below:
      
            (Thread 1)          |      (Thread 2)
                                | ax25_ds_timeout()
      ax25_dev_device_down()    |
        ax25_ds_del_timer()     |
          del_timer()           |
        ax25_dev_put() //FREE   |
                                |  ax25_dev-> //USE
      
      In order to mitigate bugs, when the device is detaching, use
      timer_shutdown_sync() to stop the timer.
      
      Fixes: 1da177e4
      
       ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarDuoming Zhou <duoming@zju.edu.cn>
      Reviewed-by: default avatarSimon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20240329015023.9223-1-duoming@zju.edu.cn
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      74204bf9
    • Kuniyuki Iwashima's avatar
      tcp: Fix bind() regression for v6-only wildcard and v4(-mapped-v6) non-wildcard addresses. · 8b88752d
      Kuniyuki Iwashima authored
      commit d91ef1e1 upstream.
      
      Jianguo Wu reported another bind() regression introduced by bhash2.
      
      Calling bind() for the following 3 addresses on the same port, the
      3rd one should fail but now succeeds.
      
        1. 0.0.0.0 or ::ffff:0.0.0.0
        2. [::] w/ IPV6_V6ONLY
        3. IPv4 non-wildcard address or v4-mapped-v6 non-wildcard address
      
      The first two bind() create tb2 like this:
      
        bhash2 -> tb2(:: w/ IPV6_V6ONLY) -> tb2(0.0.0.0)
      
      The 3rd bind() will match with the IPv6 only wildcard address bucket
      in inet_bind2_bucket_match_addr_any(), however, no conflicting socket
      exists in the bucket.  So, inet_bhash2_conflict() will returns false,
      and thus, inet_bhash2_addr_any_conflict() returns false consequently.
      
      As a result, the 3rd bind() bypasses conflict check, which should be
      done against the IPv4 wildcard address bucket.
      
      So, in inet_bhash2_addr_any_conflict(), we must iterate over all buckets.
      
      Note that we cannot add ipv6_only flag for inet_bind2_bucket as it
      would confuse the following patetrn.
      
        1. [::] w/ SO_REUSE{ADDR,PORT} and IPV6_V6ONLY
        2. [::] w/ SO_REUSE{ADDR,PORT}
        3. IPv4 non-wildcard address or v4-mapped-v6 non-wildcard address
      
      The first bind() would create a bucket with ipv6_only flag true,
      the second bind() would add the [::] socket into the same bucket,
      and the third bind() could succeed based on the wrong assumption
      that ipv6_only bucket would not conflict with v4(-mapped-v6) address.
      
      Fixes: 28044fc1
      
       ("net: Add a bhash2 table hashed by port and address")
      Diagnosed-by: default avatarJianguo Wu <wujianguo106@163.com>
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20240326204251.51301-3-kuniyu@amazon.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8b88752d
    • Jakub Kicinski's avatar
      selftests: reuseaddr_conflict: add missing new line at the end of the output · 690e877c
      Jakub Kicinski authored
      commit 31974122 upstream.
      
      The netdev CI runs in a VM and captures serial, so stdout and
      stderr get combined. Because there's a missing new line in
      stderr the test ends up corrupting KTAP:
      
        # Successok 1 selftests: net: reuseaddr_conflict
      
      which should have been:
      
        # Success
        ok 1 selftests: net: reuseaddr_conflict
      
      Fixes: 422d8dc6
      
       ("selftest: add a reuseaddr test")
      Reviewed-by: default avatarMuhammad Usama Anjum <usama.anjum@collabora.com>
      Link: https://lore.kernel.org/r/20240329160559.249476-1-kuba@kernel.org
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      690e877c
    • Eric Dumazet's avatar
      erspan: make sure erspan_base_hdr is present in skb->head · 4e3fdeec
      Eric Dumazet authored
      commit 17af4205 upstream.
      
      syzbot reported a problem in ip6erspan_rcv() [1]
      
      Issue is that ip6erspan_rcv() (and erspan_rcv()) no longer make
      sure erspan_base_hdr is present in skb linear part (skb->head)
      before getting @ver field from it.
      
      Add the missing pskb_may_pull() calls.
      
      v2: Reload iph pointer in erspan_rcv() after pskb_may_pull()
          because skb->head might have changed.
      
      [1]
      
       BUG: KMSAN: uninit-value in pskb_may_pull_reason include/linux/skbuff.h:2742 [inline]
       BUG: KMSAN: uninit-value in pskb_may_pull include/linux/skbuff.h:2756 [inline]
       BUG: KMSAN: uninit-value in ip6erspan_rcv net/ipv6/ip6_gre.c:541 [inline]
       BUG: KMSAN: uninit-value in gre_rcv+0x11f8/0x1930 net/ipv6/ip6_gre.c:610
        pskb_may_pull_reason include/linux/skbuff.h:2742 [inline]
        pskb_may_pull include/linux/skbuff.h:2756 [inline]
        ip6erspan_rcv net/ipv6/ip6_gre.c:541 [inline]
        gre_rcv+0x11f8/0x1930 net/ipv6/ip6_gre.c:610
        ip6_protocol_deliver_rcu+0x1d4c/0x2ca0 net/ipv6/ip6_input.c:438
        ip6_input_finish net/ipv6/ip6_input.c:483 [inline]
        NF_HOOK include/linux/netfilter.h:314 [inline]
        ip6_input+0x15d/0x430 net/ipv6/ip6_input.c:492
        ip6_mc_input+0xa7e/0xc80 net/ipv6/ip6_input.c:586
        dst_input include/net/dst.h:460 [inline]
        ip6_rcv_finish+0x955/0x970 net/ipv6/ip6_input.c:79
        NF_HOOK include/linux/netfilter.h:314 [inline]
        ipv6_rcv+0xde/0x390 net/ipv6/ip6_input.c:310
        __netif_receive_skb_one_core net/core/dev.c:5538 [inline]
        __netif_receive_skb+0x1da/0xa00 net/core/dev.c:5652
        netif_receive_skb_internal net/core/dev.c:5738 [inline]
        netif_receive_skb+0x58/0x660 net/core/dev.c:5798
        tun_rx_batched+0x3ee/0x980 drivers/net/tun.c:1549
        tun_get_user+0x5566/0x69e0 drivers/net/tun.c:2002
        tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048
        call_write_iter include/linux/fs.h:2108 [inline]
        new_sync_write fs/read_write.c:497 [inline]
        vfs_write+0xb63/0x1520 fs/read_write.c:590
        ksys_write+0x20f/0x4c0 fs/read_write.c:643
        __do_sys_write fs/read_write.c:655 [inline]
        __se_sys_write fs/read_write.c:652 [inline]
        __x64_sys_write+0x93/0xe0 fs/read_write.c:652
       do_syscall_64+0xd5/0x1f0
       entry_SYSCALL_64_after_hwframe+0x6d/0x75
      
      Uninit was created at:
        slab_post_alloc_hook mm/slub.c:3804 [inline]
        slab_alloc_node mm/slub.c:3845 [inline]
        kmem_cache_alloc_node+0x613/0xc50 mm/slub.c:3888
        kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:577
        __alloc_skb+0x35b/0x7a0 net/core/skbuff.c:668
        alloc_skb include/linux/skbuff.h:1318 [inline]
        alloc_skb_with_frags+0xc8/0xbf0 net/core/skbuff.c:6504
        sock_alloc_send_pskb+0xa81/0xbf0 net/core/sock.c:2795
        tun_alloc_skb drivers/net/tun.c:1525 [inline]
        tun_get_user+0x209a/0x69e0 drivers/net/tun.c:1846
        tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048
        call_write_iter include/linux/fs.h:2108 [inline]
        new_sync_write fs/read_write.c:497 [inline]
        vfs_write+0xb63/0x1520 fs/read_write.c:590
        ksys_write+0x20f/0x4c0 fs/read_write.c:643
        __do_sys_write fs/read_write.c:655 [inline]
        __se_sys_write fs/read_write.c:652 [inline]
        __x64_sys_write+0x93/0xe0 fs/read_write.c:652
       do_syscall_64+0xd5/0x1f0
       entry_SYSCALL_64_after_hwframe+0x6d/0x75
      
      CPU: 1 PID: 5045 Comm: syz-executor114 Not tainted 6.9.0-rc1-syzkaller-00021-g962490525cff #0
      
      Fixes: cb73ee40
      
       ("net: ip_gre: use erspan key field for tunnel lookup")
      Reported-by: default avatar <syzbot+1c1cf138518bf0c53d68@syzkaller.appspotmail.com>
      Closes: https://lore.kernel.org/netdev/000000000000772f2c0614b66ef7@google.com/
      
      
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Lorenzo Bianconi <lorenzo@kernel.org>
      Link: https://lore.kernel.org/r/20240328112248.1101491-1-edumazet@google.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4e3fdeec
    • Ivan Vecera's avatar
      i40e: Fix VF MAC filter removal · a03e138d
      Ivan Vecera authored
      commit ea2a1cfc upstream.
      
      Commit 73d9629e ("i40e: Do not allow untrusted VF to remove
      administratively set MAC") fixed an issue where untrusted VF was
      allowed to remove its own MAC address although this was assigned
      administratively from PF. Unfortunately the introduced check
      is wrong because it causes that MAC filters for other MAC addresses
      including multi-cast ones are not removed.
      
      <snip>
      	if (ether_addr_equal(addr, vf->default_lan_addr.addr) &&
      	    i40e_can_vf_change_mac(vf))
      		was_unimac_deleted = true;
      	else
      		continue;
      
      	if (i40e_del_mac_filter(vsi, al->list[i].addr)) {
      	...
      </snip>
      
      The else path with `continue` effectively skips any MAC filter
      removal except one for primary MAC addr when VF is allowed to do so.
      Fix the check condition so the `continue` is only done for primary
      MAC address.
      
      Fixes: 73d9629e
      
       ("i40e: Do not allow untrusted VF to remove administratively set MAC")
      Signed-off-by: default avatarIvan Vecera <ivecera@redhat.com>
      Reviewed-by: default avatarMichal Schmidt <mschmidt@redhat.com>
      Reviewed-by: default avatarBrett Creeley <brett.creeley@amd.com>
      Tested-by: default avatarRafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      Link: https://lore.kernel.org/r/20240329180638.211412-1-anthony.l.nguyen@intel.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a03e138d
    • Petr Oros's avatar
      ice: fix enabling RX VLAN filtering · b9bd1498
      Petr Oros authored
      commit 8edfc7a4 upstream.
      
      ice_port_vlan_on/off() was introduced in commit 2946204b ("ice:
      implement bridge port vlan"). But ice_port_vlan_on() incorrectly assigns
      ena_rx_filtering to inner_vlan_ops in DVM mode.
      This causes an error when rx_filtering cannot be enabled in legacy mode.
      
      Reproducer:
       echo 1 > /sys/class/net/$PF/device/sriov_numvfs
       ip link set $PF vf 0 spoofchk off trust on vlan 3
      dmesg:
       ice 0000:41:00.0: failed to enable Rx VLAN filtering for VF 0 VSI 9 during VF rebuild, error -95
      
      Fixes: 2946204b
      
       ("ice: implement bridge port vlan")
      Signed-off-by: default avatarPetr Oros <poros@redhat.com>
      Reviewed-by: default avatarMichal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Tested-by: default avatarRafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b9bd1498
    • Antoine Tenart's avatar
      gro: fix ownership transfer · fc126c1d
      Antoine Tenart authored
      commit ed4cccef upstream.
      
      If packets are GROed with fraglist they might be segmented later on and
      continue their journey in the stack. In skb_segment_list those skbs can
      be reused as-is. This is an issue as their destructor was removed in
      skb_gro_receive_list but not the reference to their socket, and then
      they can't be orphaned. Fix this by also removing the reference to the
      socket.
      
      For example this could be observed,
      
        kernel BUG at include/linux/skbuff.h:3131!  (skb_orphan)
        RIP: 0010:ip6_rcv_core+0x11bc/0x19a0
        Call Trace:
         ipv6_list_rcv+0x250/0x3f0
         __netif_receive_skb_list_core+0x49d/0x8f0
         netif_receive_skb_list_internal+0x634/0xd40
         napi_complete_done+0x1d2/0x7d0
         gro_cell_poll+0x118/0x1f0
      
      A similar construction is found in skb_gro_receive, apply the same
      change there.
      
      Fixes: 5e10da53
      
       ("skbuff: allow 'slow_gro' for skb carring sock reference")
      Signed-off-by: default avatarAntoine Tenart <atenart@kernel.org>
      Reviewed-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fc126c1d
    • Antoine Tenart's avatar
      selftests: net: gro fwd: update vxlan GRO test expectations · 39864092
      Antoine Tenart authored
      commit 0fb101be upstream.
      
      UDP tunnel packets can't be GRO in-between their endpoints as this
      causes different issues. The UDP GRO fwd vxlan tests were relying on
      this and their expectations have to be fixed.
      
      We keep both vxlan tests and expected no GRO from happening. The vxlan
      UDP GRO bench test was removed as it's not providing any valuable
      information now.
      
      Fixes: a062260a
      
       ("selftests: net: add UDP GRO forwarding self-tests")
      Signed-off-by: default avatarAntoine Tenart <atenart@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      39864092
    • Michael Krummsdorf's avatar
      net: dsa: mv88e6xxx: fix usable ports on 88e6020 · 23e1c686
      Michael Krummsdorf authored
      commit 625aefac upstream.
      
      The switch has 4 ports with 2 internal PHYs, but ports are numbered up
      to 6, with ports 0, 1, 5 and 6 being usable.
      
      Fixes: 71d94a43
      
       ("net: dsa: mv88e6xxx: add support for MV88E6020 switch")
      Signed-off-by: default avatarMichael Krummsdorf <michael.krummsdorf@tq-group.com>
      Signed-off-by: default avatarMatthias Schiffer <matthias.schiffer@ew.tq-group.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Reviewed-by: default avatarSimon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20240326123655.40666-1-matthias.schiffer@ew.tq-group.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      23e1c686
    • Aleksandr Mishin's avatar
      net: phy: micrel: Fix potential null pointer dereference · 95c1016a
      Aleksandr Mishin authored
      commit 96c15594 upstream.
      
      In lan8814_get_sig_rx() and lan8814_get_sig_tx() ptp_parse_header() may
      return NULL as ptp_header due to abnormal packet type or corrupted packet.
      Fix this bug by adding ptp_header check.
      
      Found by Linux Verification Center (linuxtesting.org) with SVACE.
      
      Fixes: ece19502
      
       ("net: phy: micrel: 1588 support for LAN8814 phy")
      Signed-off-by: default avatarAleksandr Mishin <amishin@t-argos.ru>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20240329061631.33199-1-amishin@t-argos.ru
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      95c1016a
    • Wei Fang's avatar
      net: fec: Set mac_managed_pm during probe · f996e5ec
      Wei Fang authored
      commit cbc17e78 upstream.
      
      Setting mac_managed_pm during interface up is too late.
      
      In situations where the link is not brought up yet and the system suspends
      the regular PHY power management will run. Since the FEC ETHEREN control
      bit is cleared (automatically) on suspend the controller is off in resume.
      When the regular PHY power management resume path runs in this context it
      will write to the MII_DATA register but nothing will be transmitted on the
      MDIO bus.
      
      This can be observed by the following log:
      
          fec 5b040000.ethernet eth0: MDIO read timeout
          Microchip LAN87xx T1 5b040000.ethernet-1:04: PM: dpm_run_callback(): mdio_bus_phy_resume+0x0/0xc8 returns -110
          Microchip LAN87xx T1 5b040000.ethernet-1:04: PM: failed to resume: error -110
      
      The data written will however remain in the MII_DATA register.
      
      When the link later is set to administrative up it will trigger a call to
      fec_restart() which will restore the MII_SPEED register. This triggers the
      quirk explained in f166f890 ("net: ethernet: fec: Replace interrupt
      driven MDIO with polled IO") causing an extra MII_EVENT.
      
      This extra event desynchronizes all the MDIO register reads, causing them
      to complete too early. Leading all reads to read as 0 because
      fec_enet_mdio_wait() returns too early.
      
      When a Microchip LAN8700R PHY is connected to the FEC, the 0 reads causes
      the PHY to be initialized incorrectly and the PHY will not transmit any
      ethernet signal in this state. It cannot be brought out of this state
      without a power cycle of the PHY.
      
      Fixes: 557d5dc8 ("net: fec: use mac-managed PHY PM")
      Closes: https://lore.kernel.org/netdev/1f45bdbe-eab1-4e59-8f24-add177590d27@actia.se/
      
      
      Signed-off-by: default avatarWei Fang <wei.fang@nxp.com>
      [jernberg: commit message]
      Signed-off-by: default avatarJohn Ernberg <john.ernberg@actia.se>
      Link: https://lore.kernel.org/r/20240328155909.59613-2-john.ernberg@actia.se
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f996e5ec
    • Duanqiang Wen's avatar
      net: txgbe: fix i2c dev name cannot match clkdev · 22a44eee
      Duanqiang Wen authored
      commit c644920c upstream.
      
      txgbe clkdev shortened clk_name, so i2c_dev info_name
      also need to shorten. Otherwise, i2c_dev cannot initialize
      clock.
      
      Fixes: e30cef00
      
       ("net: txgbe: fix clk_name exceed MAX_DEV_ID limits")
      Signed-off-by: default avatarDuanqiang Wen <duanqiangwen@net-swift.com>
      Link: https://lore.kernel.org/r/20240402021843.126192-1-duanqiangwen@net-swift.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      22a44eee
    • Horatiu Vultur's avatar
      net: phy: micrel: lan8814: Fix when enabling/disabling 1-step timestamping · 1e304328
      Horatiu Vultur authored
      commit de99e1ea upstream.
      
      There are 2 issues with the blamed commit.
      1. When the phy is initialized, it would enable the disabled of UDPv4
         checksums. The UDPv6 checksum is already enabled by default. So when
         1-step is configured then it would clear these flags.
      2. After the 1-step is configured, then if 2-step is configured then the
         1-step would be still configured because it is not clearing the flag.
         So the sync frames will still have origin timestamps set.
      
      Fix this by reading first the value of the register and then
      just change bit 12 as this one determines if the timestamp needs to
      be inserted in the frame, without changing any other bits.
      
      Fixes: ece19502
      
       ("net: phy: micrel: 1588 support for LAN8814 phy")
      Signed-off-by: default avatarHoratiu Vultur <horatiu.vultur@microchip.com>
      Reviewed-by: default avatarDivya Koppera <divya.koppera@microchip.com>
      Link: https://lore.kernel.org/r/20240402071634.2483524-1-horatiu.vultur@microchip.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1e304328
    • Piotr Wejman's avatar
      net: stmmac: fix rx queue priority assignment · 784a6566
      Piotr Wejman authored
      commit b3da86d4 upstream.
      
      The driver should ensure that same priority is not mapped to multiple
      rx queues. From DesignWare Cores Ethernet Quality-of-Service
      Databook, section 17.1.29 MAC_RxQ_Ctrl2:
      "[...]The software must ensure that the content of this field is
      mutually exclusive to the PSRQ fields for other queues, that is,
      the same priority is not mapped to multiple Rx queues[...]"
      
      Previously rx_queue_priority() function was:
      - clearing all priorities from a queue
      - adding new priorities to that queue
      After this patch it will:
      - first assign new priorities to a queue
      - then remove those priorities from all other queues
      - keep other priorities previously assigned to that queue
      
      Fixes: a8f5102a ("net: stmmac: TX and RX queue priority configuration")
      Fixes: 2142754f
      
       ("net: stmmac: Add MAC related callbacks for XGMAC2")
      Signed-off-by: default avatarPiotr Wejman <piotrwejman90@gmail.com>
      Link: https://lore.kernel.org/r/20240401192239.33942-1-piotrwejman90@gmail.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      784a6566
    • Eric Dumazet's avatar
      net/sched: fix lockdep splat in qdisc_tree_reduce_backlog() · c040b994
      Eric Dumazet authored
      commit 7eb32236 upstream.
      
      qdisc_tree_reduce_backlog() is called with the qdisc lock held,
      not RTNL.
      
      We must use qdisc_lookup_rcu() instead of qdisc_lookup()
      
      syzbot reported:
      
      WARNING: suspicious RCU usage
      6.1.74-syzkaller #0 Not tainted
      -----------------------------
      net/sched/sch_api.c:305 suspicious rcu_dereference_protected() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      3 locks held by udevd/1142:
        #0: ffffffff87c729a0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:306 [inline]
        #0: ffffffff87c729a0 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:747 [inline]
        #0: ffffffff87c729a0 (rcu_read_lock){....}-{1:2}, at: net_tx_action+0x64a/0x970 net/core/dev.c:5282
        #1: ffff888171861108 (&sch->q.lock){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:350 [inline]
        #1: ffff888171861108 (&sch->q.lock){+.-.}-{2:2}, at: net_tx_action+0x754/0x970 net/core/dev.c:5297
        #2: ffffffff87c729a0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:306 [inline]
        #2: ffffffff87c729a0 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:747 [inline]
        #2: ffffffff87c729a0 (rcu_read_lock){....}-{1:2}, at: qdisc_tree_reduce_backlog+0x84/0x580 net/sched/sch_api.c:792
      
      stack backtrace:
      CPU: 1 PID: 1142 Comm: udevd Not tainted 6.1.74-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024
      Call Trace:
       <TASK>
        [<ffffffff85b85f14>] __dump_stack lib/dump_stack.c:88 [inline]
        [<ffffffff85b85f14>] dump_stack_lvl+0x1b1/0x28f lib/dump_stack.c:106
        [<ffffffff85b86007>] dump_stack+0x15/0x1e lib/dump_stack.c:113
        [<ffffffff81802299>] lockdep_rcu_suspicious+0x1b9/0x260 kernel/locking/lockdep.c:6592
        [<ffffffff84f0054c>] qdisc_lookup+0xac/0x6f0 net/sched/sch_api.c:305
        [<ffffffff84f037c3>] qdisc_tree_reduce_backlog+0x243/0x580 net/sched/sch_api.c:811
        [<ffffffff84f5b78c>] pfifo_tail_enqueue+0x32c/0x4b0 net/sched/sch_fifo.c:51
        [<ffffffff84fbcf63>] qdisc_enqueue include/net/sch_generic.h:833 [inline]
        [<ffffffff84fbcf63>] netem_dequeue+0xeb3/0x15d0 net/sched/sch_netem.c:723
        [<ffffffff84eecab9>] dequeue_skb net/sched/sch_generic.c:292 [inline]
        [<ffffffff84eecab9>] qdisc_restart net/sched/sch_generic.c:397 [inline]
        [<ffffffff84eecab9>] __qdisc_run+0x249/0x1e60 net/sched/sch_generic.c:415
        [<ffffffff84d7aa96>] qdisc_run+0xd6/0x260 include/net/pkt_sched.h:125
        [<ffffffff84d85d29>] net_tx_action+0x7c9/0x970 net/core/dev.c:5313
        [<ffffffff85e002bd>] __do_softirq+0x2bd/0x9bd kernel/softirq.c:616
        [<ffffffff81568bca>] invoke_softirq kernel/softirq.c:447 [inline]
        [<ffffffff81568bca>] __irq_exit_rcu+0xca/0x230 kernel/softirq.c:700
        [<ffffffff81568ae9>] irq_exit_rcu+0x9/0x20 kernel/softirq.c:712
        [<ffffffff85b89f52>] sysvec_apic_timer_interrupt+0x42/0x90 arch/x86/kernel/apic/apic.c:1107
        [<ffffffff85c00ccb>] asm_sysvec_apic_timer_interrupt+0x1b/0x20 arch/x86/include/asm/idtentry.h:656
      
      Fixes: d636fc5d
      
       ("net: sched: add rcu annotations around qdisc->qdisc_sleeping")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarJiri Pirko <jiri@nvidia.com>
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Link: https://lore.kernel.org/r/20240402134133.2352776-1-edumazet@google.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c040b994
    • Christophe JAILLET's avatar
      net: dsa: sja1105: Fix parameters order in sja1110_pcs_mdio_write_c45() · f4d1fa51
      Christophe JAILLET authored
      commit c120209b upstream.
      
      The definition and declaration of sja1110_pcs_mdio_write_c45() don't have
      parameters in the same order.
      
      Knowing that sja1110_pcs_mdio_write_c45() is used as a function pointer
      in 'sja1105_info' structure with .pcs_mdio_write_c45, and that we have:
      
         int (*pcs_mdio_write_c45)(struct mii_bus *bus, int phy, int mmd,
      				  int reg, u16 val);
      
      it is likely that the definition is the one to change.
      
      Found with cppcheck, funcArgOrderDifferent.
      
      Fixes: ae271547
      
       ("net: dsa: sja1105: C45 only transactions for PCS")
      Signed-off-by: default avatarChristophe JAILLET <christophe.jaillet@wanadoo.fr>
      Reviewed-by: default avatarMichael Walle <mwalle@kernel.org>
      Reviewed-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/ff2a5af67361988b3581831f7bd1eddebfb4c48f.1712082763.git.christophe.jaillet@wanadoo.fr
      
      
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f4d1fa51
    • Eric Dumazet's avatar
      net/sched: act_skbmod: prevent kernel-infoleak · 729ad2ac
      Eric Dumazet authored
      commit d313eb8b upstream.
      
      syzbot found that tcf_skbmod_dump() was copying four bytes
      from kernel stack to user space [1].
      
      The issue here is that 'struct tc_skbmod' has a four bytes hole.
      
      We need to clear the structure before filling fields.
      
      [1]
      BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:114 [inline]
       BUG: KMSAN: kernel-infoleak in copy_to_user_iter lib/iov_iter.c:24 [inline]
       BUG: KMSAN: kernel-infoleak in iterate_ubuf include/linux/iov_iter.h:29 [inline]
       BUG: KMSAN: kernel-infoleak in iterate_and_advance2 include/linux/iov_iter.h:245 [inline]
       BUG: KMSAN: kernel-infoleak in iterate_and_advance include/linux/iov_iter.h:271 [inline]
       BUG: KMSAN: kernel-infoleak in _copy_to_iter+0x366/0x2520 lib/iov_iter.c:185
        instrument_copy_to_user include/linux/instrumented.h:114 [inline]
        copy_to_user_iter lib/iov_iter.c:24 [inline]
        iterate_ubuf include/linux/iov_iter.h:29 [inline]
        iterate_and_advance2 include/linux/iov_iter.h:245 [inline]
        iterate_and_advance include/linux/iov_iter.h:271 [inline]
        _copy_to_iter+0x366/0x2520 lib/iov_iter.c:185
        copy_to_iter include/linux/uio.h:196 [inline]
        simple_copy_to_iter net/core/datagram.c:532 [inline]
        __skb_datagram_iter+0x185/0x1000 net/core/datagram.c:420
        skb_copy_datagram_iter+0x5c/0x200 net/core/datagram.c:546
        skb_copy_datagram_msg include/linux/skbuff.h:4050 [inline]
        netlink_recvmsg+0x432/0x1610 net/netlink/af_netlink.c:1962
        sock_recvmsg_nosec net/socket.c:1046 [inline]
        sock_recvmsg+0x2c4/0x340 net/socket.c:1068
        __sys_recvfrom+0x35a/0x5f0 net/socket.c:2242
        __do_sys_recvfrom net/socket.c:2260 [inline]
        __se_sys_recvfrom net/socket.c:2256 [inline]
        __x64_sys_recvfrom+0x126/0x1d0 net/socket.c:2256
       do_syscall_64+0xd5/0x1f0
       entry_SYSCALL_64_after_hwframe+0x6d/0x75
      
      Uninit was stored to memory at:
        pskb_expand_head+0x30f/0x19d0 net/core/skbuff.c:2253
        netlink_trim+0x2c2/0x330 net/netlink/af_netlink.c:1317
        netlink_unicast+0x9f/0x1260 net/netlink/af_netlink.c:1351
        nlmsg_unicast include/net/netlink.h:1144 [inline]
        nlmsg_notify+0x21d/0x2f0 net/netlink/af_netlink.c:2610
        rtnetlink_send+0x73/0x90 net/core/rtnetlink.c:741
        rtnetlink_maybe_send include/linux/rtnetlink.h:17 [inline]
        tcf_add_notify net/sched/act_api.c:2048 [inline]
        tcf_action_add net/sched/act_api.c:2071 [inline]
        tc_ctl_action+0x146e/0x19d0 net/sched/act_api.c:2119
        rtnetlink_rcv_msg+0x1737/0x1900 net/core/rtnetlink.c:6595
        netlink_rcv_skb+0x375/0x650 net/netlink/af_netlink.c:2559
        rtnetlink_rcv+0x34/0x40 net/core/rtnetlink.c:6613
        netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
        netlink_unicast+0xf4c/0x1260 net/netlink/af_netlink.c:1361
        netlink_sendmsg+0x10df/0x11f0 net/netlink/af_netlink.c:1905
        sock_sendmsg_nosec net/socket.c:730 [inline]
        __sock_sendmsg+0x30f/0x380 net/socket.c:745
        ____sys_sendmsg+0x877/0xb60 net/socket.c:2584
        ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2638
        __sys_sendmsg net/socket.c:2667 [inline]
        __do_sys_sendmsg net/socket.c:2676 [inline]
        __se_sys_sendmsg net/socket.c:2674 [inline]
        __x64_sys_sendmsg+0x307/0x4a0 net/socket.c:2674
       do_syscall_64+0xd5/0x1f0
       entry_SYSCALL_64_after_hwframe+0x6d/0x75
      
      Uninit was stored to memory at:
        __nla_put lib/nlattr.c:1041 [inline]
        nla_put+0x1c6/0x230 lib/nlattr.c:1099
        tcf_skbmod_dump+0x23f/0xc20 net/sched/act_skbmod.c:256
        tcf_action_dump_old net/sched/act_api.c:1191 [inline]
        tcf_action_dump_1+0x85e/0x970 net/sched/act_api.c:1227
        tcf_action_dump+0x1fd/0x460 net/sched/act_api.c:1251
        tca_get_fill+0x519/0x7a0 net/sched/act_api.c:1628
        tcf_add_notify_msg net/sched/act_api.c:2023 [inline]
        tcf_add_notify net/sched/act_api.c:2042 [inline]
        tcf_action_add net/sched/act_api.c:2071 [inline]
        tc_ctl_action+0x1365/0x19d0 net/sched/act_api.c:2119
        rtnetlink_rcv_msg+0x1737/0x1900 net/core/rtnetlink.c:6595
        netlink_rcv_skb+0x375/0x650 net/netlink/af_netlink.c:2559
        rtnetlink_rcv+0x34/0x40 net/core/rtnetlink.c:6613
        netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
        netlink_unicast+0xf4c/0x1260 net/netlink/af_netlink.c:1361
        netlink_sendmsg+0x10df/0x11f0 net/netlink/af_netlink.c:1905
        sock_sendmsg_nosec net/socket.c:730 [inline]
        __sock_sendmsg+0x30f/0x380 net/socket.c:745
        ____sys_sendmsg+0x877/0xb60 net/socket.c:2584
        ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2638
        __sys_sendmsg net/socket.c:2667 [inline]
        __do_sys_sendmsg net/socket.c:2676 [inline]
        __se_sys_sendmsg net/socket.c:2674 [inline]
        __x64_sys_sendmsg+0x307/0x4a0 net/socket.c:2674
       do_syscall_64+0xd5/0x1f0
       entry_SYSCALL_64_after_hwframe+0x6d/0x75
      
      Local variable opt created at:
        tcf_skbmod_dump+0x9d/0xc20 net/sched/act_skbmod.c:244
        tcf_action_dump_old net/sched/act_api.c:1191 [inline]
        tcf_action_dump_1+0x85e/0x970 net/sched/act_api.c:1227
      
      Bytes 188-191 of 248 are uninitialized
      Memory access of size 248 starts at ffff888117697680
      Data copied to user address 00007ffe56d855f0
      
      Fixes: 86da71b5
      
       ("net_sched: Introduce skbmod action")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Link: https://lore.kernel.org/r/20240403130908.93421-1-edumazet@google.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      729ad2ac
    • Will Deacon's avatar
      KVM: arm64: Ensure target address is granule-aligned for range TLBI · 3dcaf259
      Will Deacon authored
      commit 4c36a156 upstream.
      
      When zapping a table entry in stage2_try_break_pte(), we issue range
      TLB invalidation for the region that was mapped by the table. However,
      we neglect to align the base address down to the granule size and so
      if we ended up reaching the table entry via a misaligned address then
      we will accidentally skip invalidation for some prefix of the affected
      address range.
      
      Align 'ctx->addr' down to the granule size when performing TLB
      invalidation for an unmapped table in stage2_try_break_pte().
      
      Cc: Raghavendra Rao Ananta <rananta@google.com>
      Cc: Gavin Shan <gshan@redhat.com>
      Cc: Shaoqin Huang <shahuang@redhat.com>
      Cc: Quentin Perret <qperret@google.com>
      Fixes: defc8cc7
      
       ("KVM: arm64: Invalidate the table entries upon a range")
      Signed-off-by: default avatarWill Deacon <will@kernel.org>
      Reviewed-by: default avatarShaoqin Huang <shahuang@redhat.com>
      Reviewed-by: default avatarMarc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20240327124853.11206-5-will@kernel.org
      
      
      Signed-off-by: default avatarOliver Upton <oliver.upton@linux.dev>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3dcaf259
    • Borislav Petkov (AMD)'s avatar
      x86/retpoline: Do the necessary fixup to the Zen3/4 srso return thunk for !SRSO · 3ec21104
      Borislav Petkov (AMD) authored
      commit 0e110732 upstream.
      
      The srso_alias_untrain_ret() dummy thunk in the !CONFIG_MITIGATION_SRSO
      case is there only for the altenative in CALL_UNTRAIN_RET to have
      a symbol to resolve.
      
      However, testing with kernels which don't have CONFIG_MITIGATION_SRSO
      enabled, leads to the warning in patch_return() to fire:
      
        missing return thunk: srso_alias_untrain_ret+0x0/0x10-0x0: eb 0e 66 66 2e
        WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:826 apply_returns (arch/x86/kernel/alternative.c:826
      
      Put in a plain "ret" there so that gcc doesn't put a return thunk in
      in its place which special and gets checked.
      
      In addition:
      
        ERROR: modpost: "srso_alias_untrain_ret" [arch/x86/kvm/kvm-amd.ko] undefined!
        make[2]: *** [scripts/Makefile.modpost:145: Module.symvers] Chyba 1
        make[1]: *** [/usr/src/linux-6.8.3/Makefile:1873: modpost] Chyba 2
        make: *** [Makefile:240: __sub-make] Chyba 2
      
      since !SRSO builds would use the dummy return thunk as reported by
      petr.pisar@atlas.cz, https://bugzilla.kernel.org/show_bug.cgi?id=218679
      
      .
      
      Reported-by: default avatarkernel test robot <oliver.sang@intel.com>
      Closes: https://lore.kernel.org/oe-lkp/202404020901.da75a60f-oliver.sang@intel.com
      
      
      Signed-off-by: default avatarBorislav Petkov (AMD) <bp@alien8.de>
      Link: https://lore.kernel.org/all/202404020901.da75a60f-oliver.sang@intel.com/
      
      
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3ec21104
    • Jakub Sitnicki's avatar
      bpf, sockmap: Prevent lock inversion deadlock in map delete elem · 668b3074
      Jakub Sitnicki authored
      commit ff910599 upstream.
      
      syzkaller started using corpuses where a BPF tracing program deletes
      elements from a sockmap/sockhash map. Because BPF tracing programs can be
      invoked from any interrupt context, locks taken during a map_delete_elem
      operation must be hardirq-safe. Otherwise a deadlock due to lock inversion
      is possible, as reported by lockdep:
      
             CPU0                    CPU1
             ----                    ----
        lock(&htab->buckets[i].lock);
                                     local_irq_disable();
                                     lock(&host->lock);
                                     lock(&htab->buckets[i].lock);
        <Interrupt>
          lock(&host->lock);
      
      Locks in sockmap are hardirq-unsafe by design. We expects elements to be
      deleted from sockmap/sockhash only in task (normal) context with interrupts
      enabled, or in softirq context.
      
      Detect when map_delete_elem operation is invoked from a context which is
      _not_ hardirq-unsafe, that is interrupts are disabled, and bail out with an
      error.
      
      Note that map updates are not affected by this issue. BPF verifier does not
      allow updating sockmap/sockhash from a BPF tracing program today.
      
      Fixes: 604326b4
      
       ("bpf, sockmap: convert to generic sk_msg interface")
      Reported-by: default avatarxingwei lee <xrivendell7@gmail.com>
      Reported-by: default avataryue sun <samsun1006219@gmail.com>
      Reported-by: default avatar <syzbot+bc922f476bd65abbd466@syzkaller.appspotmail.com>
      Reported-by: default avatar <syzbot+d4066896495db380182e@syzkaller.appspotmail.com>
      Signed-off-by: default avatarJakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Tested-by: default avatar <syzbot+d4066896495db380182e@syzkaller.appspotmail.com>
      Acked-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Closes: https://syzkaller.appspot.com/bug?extid=d4066896495db380182e
      Closes: https://syzkaller.appspot.com/bug?extid=bc922f476bd65abbd466
      Link: https://lore.kernel.org/bpf/20240402104621.1050319-1-jakub@cloudflare.com
      
      
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      668b3074
    • Christophe JAILLET's avatar
      vboxsf: Avoid an spurious warning if load_nls_xxx() fails · 55fabde8
      Christophe JAILLET authored
      commit de3f64b7 upstream.
      
      If an load_nls_xxx() function fails a few lines above, the 'sbi->bdi_id' is
      still 0.
      So, in the error handling path, we will call ida_simple_remove(..., 0)
      which is not allocated yet.
      
      In order to prevent a spurious "ida_free called for id=0 which is not
      allocated." message, tweak the error handling path and add a new label.
      
      Fixes: 0fd16957
      
       ("fs: Add VirtualBox guest shared folder (vboxsf) support")
      Signed-off-by: default avatarChristophe JAILLET <christophe.jaillet@wanadoo.fr>
      Link: https://lore.kernel.org/r/d09eaaa4e2e08206c58a1a27ca9b3e81dc168773.1698835730.git.christophe.jaillet@wanadoo.fr
      
      
      Reviewed-by: default avatarHans de Goede <hdegoede@redhat.com>
      Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      55fabde8
    • Eric Dumazet's avatar
      netfilter: validate user input for expected length · 81d51b9b
      Eric Dumazet authored
      commit 0c83842d upstream.
      
      I got multiple syzbot reports showing old bugs exposed
      by BPF after commit 20f2505f ("bpf: Try to avoid kzalloc
      in cgroup/{s,g}etsockopt")
      
      setsockopt() @optlen argument should be taken into account
      before copying data.
      
       BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
       BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline]
       BUG: KASAN: slab-out-of-bounds in do_replace net/ipv4/netfilter/ip_tables.c:1111 [inline]
       BUG: KASAN: slab-out-of-bounds in do_ipt_set_ctl+0x902/0x3dd0 net/ipv4/netfilter/ip_tables.c:1627
      Read of size 96 at addr ffff88802cd73da0 by task syz-executor.4/7238
      
      CPU: 1 PID: 7238 Comm: syz-executor.4 Not tainted 6.9.0-rc2-next-20240403-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
      Call Trace:
       <TASK>
        __dump_stack lib/dump_stack.c:88 [inline]
        dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
        print_address_description mm/kasan/report.c:377 [inline]
        print_report+0x169/0x550 mm/kasan/report.c:488
        kasan_report+0x143/0x180 mm/kasan/report.c:601
        kasan_check_range+0x282/0x290 mm/kasan/generic.c:189
        __asan_memcpy+0x29/0x70 mm/kasan/shadow.c:105
        copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
        copy_from_sockptr include/linux/sockptr.h:55 [inline]
        do_replace net/ipv4/netfilter/ip_tables.c:1111 [inline]
        do_ipt_set_ctl+0x902/0x3dd0 net/ipv4/netfilter/ip_tables.c:1627
        nf_setsockopt+0x295/0x2c0 net/netfilter/nf_sockopt.c:101
        do_sock_setsockopt+0x3af/0x720 net/socket.c:2311
        __sys_setsockopt+0x1ae/0x250 net/socket.c:2334
        __do_sys_setsockopt net/socket.c:2343 [inline]
        __se_sys_setsockopt net/socket.c:2340 [inline]
        __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
       do_syscall_64+0xfb/0x240
       entry_SYSCALL_64_after_hwframe+0x72/0x7a
      RIP: 0033:0x7fd22067dde9
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fd21f9ff0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
      RAX: ffffffffffffffda RBX: 00007fd2207abf80 RCX: 00007fd22067dde9
      RDX: 0000000000000040 RSI: 0000000000000000 RDI: 0000000000000003
      RBP: 00007fd2206ca47a R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000020000880 R11: 0000000000000246 R12: 0000000000000000
      R13: 000000000000000b R14: 00007fd2207abf80 R15: 00007ffd2d0170d8
       </TASK>
      
      Allocated by task 7238:
        kasan_save_stack mm/kasan/common.c:47 [inline]
        kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
        poison_kmalloc_redzone mm/kasan/common.c:370 [inline]
        __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:387
        kasan_kmalloc include/linux/kasan.h:211 [inline]
        __do_kmalloc_node mm/slub.c:4069 [inline]
        __kmalloc_noprof+0x200/0x410 mm/slub.c:4082
        kmalloc_noprof include/linux/slab.h:664 [inline]
        __cgroup_bpf_run_filter_setsockopt+0xd47/0x1050 kernel/bpf/cgroup.c:1869
        do_sock_setsockopt+0x6b4/0x720 net/socket.c:2293
        __sys_setsockopt+0x1ae/0x250 net/socket.c:2334
        __do_sys_setsockopt net/socket.c:2343 [inline]
        __se_sys_setsockopt net/socket.c:2340 [inline]
        __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
       do_syscall_64+0xfb/0x240
       entry_SYSCALL_64_after_hwframe+0x72/0x7a
      
      The buggy address belongs to the object at ffff88802cd73da0
       which belongs to the cache kmalloc-8 of size 8
      The buggy address is located 0 bytes inside of
       allocated 1-byte region [ffff88802cd73da0, ffff88802cd73da1)
      
      The buggy address belongs to the physical page:
      page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88802cd73020 pfn:0x2cd73
      flags: 0xfff80000000000(node=0|zone=1|lastcpupid=0xfff)
      page_type: 0xffffefff(slab)
      raw: 00fff80000000000 ffff888015041280 dead000000000100 dead000000000122
      raw: ffff88802cd73020 000000008080007f 00000001ffffefff 0000000000000000
      page dumped because: kasan: bad access detected
      page_owner tracks the page as allocated
      page last allocated via order 0, migratetype Unmovable, gfp_mask 0x12cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY), pid 5103, tgid 2119833701 (syz-executor.4), ts 5103, free_ts 70804600828
        set_page_owner include/linux/page_owner.h:32 [inline]
        post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1490
        prep_new_page mm/page_alloc.c:1498 [inline]
        get_page_from_freelist+0x2e7e/0x2f40 mm/page_alloc.c:3454
        __alloc_pages_noprof+0x256/0x6c0 mm/page_alloc.c:4712
        __alloc_pages_node_noprof include/linux/gfp.h:244 [inline]
        alloc_pages_node_noprof include/linux/gfp.h:271 [inline]
        alloc_slab_page+0x5f/0x120 mm/slub.c:2249
        allocate_slab+0x5a/0x2e0 mm/slub.c:2412
        new_slab mm/slub.c:2465 [inline]
        ___slab_alloc+0xcd1/0x14b0 mm/slub.c:3615
        __slab_alloc+0x58/0xa0 mm/slub.c:3705
        __slab_alloc_node mm/slub.c:3758 [inline]
        slab_alloc_node mm/slub.c:3936 [inline]
        __do_kmalloc_node mm/slub.c:4068 [inline]
        kmalloc_node_track_caller_noprof+0x286/0x450 mm/slub.c:4089
        kstrdup+0x3a/0x80 mm/util.c:62
        device_rename+0xb5/0x1b0 drivers/base/core.c:4558
        dev_change_name+0x275/0x860 net/core/dev.c:1232
        do_setlink+0xa4b/0x41f0 net/core/rtnetlink.c:2864
        __rtnl_newlink net/core/rtnetlink.c:3680 [inline]
        rtnl_newlink+0x180b/0x20a0 net/core/rtnetlink.c:3727
        rtnetlink_rcv_msg+0x89b/0x10d0 net/core/rtnetlink.c:6594
        netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2559
        netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
        netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
      page last free pid 5146 tgid 5146 stack trace:
        reset_page_owner include/linux/page_owner.h:25 [inline]
        free_pages_prepare mm/page_alloc.c:1110 [inline]
        free_unref_page+0xd3c/0xec0 mm/page_alloc.c:2617
        discard_slab mm/slub.c:2511 [inline]
        __put_partials+0xeb/0x130 mm/slub.c:2980
        put_cpu_partial+0x17c/0x250 mm/slub.c:3055
        __slab_free+0x2ea/0x3d0 mm/slub.c:4254
        qlink_free mm/kasan/quarantine.c:163 [inline]
        qlist_free_all+0x9e/0x140 mm/kasan/quarantine.c:179
        kasan_quarantine_reduce+0x14f/0x170 mm/kasan/quarantine.c:286
        __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:322
        kasan_slab_alloc include/linux/kasan.h:201 [inline]
        slab_post_alloc_hook mm/slub.c:3888 [inline]
        slab_alloc_node mm/slub.c:3948 [inline]
        __do_kmalloc_node mm/slub.c:4068 [inline]
        __kmalloc_node_noprof+0x1d7/0x450 mm/slub.c:4076
        kmalloc_node_noprof include/linux/slab.h:681 [inline]
        kvmalloc_node_noprof+0x72/0x190 mm/util.c:634
        bucket_table_alloc lib/rhashtable.c:186 [inline]
        rhashtable_rehash_alloc+0x9e/0x290 lib/rhashtable.c:367
        rht_deferred_worker+0x4e1/0x2440 lib/rhashtable.c:427
        process_one_work kernel/workqueue.c:3218 [inline]
        process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3299
        worker_thread+0x86d/0xd70 kernel/workqueue.c:3380
        kthread+0x2f0/0x390 kernel/kthread.c:388
        ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
        ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
      
      Memory state around the buggy address:
       ffff88802cd73c80: 07 fc fc fc 05 fc fc fc 05 fc fc fc fa fc fc fc
       ffff88802cd73d00: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
      >ffff88802cd73d80: fa fc fc fc 01 fc fc fc fa fc fc fc fa fc fc fc
                                     ^
       ffff88802cd73e00: fa fc fc fc fa fc fc fc 05 fc fc fc 07 fc fc fc
       ffff88802cd73e80: 07 fc fc fc 07 fc fc fc 07 fc fc fc 07 fc fc fc
      
      Fixes: 1da177e4
      
       ("Linux-2.6.12-rc2")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Link: https://lore.kernel.org/r/20240404122051.2303764-1-edumazet@google.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      81d51b9b
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: discard table flag update with pending basechain deletion · 9627fd0c
      Pablo Neira Ayuso authored
      commit 1bc83a01 upstream.
      
      Hook unregistration is deferred to the commit phase, same occurs with
      hook updates triggered by the table dormant flag. When both commands are
      combined, this results in deleting a basechain while leaving its hook
      still registered in the core.
      
      Fixes: 179d9ba5
      
       ("netfilter: nf_tables: fix table flag updates")
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9627fd0c
    • Ziyang Xuan's avatar
      netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get() · 8b891153
      Ziyang Xuan authored
      commit 24225011 upstream.
      
      nft_unregister_flowtable_type() within nf_flow_inet_module_exit() can
      concurrent with __nft_flowtable_type_get() within nf_tables_newflowtable().
      And thhere is not any protection when iterate over nf_tables_flowtables
      list in __nft_flowtable_type_get(). Therefore, there is pertential
      data-race of nf_tables_flowtables list entry.
      
      Use list_for_each_entry_rcu() to iterate over nf_tables_flowtables list
      in __nft_flowtable_type_get(), and use rcu_read_lock() in the caller
      nft_flowtable_type_get() to protect the entire type query process.
      
      Fixes: 3b49e2e9
      
       ("netfilter: nf_tables: add flow table netlink frontend")
      Signed-off-by: default avatarZiyang Xuan <william.xuanziyang@huawei.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8b891153
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: flush pending destroy work before exit_net release · 333b5085
      Pablo Neira Ayuso authored
      commit 24cea967 upstream.
      
      Similar to 2c9f0293 ("netfilter: nf_tables: flush pending destroy
      work before netlink notifier") to address a race between exit_net and
      the destroy workqueue.
      
      The trace below shows an element to be released via destroy workqueue
      while exit_net path (triggered via module removal) has already released
      the set that is used in such transaction.
      
      [ 1360.547789] BUG: KASAN: slab-use-after-free in nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables]
      [ 1360.547861] Read of size 8 at addr ffff888140500cc0 by task kworker/4:1/152465
      [ 1360.547870] CPU: 4 PID: 152465 Comm: kworker/4:1 Not tainted 6.8.0+ #359
      [ 1360.547882] Workqueue: events nf_tables_trans_destroy_work [nf_tables]
      [ 1360.547984] Call Trace:
      [ 1360.547991]  <TASK>
      [ 1360.547998]  dump_stack_lvl+0x53/0x70
      [ 1360.548014]  print_report+0xc4/0x610
      [ 1360.548026]  ? __virt_addr_valid+0xba/0x160
      [ 1360.548040]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
      [ 1360.548054]  ? nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables]
      [ 1360.548176]  kasan_report+0xae/0xe0
      [ 1360.548189]  ? nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables]
      [ 1360.548312]  nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables]
      [ 1360.548447]  ? __pfx_nf_tables_trans_destroy_work+0x10/0x10 [nf_tables]
      [ 1360.548577]  ? _raw_spin_unlock_irq+0x18/0x30
      [ 1360.548591]  process_one_work+0x2f1/0x670
      [ 1360.548610]  worker_thread+0x4d3/0x760
      [ 1360.548627]  ? __pfx_worker_thread+0x10/0x10
      [ 1360.548640]  kthread+0x16b/0x1b0
      [ 1360.548653]  ? __pfx_kthread+0x10/0x10
      [ 1360.548665]  ret_from_fork+0x2f/0x50
      [ 1360.548679]  ? __pfx_kthread+0x10/0x10
      [ 1360.548690]  ret_from_fork_asm+0x1a/0x30
      [ 1360.548707]  </TASK>
      
      [ 1360.548719] Allocated by task 192061:
      [ 1360.548726]  kasan_save_stack+0x20/0x40
      [ 1360.548739]  kasan_save_track+0x14/0x30
      [ 1360.548750]  __kasan_kmalloc+0x8f/0xa0
      [ 1360.548760]  __kmalloc_node+0x1f1/0x450
      [ 1360.548771]  nf_tables_newset+0x10c7/0x1b50 [nf_tables]
      [ 1360.548883]  nfnetlink_rcv_batch+0xbc4/0xdc0 [nfnetlink]
      [ 1360.548909]  nfnetlink_rcv+0x1a8/0x1e0 [nfnetlink]
      [ 1360.548927]  netlink_unicast+0x367/0x4f0
      [ 1360.548935]  netlink_sendmsg+0x34b/0x610
      [ 1360.548944]  ____sys_sendmsg+0x4d4/0x510
      [ 1360.548953]  ___sys_sendmsg+0xc9/0x120
      [ 1360.548961]  __sys_sendmsg+0xbe/0x140
      [ 1360.548971]  do_syscall_64+0x55/0x120
      [ 1360.548982]  entry_SYSCALL_64_after_hwframe+0x55/0x5d
      
      [ 1360.548994] Freed by task 192222:
      [ 1360.548999]  kasan_save_stack+0x20/0x40
      [ 1360.549009]  kasan_save_track+0x14/0x30
      [ 1360.549019]  kasan_save_free_info+0x3b/0x60
      [ 1360.549028]  poison_slab_object+0x100/0x180
      [ 1360.549036]  __kasan_slab_free+0x14/0x30
      [ 1360.549042]  kfree+0xb6/0x260
      [ 1360.549049]  __nft_release_table+0x473/0x6a0 [nf_tables]
      [ 1360.549131]  nf_tables_exit_net+0x170/0x240 [nf_tables]
      [ 1360.549221]  ops_exit_list+0x50/0xa0
      [ 1360.549229]  free_exit_list+0x101/0x140
      [ 1360.549236]  unregister_pernet_operations+0x107/0x160
      [ 1360.549245]  unregister_pernet_subsys+0x1c/0x30
      [ 1360.549254]  nf_tables_module_exit+0x43/0x80 [nf_tables]
      [ 1360.549345]  __do_sys_delete_module+0x253/0x370
      [ 1360.549352]  do_syscall_64+0x55/0x120
      [ 1360.549360]  entry_SYSCALL_64_after_hwframe+0x55/0x5d
      
      (gdb) list *__nft_release_table+0x473
      0x1e033 is in __nft_release_table (net/netfilter/nf_tables_api.c:11354).
      11349           list_for_each_entry_safe(flowtable, nf, &table->flowtables, list) {
      11350                   list_del(&flowtable->list);
      11351                   nft_use_dec(&table->use);
      11352                   nf_tables_flowtable_destroy(flowtable);
      11353           }
      11354           list_for_each_entry_safe(set, ns, &table->sets, list) {
      11355                   list_del(&set->list);
      11356                   nft_use_dec(&table->use);
      11357                   if (set->flags & (NFT_SET_MAP | NFT_SET_OBJECT))
      11358                           nft_map_deactivate(&ctx, set);
      (gdb)
      
      [ 1360.549372] Last potentially related work creation:
      [ 1360.549376]  kasan_save_stack+0x20/0x40
      [ 1360.549384]  __kasan_record_aux_stack+0x9b/0xb0
      [ 1360.549392]  __queue_work+0x3fb/0x780
      [ 1360.549399]  queue_work_on+0x4f/0x60
      [ 1360.549407]  nft_rhash_remove+0x33b/0x340 [nf_tables]
      [ 1360.549516]  nf_tables_commit+0x1c6a/0x2620 [nf_tables]
      [ 1360.549625]  nfnetlink_rcv_batch+0x728/0xdc0 [nfnetlink]
      [ 1360.549647]  nfnetlink_rcv+0x1a8/0x1e0 [nfnetlink]
      [ 1360.549671]  netlink_unicast+0x367/0x4f0
      [ 1360.549680]  netlink_sendmsg+0x34b/0x610
      [ 1360.549690]  ____sys_sendmsg+0x4d4/0x510
      [ 1360.549697]  ___sys_sendmsg+0xc9/0x120
      [ 1360.549706]  __sys_sendmsg+0xbe/0x140
      [ 1360.549715]  do_syscall_64+0x55/0x120
      [ 1360.549725]  entry_SYSCALL_64_after_hwframe+0x55/0x5d
      
      Fixes: 0935d558
      
       ("netfilter: nf_tables: asynchronous release")
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      333b5085
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: reject new basechain after table flag update · 420132be
      Pablo Neira Ayuso authored
      commit 994209dd upstream.
      
      When dormant flag is toggled, hooks are disabled in the commit phase by
      iterating over current chains in table (existing and new).
      
      The following configuration allows for an inconsistent state:
      
        add table x
        add chain x y { type filter hook input priority 0; }
        add table x { flags dormant; }
        add chain x w { type filter hook input priority 1; }
      
      which triggers the following warning when trying to unregister chain w
      which is already unregistered.
      
      [  127.322252] WARNING: CPU: 7 PID: 1211 at net/netfilter/core.c:50                                                                     1 __nf_unregister_net_hook+0x21a/0x260
      [...]
      [  127.322519] Call Trace:
      [  127.322521]  <TASK>
      [  127.322524]  ? __warn+0x9f/0x1a0
      [  127.322531]  ? __nf_unregister_net_hook+0x21a/0x260
      [  127.322537]  ? report_bug+0x1b1/0x1e0
      [  127.322545]  ? handle_bug+0x3c/0x70
      [  127.322552]  ? exc_invalid_op+0x17/0x40
      [  127.322556]  ? asm_exc_invalid_op+0x1a/0x20
      [  127.322563]  ? kasan_save_free_info+0x3b/0x60
      [  127.322570]  ? __nf_unregister_net_hook+0x6a/0x260
      [  127.322577]  ? __nf_unregister_net_hook+0x21a/0x260
      [  127.322583]  ? __nf_unregister_net_hook+0x6a/0x260
      [  127.322590]  ? __nf_tables_unregister_hook+0x8a/0xe0 [nf_tables]
      [  127.322655]  nft_table_disable+0x75/0xf0 [nf_tables]
      [  127.322717]  nf_tables_commit+0x2571/0x2620 [nf_tables]
      
      Fixes: 179d9ba5
      
       ("netfilter: nf_tables: fix table flag updates")
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      420132be
    • Borislav Petkov (AMD)'s avatar
      x86/bugs: Fix the SRSO mitigation on Zen3/4 · e40f32f1
      Borislav Petkov (AMD) authored
      Commit 4535e1a4 upstream.
      
      The original version of the mitigation would patch in the calls to the
      untraining routines directly.  That is, the alternative() in UNTRAIN_RET
      will patch in the CALL to srso_alias_untrain_ret() directly.
      
      However, even if commit e7c25c44 ("x86/cpu: Cleanup the untrain
      mess") meant well in trying to clean up the situation, due to micro-
      architectural reasons, the untraining routine srso_alias_untrain_ret()
      must be the target of a CALL instruction and not of a JMP instruction as
      it is done now.
      
      Reshuffle the alternative macros to accomplish that.
      
      Fixes: e7c25c44
      
       ("x86/cpu: Cleanup the untrain mess")
      Signed-off-by: default avatarBorislav Petkov (AMD) <bp@alien8.de>
      Reviewed-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: stable@kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e40f32f1
    • Josh Poimboeuf's avatar
      93eae88e
    • Josh Poimboeuf's avatar
      x86/srso: Disentangle rethunk-dependent options · 820a3626
      Josh Poimboeuf authored
      Commit 34a3cae7
      
       upstream.
      
      CONFIG_RETHUNK, CONFIG_CPU_UNRET_ENTRY and CONFIG_CPU_SRSO are all
      tangled up.  De-spaghettify the code a bit.
      
      Some of the rethunk-related code has been shuffled around within the
      '.text..__x86.return_thunk' section, but otherwise there are no
      functional changes.  srso_alias_untrain_ret() and srso_alias_safe_ret()
      ((which are very address-sensitive) haven't moved.
      
      Signed-off-by: default avatarJosh Poimboeuf <jpoimboe@kernel.org>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarBorislav Petkov (AMD) <bp@alien8.de>
      Acked-by: default avatarBorislav Petkov (AMD) <bp@alien8.de>
      Link: https://lore.kernel.org/r/2845084ed303d8384905db3b87b77693945302b4.1693889988.git.jpoimboe@kernel.org
      
      
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      820a3626