Skip to content
  1. Apr 11, 2024
  2. Apr 10, 2024
    • Heiner Kallweit's avatar
      r8169: fix LED-related deadlock on module removal · 19fa4f2a
      Heiner Kallweit authored
      
      
      Binding devm_led_classdev_register() to the netdev is problematic
      because on module removal we get a RTNL-related deadlock. Fix this
      by avoiding the device-managed LED functions.
      
      Note: We can safely call led_classdev_unregister() for a LED even
      if registering it failed, because led_classdev_unregister() detects
      this and is a no-op in this case.
      
      Fixes: 18764b88 ("r8169: add support for LED's on RTL8168/RTL8101")
      Cc: stable@vger.kernel.org
      Reported-by: default avatarLukas Wunner <lukas@wunner.de>
      Signed-off-by: default avatarHeiner Kallweit <hkallweit1@gmail.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      19fa4f2a
    • Brett Creeley's avatar
      pds_core: Fix pdsc_check_pci_health function to use work thread · 81665adf
      Brett Creeley authored
      
      
      When the driver notices fw_status == 0xff it tries to perform a PCI
      reset on itself via pci_reset_function() in the context of the driver's
      health thread. However, pdsc_reset_prepare calls
      pdsc_stop_health_thread(), which attempts to stop/flush the health
      thread. This results in a deadlock because the stop/flush will never
      complete since the driver called pci_reset_function() from the health
      thread context. Fix by changing the pdsc_check_pci_health_function()
      to queue a newly introduced pdsc_pci_reset_thread() on the pdsc's
      work queue.
      
      Unloading the driver in the fw_down/dead state uncovered another issue,
      which can be seen in the following trace:
      
      WARNING: CPU: 51 PID: 6914 at kernel/workqueue.c:1450 __queue_work+0x358/0x440
      [...]
      RIP: 0010:__queue_work+0x358/0x440
      [...]
      Call Trace:
       <TASK>
       ? __warn+0x85/0x140
       ? __queue_work+0x358/0x440
       ? report_bug+0xfc/0x1e0
       ? handle_bug+0x3f/0x70
       ? exc_invalid_op+0x17/0x70
       ? asm_exc_invalid_op+0x1a/0x20
       ? __queue_work+0x358/0x440
       queue_work_on+0x28/0x30
       pdsc_devcmd_locked+0x96/0xe0 [pds_core]
       pdsc_devcmd_reset+0x71/0xb0 [pds_core]
       pdsc_teardown+0x51/0xe0 [pds_core]
       pdsc_remove+0x106/0x200 [pds_core]
       pci_device_remove+0x37/0xc0
       device_release_driver_internal+0xae/0x140
       driver_detach+0x48/0x90
       bus_remove_driver+0x6d/0xf0
       pci_unregister_driver+0x2e/0xa0
       pdsc_cleanup_module+0x10/0x780 [pds_core]
       __x64_sys_delete_module+0x142/0x2b0
       ? syscall_trace_enter.isra.18+0x126/0x1a0
       do_syscall_64+0x3b/0x90
       entry_SYSCALL_64_after_hwframe+0x72/0xdc
      RIP: 0033:0x7fbd9d03a14b
      [...]
      
      Fix this by preventing the devcmd reset if the FW is not running.
      
      Fixes: d9407ff1 ("pds_core: Prevent health thread from running during reset/remove")
      Reviewed-by: default avatarShannon Nelson <shannon.nelson@amd.com>
      Signed-off-by: default avatarBrett Creeley <brett.creeley@amd.com>
      Reviewed-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      81665adf
    • Jiri Benc's avatar
      ipv6: fix race condition between ipv6_get_ifaddr and ipv6_del_addr · 7633c4da
      Jiri Benc authored
      
      
      Although ipv6_get_ifaddr walks inet6_addr_lst under the RCU lock, it
      still means hlist_for_each_entry_rcu can return an item that got removed
      from the list. The memory itself of such item is not freed thanks to RCU
      but nothing guarantees the actual content of the memory is sane.
      
      In particular, the reference count can be zero. This can happen if
      ipv6_del_addr is called in parallel. ipv6_del_addr removes the entry
      from inet6_addr_lst (hlist_del_init_rcu(&ifp->addr_lst)) and drops all
      references (__in6_ifa_put(ifp) + in6_ifa_put(ifp)). With bad enough
      timing, this can happen:
      
      1. In ipv6_get_ifaddr, hlist_for_each_entry_rcu returns an entry.
      
      2. Then, the whole ipv6_del_addr is executed for the given entry. The
         reference count drops to zero and kfree_rcu is scheduled.
      
      3. ipv6_get_ifaddr continues and tries to increments the reference count
         (in6_ifa_hold).
      
      4. The rcu is unlocked and the entry is freed.
      
      5. The freed entry is returned.
      
      Prevent increasing of the reference count in such case. The name
      in6_ifa_hold_safe is chosen to mimic the existing fib6_info_hold_safe.
      
      [   41.506330] refcount_t: addition on 0; use-after-free.
      [   41.506760] WARNING: CPU: 0 PID: 595 at lib/refcount.c:25 refcount_warn_saturate+0xa5/0x130
      [   41.507413] Modules linked in: veth bridge stp llc
      [   41.507821] CPU: 0 PID: 595 Comm: python3 Not tainted 6.9.0-rc2.main-00208-g49563be82afa #14
      [   41.508479] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
      [   41.509163] RIP: 0010:refcount_warn_saturate+0xa5/0x130
      [   41.509586] Code: ad ff 90 0f 0b 90 90 c3 cc cc cc cc 80 3d c0 30 ad 01 00 75 a0 c6 05 b7 30 ad 01 01 90 48 c7 c7 38 cc 7a 8c e8 cc 18 ad ff 90 <0f> 0b 90 90 c3 cc cc cc cc 80 3d 98 30 ad 01 00 0f 85 75 ff ff ff
      [   41.510956] RSP: 0018:ffffbda3c026baf0 EFLAGS: 00010282
      [   41.511368] RAX: 0000000000000000 RBX: ffff9e9c46914800 RCX: 0000000000000000
      [   41.511910] RDX: ffff9e9c7ec29c00 RSI: ffff9e9c7ec1c900 RDI: ffff9e9c7ec1c900
      [   41.512445] RBP: ffff9e9c43660c9c R08: 0000000000009ffb R09: 00000000ffffdfff
      [   41.512998] R10: 00000000ffffdfff R11: ffffffff8ca58a40 R12: ffff9e9c4339a000
      [   41.513534] R13: 0000000000000001 R14: ffff9e9c438a0000 R15: ffffbda3c026bb48
      [   41.514086] FS:  00007fbc4cda1740(0000) GS:ffff9e9c7ec00000(0000) knlGS:0000000000000000
      [   41.514726] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   41.515176] CR2: 000056233b337d88 CR3: 000000000376e006 CR4: 0000000000370ef0
      [   41.515713] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   41.516252] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   41.516799] Call Trace:
      [   41.517037]  <TASK>
      [   41.517249]  ? __warn+0x7b/0x120
      [   41.517535]  ? refcount_warn_saturate+0xa5/0x130
      [   41.517923]  ? report_bug+0x164/0x190
      [   41.518240]  ? handle_bug+0x3d/0x70
      [   41.518541]  ? exc_invalid_op+0x17/0x70
      [   41.520972]  ? asm_exc_invalid_op+0x1a/0x20
      [   41.521325]  ? refcount_warn_saturate+0xa5/0x130
      [   41.521708]  ipv6_get_ifaddr+0xda/0xe0
      [   41.522035]  inet6_rtm_getaddr+0x342/0x3f0
      [   41.522376]  ? __pfx_inet6_rtm_getaddr+0x10/0x10
      [   41.522758]  rtnetlink_rcv_msg+0x334/0x3d0
      [   41.523102]  ? netlink_unicast+0x30f/0x390
      [   41.523445]  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
      [   41.523832]  netlink_rcv_skb+0x53/0x100
      [   41.524157]  netlink_unicast+0x23b/0x390
      [   41.524484]  netlink_sendmsg+0x1f2/0x440
      [   41.524826]  __sys_sendto+0x1d8/0x1f0
      [   41.525145]  __x64_sys_sendto+0x1f/0x30
      [   41.525467]  do_syscall_64+0xa5/0x1b0
      [   41.525794]  entry_SYSCALL_64_after_hwframe+0x72/0x7a
      [   41.526213] RIP: 0033:0x7fbc4cfcea9a
      [   41.526528] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
      [   41.527942] RSP: 002b:00007ffcf54012a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      [   41.528593] RAX: ffffffffffffffda RBX: 00007ffcf5401368 RCX: 00007fbc4cfcea9a
      [   41.529173] RDX: 000000000000002c RSI: 00007fbc4b9d9bd0 RDI: 0000000000000005
      [   41.529786] RBP: 00007fbc4bafb040 R08: 00007ffcf54013e0 R09: 000000000000000c
      [   41.530375] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      [   41.530977] R13: ffffffffc4653600 R14: 0000000000000001 R15: 00007fbc4ca85d1b
      [   41.531573]  </TASK>
      
      Fixes: 5c578aed ("IPv6: convert addrconf hash list to RCU")
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@kernel.org>
      Signed-off-by: default avatarJiri Benc <jbenc@redhat.com>
      Link: https://lore.kernel.org/r/8ab821e36073a4a406c50ec83c9e8dc586c539e4.1712585809.git.jbenc@redhat.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      7633c4da
    • Jakub Kicinski's avatar
      Merge branch 'net-start-to-replace-copy_from_sockptr' · 7b6575c6
      Jakub Kicinski authored
      Eric Dumazet says:
      
      ====================
      net: start to replace copy_from_sockptr()
      
      We got several syzbot reports about unsafe copy_from_sockptr()
      calls. After fixing some of them, it appears that we could
      use a new helper to factorize all the checks in one place.
      
      This series targets net tree, we can later start converting
      many call sites in net-next.
      ====================
      
      Link: https://lore.kernel.org/r/20240408082845.3957374-1-edumazet@google.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      7b6575c6
    • Eric Dumazet's avatar
      nfc: llcp: fix nfc_llcp_setsockopt() unsafe copies · 7a87441c
      Eric Dumazet authored
      
      
      syzbot reported unsafe calls to copy_from_sockptr() [1]
      
      Use copy_safe_from_sockptr() instead.
      
      [1]
      
      BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
       BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline]
       BUG: KASAN: slab-out-of-bounds in nfc_llcp_setsockopt+0x6c2/0x850 net/nfc/llcp_sock.c:255
      Read of size 4 at addr ffff88801caa1ec3 by task syz-executor459/5078
      
      CPU: 0 PID: 5078 Comm: syz-executor459 Not tainted 6.8.0-syzkaller-08951-gfe46a7dd189e #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
      Call Trace:
       <TASK>
        __dump_stack lib/dump_stack.c:88 [inline]
        dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
        print_address_description mm/kasan/report.c:377 [inline]
        print_report+0x169/0x550 mm/kasan/report.c:488
        kasan_report+0x143/0x180 mm/kasan/report.c:601
        copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
        copy_from_sockptr include/linux/sockptr.h:55 [inline]
        nfc_llcp_setsockopt+0x6c2/0x850 net/nfc/llcp_sock.c:255
        do_sock_setsockopt+0x3b1/0x720 net/socket.c:2311
        __sys_setsockopt+0x1ae/0x250 net/socket.c:2334
        __do_sys_setsockopt net/socket.c:2343 [inline]
        __se_sys_setsockopt net/socket.c:2340 [inline]
        __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
       do_syscall_64+0xfd/0x240
       entry_SYSCALL_64_after_hwframe+0x6d/0x75
      RIP: 0033:0x7f7fac07fd89
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 91 18 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fff660eb788 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
      RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f7fac07fd89
      RDX: 0000000000000000 RSI: 0000000000000118 RDI: 0000000000000004
      RBP: 0000000000000000 R08: 0000000000000002 R09: 0000000000000000
      R10: 0000000020000a80 R11: 0000000000000246 R12: 0000000000000000
      R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
      
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Reviewed-by: default avatarKrzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
      Link: https://lore.kernel.org/r/20240408082845.3957374-4-edumazet@google.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      7a87441c
    • Eric Dumazet's avatar
      mISDN: fix MISDN_TIME_STAMP handling · 138b7878
      Eric Dumazet authored
      
      
      syzbot reports one unsafe call to copy_from_sockptr() [1]
      
      Use copy_safe_from_sockptr() instead.
      
      [1]
      
       BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
       BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline]
       BUG: KASAN: slab-out-of-bounds in data_sock_setsockopt+0x46c/0x4cc drivers/isdn/mISDN/socket.c:417
      Read of size 4 at addr ffff0000c6d54083 by task syz-executor406/6167
      
      CPU: 1 PID: 6167 Comm: syz-executor406 Not tainted 6.8.0-rc7-syzkaller-g707081b61156 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
      Call trace:
        dump_backtrace+0x1b8/0x1e4 arch/arm64/kernel/stacktrace.c:291
        show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:298
        __dump_stack lib/dump_stack.c:88 [inline]
        dump_stack_lvl+0xd0/0x124 lib/dump_stack.c:106
        print_address_description mm/kasan/report.c:377 [inline]
        print_report+0x178/0x518 mm/kasan/report.c:488
        kasan_report+0xd8/0x138 mm/kasan/report.c:601
        __asan_report_load_n_noabort+0x1c/0x28 mm/kasan/report_generic.c:391
        copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
        copy_from_sockptr include/linux/sockptr.h:55 [inline]
        data_sock_setsockopt+0x46c/0x4cc drivers/isdn/mISDN/socket.c:417
        do_sock_setsockopt+0x2a0/0x4e0 net/socket.c:2311
        __sys_setsockopt+0x128/0x1a8 net/socket.c:2334
        __do_sys_setsockopt net/socket.c:2343 [inline]
        __se_sys_setsockopt net/socket.c:2340 [inline]
        __arm64_sys_setsockopt+0xb8/0xd4 net/socket.c:2340
        __invoke_syscall arch/arm64/kernel/syscall.c:34 [inline]
        invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:48
        el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:133
        do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:152
        el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712
        el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
        el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
      
      Fixes: 1b2b03f8 ("Add mISDN core files")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Cc: Karsten Keil <isdn@linux-pingi.de>
      Link: https://lore.kernel.org/r/20240408082845.3957374-3-edumazet@google.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      138b7878
    • Eric Dumazet's avatar
      net: add copy_safe_from_sockptr() helper · 6309863b
      Eric Dumazet authored
      
      
      copy_from_sockptr() helper is unsafe, unless callers
      did the prior check against user provided optlen.
      
      Too many callers get this wrong, lets add a helper to
      fix them and avoid future copy/paste bugs.
      
      Instead of :
      
         if (optlen < sizeof(opt)) {
             err = -EINVAL;
             break;
         }
         if (copy_from_sockptr(&opt, optval, sizeof(opt)) {
             err = -EFAULT;
             break;
         }
      
      Use :
      
         err = copy_safe_from_sockptr(&opt, sizeof(opt),
                                      optval, optlen);
         if (err)
             break;
      
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240408082845.3957374-2-edumazet@google.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      6309863b
  3. Apr 09, 2024
    • Arnd Bergmann's avatar
      ipv4/route: avoid unused-but-set-variable warning · cf1b7201
      Arnd Bergmann authored
      
      
      The log_martians variable is only used in an #ifdef, causing a 'make W=1'
      warning with gcc:
      
      net/ipv4/route.c: In function 'ip_rt_send_redirect':
      net/ipv4/route.c:880:13: error: variable 'log_martians' set but not used [-Werror=unused-but-set-variable]
      
      Change the #ifdef to an equivalent IS_ENABLED() to let the compiler
      see where the variable is used.
      
      Fixes: 30038fc6 ("net: ip_rt_send_redirect() optimization")
      Reviewed-by: default avatarDavid Ahern <dsahern@kernel.org>
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240408074219.3030256-2-arnd@kernel.org
      
      
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      cf1b7201
    • Arnd Bergmann's avatar
      ipv6: fib: hide unused 'pn' variable · 74043489
      Arnd Bergmann authored
      When CONFIG_IPV6_SUBTREES is disabled, the only user is hidden, causing
      a 'make W=1' warning:
      
      net/ipv6/ip6_fib.c: In function 'fib6_add':
      net/ipv6/ip6_fib.c:1388:32: error: variable 'pn' set but not used [-Werror=unused-but-set-variable]
      
      Add another #ifdef around the variable declaration, matching the other
      uses in this file.
      
      Fixes: 66729e18 ("[IPV6] ROUTE: Make sure we have fn->leaf when adding a node on subtree.")
      Link: https://lore.kernel.org/netdev/20240322131746.904943-1-arnd@kernel.org/
      
      
      Reviewed-by: default avatarDavid Ahern <dsahern@kernel.org>
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240408074219.3030256-1-arnd@kernel.org
      
      
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      74043489
    • Geetha sowjanya's avatar
      octeontx2-af: Fix NIX SQ mode and BP config · faf23006
      Geetha sowjanya authored
      
      
      NIX SQ mode and link backpressure configuration is required for
      all platforms. But in current driver this code is wrongly placed
      under specific platform check. This patch fixes the issue by
      moving the code out of platform check.
      
      Fixes: 5d9b976d ("octeontx2-af: Support fixed transmit scheduler topology")
      Signed-off-by: default avatarGeetha sowjanya <gakula@marvell.com>
      Link: https://lore.kernel.org/r/20240408063643.26288-1-gakula@marvell.com
      
      
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      faf23006
    • Kuniyuki Iwashima's avatar
      af_unix: Clear stale u->oob_skb. · b46f4eaa
      Kuniyuki Iwashima authored
      
      
      syzkaller started to report deadlock of unix_gc_lock after commit
      4090fa37 ("af_unix: Replace garbage collection algorithm."), but
      it just uncovers the bug that has been there since commit 314001f0
      ("af_unix: Add OOB support").
      
      The repro basically does the following.
      
        from socket import *
        from array import array
      
        c1, c2 = socketpair(AF_UNIX, SOCK_STREAM)
        c1.sendmsg([b'a'], [(SOL_SOCKET, SCM_RIGHTS, array("i", [c2.fileno()]))], MSG_OOB)
        c2.recv(1)  # blocked as no normal data in recv queue
      
        c2.close()  # done async and unblock recv()
        c1.close()  # done async and trigger GC
      
      A socket sends its file descriptor to itself as OOB data and tries to
      receive normal data, but finally recv() fails due to async close().
      
      The problem here is wrong handling of OOB skb in manage_oob().  When
      recvmsg() is called without MSG_OOB, manage_oob() is called to check
      if the peeked skb is OOB skb.  In such a case, manage_oob() pops it
      out of the receive queue but does not clear unix_sock(sk)->oob_skb.
      This is wrong in terms of uAPI.
      
      Let's say we send "hello" with MSG_OOB, and "world" without MSG_OOB.
      The 'o' is handled as OOB data.  When recv() is called twice without
      MSG_OOB, the OOB data should be lost.
      
        >>> from socket import *
        >>> c1, c2 = socketpair(AF_UNIX, SOCK_STREAM, 0)
        >>> c1.send(b'hello', MSG_OOB)  # 'o' is OOB data
        5
        >>> c1.send(b'world')
        5
        >>> c2.recv(5)  # OOB data is not received
        b'hell'
        >>> c2.recv(5)  # OOB date is skipped
        b'world'
        >>> c2.recv(5, MSG_OOB)  # This should return an error
        b'o'
      
      In the same situation, TCP actually returns -EINVAL for the last
      recv().
      
      Also, if we do not clear unix_sk(sk)->oob_skb, unix_poll() always set
      EPOLLPRI even though the data has passed through by previous recv().
      
      To avoid these issues, we must clear unix_sk(sk)->oob_skb when dequeuing
      it from recv queue.
      
      The reason why the old GC did not trigger the deadlock is because the
      old GC relied on the receive queue to detect the loop.
      
      When it is triggered, the socket with OOB data is marked as GC candidate
      because file refcount == inflight count (1).  However, after traversing
      all inflight sockets, the socket still has a positive inflight count (1),
      thus the socket is excluded from candidates.  Then, the old GC lose the
      chance to garbage-collect the socket.
      
      With the old GC, the repro continues to create true garbage that will
      never be freed nor detected by kmemleak as it's linked to the global
      inflight list.  That's why we couldn't even notice the issue.
      
      Fixes: 314001f0 ("af_unix: Add OOB support")
      Reported-by: default avatar <syzbot+7f7f201cc2668a8fd169@syzkaller.appspotmail.com>
      Closes: https://syzkaller.appspot.com/bug?extid=7f7f201cc2668a8fd169
      
      
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240405221057.2406-1-kuniyu@amazon.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      b46f4eaa
    • Marek Vasut's avatar
      net: ks8851: Handle softirqs at the end of IRQ thread to fix hang · be0384bf
      Marek Vasut authored
      
      
      The ks8851_irq() thread may call ks8851_rx_pkts() in case there are
      any packets in the MAC FIFO, which calls netif_rx(). This netif_rx()
      implementation is guarded by local_bh_disable() and local_bh_enable().
      The local_bh_enable() may call do_softirq() to run softirqs in case
      any are pending. One of the softirqs is net_rx_action, which ultimately
      reaches the driver .start_xmit callback. If that happens, the system
      hangs. The entire call chain is below:
      
      ks8851_start_xmit_par from netdev_start_xmit
      netdev_start_xmit from dev_hard_start_xmit
      dev_hard_start_xmit from sch_direct_xmit
      sch_direct_xmit from __dev_queue_xmit
      __dev_queue_xmit from __neigh_update
      __neigh_update from neigh_update
      neigh_update from arp_process.constprop.0
      arp_process.constprop.0 from __netif_receive_skb_one_core
      __netif_receive_skb_one_core from process_backlog
      process_backlog from __napi_poll.constprop.0
      __napi_poll.constprop.0 from net_rx_action
      net_rx_action from __do_softirq
      __do_softirq from call_with_stack
      call_with_stack from do_softirq
      do_softirq from __local_bh_enable_ip
      __local_bh_enable_ip from netif_rx
      netif_rx from ks8851_irq
      ks8851_irq from irq_thread_fn
      irq_thread_fn from irq_thread
      irq_thread from kthread
      kthread from ret_from_fork
      
      The hang happens because ks8851_irq() first locks a spinlock in
      ks8851_par.c ks8851_lock_par() spin_lock_irqsave(&ksp->lock, ...)
      and with that spinlock locked, calls netif_rx(). Once the execution
      reaches ks8851_start_xmit_par(), it calls ks8851_lock_par() again
      which attempts to claim the already locked spinlock again, and the
      hang happens.
      
      Move the do_softirq() call outside of the spinlock protected section
      of ks8851_irq() by disabling BHs around the entire spinlock protected
      section of ks8851_irq() handler. Place local_bh_enable() outside of
      the spinlock protected section, so that it can trigger do_softirq()
      without the ks8851_par.c ks8851_lock_par() spinlock being held, and
      safely call ks8851_start_xmit_par() without attempting to lock the
      already locked spinlock.
      
      Since ks8851_irq() is protected by local_bh_disable()/local_bh_enable()
      now, replace netif_rx() with __netif_rx() which is not duplicating the
      local_bh_disable()/local_bh_enable() calls.
      
      Fixes: 797047f8 ("net: ks8851: Implement Parallel bus operations")
      Signed-off-by: default avatarMarek Vasut <marex@denx.de>
      Link: https://lore.kernel.org/r/20240405203204.82062-2-marex@denx.de
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      be0384bf
    • Marek Vasut's avatar
      net: ks8851: Inline ks8851_rx_skb() · f96f7004
      Marek Vasut authored
      
      
      Both ks8851_rx_skb_par() and ks8851_rx_skb_spi() call netif_rx(skb),
      inline the netif_rx(skb) call directly into ks8851_common.c and drop
      the .rx_skb callback and ks8851_rx_skb() wrapper. This removes one
      indirect call from the driver, no functional change otherwise.
      
      Signed-off-by: default avatarMarek Vasut <marex@denx.de>
      Link: https://lore.kernel.org/r/20240405203204.82062-1-marex@denx.de
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      f96f7004
  4. Apr 08, 2024
  5. Apr 07, 2024
    • Hariprasad Kelam's avatar
      octeontx2-pf: Fix transmit scheduler resource leak · bccb798e
      Hariprasad Kelam authored
      
      
      Inorder to support shaping and scheduling, Upon class creation
      Netdev driver allocates trasmit schedulers.
      
      The previous patch which added support for Round robin scheduling has
      a bug due to which driver is not freeing transmit schedulers post
      class deletion.
      
      This patch fixes the same.
      
      Fixes: 47a9656f ("octeontx2-pf: htb offload support for Round Robin scheduling")
      Signed-off-by: default avatarHariprasad Kelam <hkelam@marvell.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bccb798e
    • Breno Leitao's avatar
      virtio_net: Do not send RSS key if it is not supported · 059a49aa
      Breno Leitao authored
      
      
      There is a bug when setting the RSS options in virtio_net that can break
      the whole machine, getting the kernel into an infinite loop.
      
      Running the following command in any QEMU virtual machine with virtionet
      will reproduce this problem:
      
          # ethtool -X eth0  hfunc toeplitz
      
      This is how the problem happens:
      
      1) ethtool_set_rxfh() calls virtnet_set_rxfh()
      
      2) virtnet_set_rxfh() calls virtnet_commit_rss_command()
      
      3) virtnet_commit_rss_command() populates 4 entries for the rss
      scatter-gather
      
      4) Since the command above does not have a key, then the last
      scatter-gatter entry will be zeroed, since rss_key_size == 0.
      sg_buf_size = vi->rss_key_size;
      
      5) This buffer is passed to qemu, but qemu is not happy with a buffer
      with zero length, and do the following in virtqueue_map_desc() (QEMU
      function):
      
        if (!sz) {
            virtio_error(vdev, "virtio: zero sized buffers are not allowed");
      
      6) virtio_error() (also QEMU function) set the device as broken
      
          vdev->broken = true;
      
      7) Qemu bails out, and do not repond this crazy kernel.
      
      8) The kernel is waiting for the response to come back (function
      virtnet_send_command())
      
      9) The kernel is waiting doing the following :
      
            while (!virtqueue_get_buf(vi->cvq, &tmp) &&
      	     !virtqueue_is_broken(vi->cvq))
      	      cpu_relax();
      
      10) None of the following functions above is true, thus, the kernel
      loops here forever. Keeping in mind that virtqueue_is_broken() does
      not look at the qemu `vdev->broken`, so, it never realizes that the
      vitio is broken at QEMU side.
      
      Fix it by not sending RSS commands if the feature is not available in
      the device.
      
      Fixes: c7114b12 ("drivers/net/virtio_net: Added basic RSS support.")
      Cc: stable@vger.kernel.org
      Cc: qemu-devel@nongnu.org
      Signed-off-by: default avatarBreno Leitao <leitao@debian.org>
      Reviewed-by: default avatarHeng Qi <hengqi@linux.alibaba.com>
      Reviewed-by: default avatarXuan Zhuo <xuanzhuo@linux.alibaba.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      059a49aa
  6. Apr 06, 2024
    • Eric Dumazet's avatar
      xsk: validate user input for XDP_{UMEM|COMPLETION}_FILL_RING · 237f3cf1
      Eric Dumazet authored
      
      
      syzbot reported an illegal copy in xsk_setsockopt() [1]
      
      Make sure to validate setsockopt() @optlen parameter.
      
      [1]
      
       BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
       BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline]
       BUG: KASAN: slab-out-of-bounds in xsk_setsockopt+0x909/0xa40 net/xdp/xsk.c:1420
      Read of size 4 at addr ffff888028c6cde3 by task syz-executor.0/7549
      
      CPU: 0 PID: 7549 Comm: syz-executor.0 Not tainted 6.8.0-syzkaller-08951-gfe46a7dd189e #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
      Call Trace:
       <TASK>
        __dump_stack lib/dump_stack.c:88 [inline]
        dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
        print_address_description mm/kasan/report.c:377 [inline]
        print_report+0x169/0x550 mm/kasan/report.c:488
        kasan_report+0x143/0x180 mm/kasan/report.c:601
        copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
        copy_from_sockptr include/linux/sockptr.h:55 [inline]
        xsk_setsockopt+0x909/0xa40 net/xdp/xsk.c:1420
        do_sock_setsockopt+0x3af/0x720 net/socket.c:2311
        __sys_setsockopt+0x1ae/0x250 net/socket.c:2334
        __do_sys_setsockopt net/socket.c:2343 [inline]
        __se_sys_setsockopt net/socket.c:2340 [inline]
        __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
       do_syscall_64+0xfb/0x240
       entry_SYSCALL_64_after_hwframe+0x6d/0x75
      RIP: 0033:0x7fb40587de69
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fb40665a0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
      RAX: ffffffffffffffda RBX: 00007fb4059abf80 RCX: 00007fb40587de69
      RDX: 0000000000000005 RSI: 000000000000011b RDI: 0000000000000006
      RBP: 00007fb4058ca47a R08: 0000000000000002 R09: 0000000000000000
      R10: 0000000020001980 R11: 0000000000000246 R12: 0000000000000000
      R13: 000000000000000b R14: 00007fb4059abf80 R15: 00007fff57ee4d08
       </TASK>
      
      Allocated by task 7549:
        kasan_save_stack mm/kasan/common.c:47 [inline]
        kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
        poison_kmalloc_redzone mm/kasan/common.c:370 [inline]
        __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:387
        kasan_kmalloc include/linux/kasan.h:211 [inline]
        __do_kmalloc_node mm/slub.c:3966 [inline]
        __kmalloc+0x233/0x4a0 mm/slub.c:3979
        kmalloc include/linux/slab.h:632 [inline]
        __cgroup_bpf_run_filter_setsockopt+0xd2f/0x1040 kernel/bpf/cgroup.c:1869
        do_sock_setsockopt+0x6b4/0x720 net/socket.c:2293
        __sys_setsockopt+0x1ae/0x250 net/socket.c:2334
        __do_sys_setsockopt net/socket.c:2343 [inline]
        __se_sys_setsockopt net/socket.c:2340 [inline]
        __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
       do_syscall_64+0xfb/0x240
       entry_SYSCALL_64_after_hwframe+0x6d/0x75
      
      The buggy address belongs to the object at ffff888028c6cde0
       which belongs to the cache kmalloc-8 of size 8
      The buggy address is located 1 bytes to the right of
       allocated 2-byte region [ffff888028c6cde0, ffff888028c6cde2)
      
      The buggy address belongs to the physical page:
      page:ffffea0000a31b00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888028c6c9c0 pfn:0x28c6c
      anon flags: 0xfff00000000800(slab|node=0|zone=1|lastcpupid=0x7ff)
      page_type: 0xffffffff()
      raw: 00fff00000000800 ffff888014c41280 0000000000000000 dead000000000001
      raw: ffff888028c6c9c0 0000000080800057 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      page_owner tracks the page as allocated
      page last allocated via order 0, migratetype Unmovable, gfp_mask 0x112cc0(GFP_USER|__GFP_NOWARN|__GFP_NORETRY), pid 6648, tgid 6644 (syz-executor.0), ts 133906047828, free_ts 133859922223
        set_page_owner include/linux/page_owner.h:31 [inline]
        post_alloc_hook+0x1ea/0x210 mm/page_alloc.c:1533
        prep_new_page mm/page_alloc.c:1540 [inline]
        get_page_from_freelist+0x33ea/0x3580 mm/page_alloc.c:3311
        __alloc_pages+0x256/0x680 mm/page_alloc.c:4569
        __alloc_pages_node include/linux/gfp.h:238 [inline]
        alloc_pages_node include/linux/gfp.h:261 [inline]
        alloc_slab_page+0x5f/0x160 mm/slub.c:2175
        allocate_slab mm/slub.c:2338 [inline]
        new_slab+0x84/0x2f0 mm/slub.c:2391
        ___slab_alloc+0xc73/0x1260 mm/slub.c:3525
        __slab_alloc mm/slub.c:3610 [inline]
        __slab_alloc_node mm/slub.c:3663 [inline]
        slab_alloc_node mm/slub.c:3835 [inline]
        __do_kmalloc_node mm/slub.c:3965 [inline]
        __kmalloc_node+0x2db/0x4e0 mm/slub.c:3973
        kmalloc_node include/linux/slab.h:648 [inline]
        __vmalloc_area_node mm/vmalloc.c:3197 [inline]
        __vmalloc_node_range+0x5f9/0x14a0 mm/vmalloc.c:3392
        __vmalloc_node mm/vmalloc.c:3457 [inline]
        vzalloc+0x79/0x90 mm/vmalloc.c:3530
        bpf_check+0x260/0x19010 kernel/bpf/verifier.c:21162
        bpf_prog_load+0x1667/0x20f0 kernel/bpf/syscall.c:2895
        __sys_bpf+0x4ee/0x810 kernel/bpf/syscall.c:5631
        __do_sys_bpf kernel/bpf/syscall.c:5738 [inline]
        __se_sys_bpf kernel/bpf/syscall.c:5736 [inline]
        __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5736
       do_syscall_64+0xfb/0x240
       entry_SYSCALL_64_after_hwframe+0x6d/0x75
      page last free pid 6650 tgid 6647 stack trace:
        reset_page_owner include/linux/page_owner.h:24 [inline]
        free_pages_prepare mm/page_alloc.c:1140 [inline]
        free_unref_page_prepare+0x95d/0xa80 mm/page_alloc.c:2346
        free_unref_page_list+0x5a3/0x850 mm/page_alloc.c:2532
        release_pages+0x2117/0x2400 mm/swap.c:1042
        tlb_batch_pages_flush mm/mmu_gather.c:98 [inline]
        tlb_flush_mmu_free mm/mmu_gather.c:293 [inline]
        tlb_flush_mmu+0x34d/0x4e0 mm/mmu_gather.c:300
        tlb_finish_mmu+0xd4/0x200 mm/mmu_gather.c:392
        exit_mmap+0x4b6/0xd40 mm/mmap.c:3300
        __mmput+0x115/0x3c0 kernel/fork.c:1345
        exit_mm+0x220/0x310 kernel/exit.c:569
        do_exit+0x99e/0x27e0 kernel/exit.c:865
        do_group_exit+0x207/0x2c0 kernel/exit.c:1027
        get_signal+0x176e/0x1850 kernel/signal.c:2907
        arch_do_signal_or_restart+0x96/0x860 arch/x86/kernel/signal.c:310
        exit_to_user_mode_loop kernel/entry/common.c:105 [inline]
        exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
        __syscall_exit_to_user_mode_work kernel/entry/common.c:201 [inline]
        syscall_exit_to_user_mode+0xc9/0x360 kernel/entry/common.c:212
        do_syscall_64+0x10a/0x240 arch/x86/entry/common.c:89
       entry_SYSCALL_64_after_hwframe+0x6d/0x75
      
      Memory state around the buggy address:
       ffff888028c6cc80: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
       ffff888028c6cd00: fa fc fc fc fa fc fc fc 00 fc fc fc 06 fc fc fc
      >ffff888028c6cd80: fa fc fc fc fa fc fc fc fa fc fc fc 02 fc fc fc
                                                             ^
       ffff888028c6ce00: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
       ffff888028c6ce80: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
      
      Fixes: 423f3832 ("xsk: add umem fill queue support and mmap")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: "Björn Töpel" <bjorn@kernel.org>
      Cc: Magnus Karlsson <magnus.karlsson@intel.com>
      Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/r/20240404202738.3634547-1-edumazet@google.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      237f3cf1
    • Petr Tesarik's avatar
      u64_stats: fix u64_stats_init() for lockdep when used repeatedly in one file · 38a15d0a
      Petr Tesarik authored
      Fix bogus lockdep warnings if multiple u64_stats_sync variables are
      initialized in the same file.
      
      With CONFIG_LOCKDEP, seqcount_init() is a macro which declares:
      
      	static struct lock_class_key __key;
      
      Since u64_stats_init() is a function (albeit an inline one), all calls
      within the same file end up using the same instance, effectively treating
      them all as a single lock-class.
      
      Fixes: 9464ca65 ("net: make u64_stats_init() a function")
      Closes: https://lore.kernel.org/netdev/ea1567d9-ce66-45e6-8168-ac40a47d1821@roeck-us.net/
      
      
      Signed-off-by: default avatarPetr Tesarik <petr@tesarici.cz>
      Reviewed-by: default avatarSimon Horman <horms@kernel.org>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240404075740.30682-1-petr@tesarici.cz
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      38a15d0a
    • Ilya Maximets's avatar
      net: openvswitch: fix unwanted error log on timeout policy probing · 4539f91f
      Ilya Maximets authored
      
      
      On startup, ovs-vswitchd probes different datapath features including
      support for timeout policies.  While probing, it tries to execute
      certain operations with OVS_PACKET_ATTR_PROBE or OVS_FLOW_ATTR_PROBE
      attributes set.  These attributes tell the openvswitch module to not
      log any errors when they occur as it is expected that some of the
      probes will fail.
      
      For some reason, setting the timeout policy ignores the PROBE attribute
      and logs a failure anyway.  This is causing the following kernel log
      on each re-start of ovs-vswitchd:
      
        kernel: Failed to associated timeout policy `ovs_test_tp'
      
      Fix that by using the same logging macro that all other messages are
      using.  The message will still be printed at info level when needed
      and will be rate limited, but with a net rate limiter instead of
      generic printk one.
      
      The nf_ct_set_timeout() itself will still print some info messages,
      but at least this change makes logging in openvswitch module more
      consistent.
      
      Fixes: 06bd2bdf ("openvswitch: Add timeout support to ct action")
      Signed-off-by: default avatarIlya Maximets <i.maximets@ovn.org>
      Acked-by: default avatarEelco Chaudron <echaudro@redhat.com>
      Link: https://lore.kernel.org/r/20240403203803.2137962-1-i.maximets@ovn.org
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      4539f91f
  7. Apr 05, 2024
    • Linus Torvalds's avatar
      Merge tag 'net-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · c88b9b4c
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from netfilter, bluetooth and bpf.
      
        Fairly usual collection of driver and core fixes. The large selftest
        accompanying one of the fixes is also becoming a common occurrence.
      
        Current release - regressions:
      
         - ipv6: fix infinite recursion in fib6_dump_done()
      
         - net/rds: fix possible null-deref in newly added error path
      
        Current release - new code bugs:
      
         - net: do not consume a full cacheline for system_page_pool
      
         - bpf: fix bpf_arena-related file descriptor leaks in the verifier
      
         - drv: ice: fix freeing uninitialized pointers, fixing misuse of the
           newfangled __free() auto-cleanup
      
        Previous releases - regressions:
      
         - x86/bpf: fixes the BPF JIT with retbleed=stuff
      
         - xen-netfront: add missing skb_mark_for_recycle, fix page pool
           accounting leaks, revealed by recently added explicit warning
      
         - tcp: fix bind() regression for v6-only wildcard and v4-mapped-v6
           non-wildcard addresses
      
         - Bluetooth:
            - replace "hci_qca: Set BDA quirk bit if fwnode exists in DT" with
              better workarounds to un-break some buggy Qualcomm devices
            - set conn encrypted before conn establishes, fix re-connecting to
              some headsets which use slightly unusual sequence of msgs
      
         - mptcp:
            - prevent BPF accessing lowat from a subflow socket
            - don't account accept() of non-MPC client as fallback to TCP
      
         - drv: mana: fix Rx DMA datasize and skb_over_panic
      
         - drv: i40e: fix VF MAC filter removal
      
        Previous releases - always broken:
      
         - gro: various fixes related to UDP tunnels - netns crossing
           problems, incorrect checksum conversions, and incorrect packet
           transformations which may lead to panics
      
         - bpf: support deferring bpf_link dealloc to after RCU grace period
      
         - nf_tables:
            - release batch on table validation from abort path
            - release mutex after nft_gc_seq_end from abort path
            - flush pending destroy work before exit_net release
      
         - drv: r8169: skip DASH fw status checks when DASH is disabled"
      
      * tag 'net-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits)
        netfilter: validate user input for expected length
        net/sched: act_skbmod: prevent kernel-infoleak
        net: usb: ax88179_178a: avoid the interface always configured as random address
        net: dsa: sja1105: Fix parameters order in sja1110_pcs_mdio_write_c45()
        net: ravb: Always update error counters
        net: ravb: Always process TX descriptor ring
        netfilter: nf_tables: discard table flag update with pending basechain deletion
        netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get()
        netfilter: nf_tables: reject new basechain after table flag update
        netfilter: nf_tables: flush pending destroy work before exit_net release
        netfilter: nf_tables: release mutex after nft_gc_seq_end from abort path
        netfilter: nf_tables: release batch on table validation from abort path
        Revert "tg3: Remove residual error handling in tg3_suspend"
        tg3: Remove residual error handling in tg3_suspend
        net: mana: Fix Rx DMA datasize and skb_over_panic
        net/sched: fix lockdep splat in qdisc_tree_reduce_backlog()
        net: phy: micrel: lan8814: Fix when enabling/disabling 1-step timestamping
        net: stmmac: fix rx queue priority assignment
        net: txgbe: fix i2c dev name cannot match clkdev
        net: fec: Set mac_managed_pm during probe
        ...
      c88b9b4c
    • Linus Torvalds's avatar
      Merge tag 'bcachefs-2024-04-03' of https://evilpiepirate.org/git/bcachefs · ec25bd8d
      Linus Torvalds authored
      Pull bcachefs repair code from Kent Overstreet:
       "A couple more small fixes, and new repair code.
      
        We can now automatically recover from arbitrary corrupted interior
        btree nodes by scanning, and we can reconstruct metadata as needed to
        bring a filesystem back into a working, consistent, read-write state
        and preserve access to whatevver wasn't corrupted.
      
        Meaning - you can blow away all metadata except for extents and
        dirents leaf nodes, and repair will reconstruct everything else and
        give you your data, and under the correct paths. If inodes are missing
        i_size will be slightly off and permissions/ownership/timestamps will
        be gone, and we do still need the snapshots btree if snapshots were in
        use - in the future we'll be able to guess the snapshot tree structure
        in some situations.
      
        IOW - aside from shaking out remaining bugs (fuzz testing is still
        coming), repair code should be complete and if repair ever doesn't
        work that's the highest priority bug that I want to know about
        immediately.
      
        This patchset was kindly tested by a user from India who accidentally
        wiped one drive out of a three drive filesystem with no replication on
        the family computer - it took a couple weeks but we got everything
        important back"
      
      * tag 'bcachefs-2024-04-03' of https://evilpiepirate.org/git/bcachefs:
        bcachefs: reconstruct_inode()
        bcachefs: Subvolume reconstruction
        bcachefs: Check for extents that point to same space
        bcachefs: Reconstruct missing snapshot nodes
        bcachefs: Flag btrees with missing data
        bcachefs: Topology repair now uses nodes found by scanning to fill holes
        bcachefs: Repair pass for scanning for btree nodes
        bcachefs: Don't skip fake btree roots in fsck
        bcachefs: bch2_btree_root_alloc() -> bch2_btree_root_alloc_fake()
        bcachefs: Etyzinger cleanups
        bcachefs: bch2_shoot_down_journal_keys()
        bcachefs: Clear recovery_passes_required as they complete without errors
        bcachefs: ratelimit informational fsck errors
        bcachefs: Check for bad needs_discard before doing discard
        bcachefs: Improve bch2_btree_update_to_text()
        mean_and_variance: Drop always failing tests
        bcachefs: fix nocow lock deadlock
        bcachefs: BCH_WATERMARK_interior_updates
        bcachefs: Fix btree node reserve
      ec25bd8d
    • Jakub Kicinski's avatar
      Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 1cfa2f10
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2024-04-04
      
      We've added 7 non-merge commits during the last 5 day(s) which contain
      a total of 9 files changed, 75 insertions(+), 24 deletions(-).
      
      The main changes are:
      
      1) Fix x86 BPF JIT under retbleed=stuff which causes kernel panics due to
         incorrect destination IP calculation and incorrect IP for relocations,
         from Uros Bizjak and Joan Bruguera Micó.
      
      2) Fix BPF arena file descriptor leaks in the verifier,
         from Anton Protopopov.
      
      3) Defer bpf_link deallocation to after RCU grace period as currently
         running multi-{kprobes,uprobes} programs might still access cookie
         information from the link, from Andrii Nakryiko.
      
      4) Fix a BPF sockmap lock inversion deadlock in map_delete_elem reported
         by syzkaller, from Jakub Sitnicki.
      
      5) Fix resolve_btfids build with musl libc due to missing linux/types.h
         include, from Natanael Copa.
      
      * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        bpf, sockmap: Prevent lock inversion deadlock in map delete elem
        x86/bpf: Fix IP for relocating call depth accounting
        x86/bpf: Fix IP after emitting call depth accounting
        bpf: fix possible file descriptor leaks in verifier
        tools/resolve_btfids: fix build with musl libc
        bpf: support deferring bpf_link dealloc to after RCU grace period
        bpf: put uprobe link's path and task in release callback
      ====================
      
      Link: https://lore.kernel.org/r/20240404183258.4401-1-daniel@iogearbox.net
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      1cfa2f10
    • Eric Dumazet's avatar
      netfilter: validate user input for expected length · 0c83842d
      Eric Dumazet authored
      
      
      I got multiple syzbot reports showing old bugs exposed
      by BPF after commit 20f2505f ("bpf: Try to avoid kzalloc
      in cgroup/{s,g}etsockopt")
      
      setsockopt() @optlen argument should be taken into account
      before copying data.
      
       BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
       BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline]
       BUG: KASAN: slab-out-of-bounds in do_replace net/ipv4/netfilter/ip_tables.c:1111 [inline]
       BUG: KASAN: slab-out-of-bounds in do_ipt_set_ctl+0x902/0x3dd0 net/ipv4/netfilter/ip_tables.c:1627
      Read of size 96 at addr ffff88802cd73da0 by task syz-executor.4/7238
      
      CPU: 1 PID: 7238 Comm: syz-executor.4 Not tainted 6.9.0-rc2-next-20240403-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
      Call Trace:
       <TASK>
        __dump_stack lib/dump_stack.c:88 [inline]
        dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
        print_address_description mm/kasan/report.c:377 [inline]
        print_report+0x169/0x550 mm/kasan/report.c:488
        kasan_report+0x143/0x180 mm/kasan/report.c:601
        kasan_check_range+0x282/0x290 mm/kasan/generic.c:189
        __asan_memcpy+0x29/0x70 mm/kasan/shadow.c:105
        copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
        copy_from_sockptr include/linux/sockptr.h:55 [inline]
        do_replace net/ipv4/netfilter/ip_tables.c:1111 [inline]
        do_ipt_set_ctl+0x902/0x3dd0 net/ipv4/netfilter/ip_tables.c:1627
        nf_setsockopt+0x295/0x2c0 net/netfilter/nf_sockopt.c:101
        do_sock_setsockopt+0x3af/0x720 net/socket.c:2311
        __sys_setsockopt+0x1ae/0x250 net/socket.c:2334
        __do_sys_setsockopt net/socket.c:2343 [inline]
        __se_sys_setsockopt net/socket.c:2340 [inline]
        __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
       do_syscall_64+0xfb/0x240
       entry_SYSCALL_64_after_hwframe+0x72/0x7a
      RIP: 0033:0x7fd22067dde9
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fd21f9ff0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
      RAX: ffffffffffffffda RBX: 00007fd2207abf80 RCX: 00007fd22067dde9
      RDX: 0000000000000040 RSI: 0000000000000000 RDI: 0000000000000003
      RBP: 00007fd2206ca47a R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000020000880 R11: 0000000000000246 R12: 0000000000000000
      R13: 000000000000000b R14: 00007fd2207abf80 R15: 00007ffd2d0170d8
       </TASK>
      
      Allocated by task 7238:
        kasan_save_stack mm/kasan/common.c:47 [inline]
        kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
        poison_kmalloc_redzone mm/kasan/common.c:370 [inline]
        __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:387
        kasan_kmalloc include/linux/kasan.h:211 [inline]
        __do_kmalloc_node mm/slub.c:4069 [inline]
        __kmalloc_noprof+0x200/0x410 mm/slub.c:4082
        kmalloc_noprof include/linux/slab.h:664 [inline]
        __cgroup_bpf_run_filter_setsockopt+0xd47/0x1050 kernel/bpf/cgroup.c:1869
        do_sock_setsockopt+0x6b4/0x720 net/socket.c:2293
        __sys_setsockopt+0x1ae/0x250 net/socket.c:2334
        __do_sys_setsockopt net/socket.c:2343 [inline]
        __se_sys_setsockopt net/socket.c:2340 [inline]
        __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
       do_syscall_64+0xfb/0x240
       entry_SYSCALL_64_after_hwframe+0x72/0x7a
      
      The buggy address belongs to the object at ffff88802cd73da0
       which belongs to the cache kmalloc-8 of size 8
      The buggy address is located 0 bytes inside of
       allocated 1-byte region [ffff88802cd73da0, ffff88802cd73da1)
      
      The buggy address belongs to the physical page:
      page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88802cd73020 pfn:0x2cd73
      flags: 0xfff80000000000(node=0|zone=1|lastcpupid=0xfff)
      page_type: 0xffffefff(slab)
      raw: 00fff80000000000 ffff888015041280 dead000000000100 dead000000000122
      raw: ffff88802cd73020 000000008080007f 00000001ffffefff 0000000000000000
      page dumped because: kasan: bad access detected
      page_owner tracks the page as allocated
      page last allocated via order 0, migratetype Unmovable, gfp_mask 0x12cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY), pid 5103, tgid 2119833701 (syz-executor.4), ts 5103, free_ts 70804600828
        set_page_owner include/linux/page_owner.h:32 [inline]
        post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1490
        prep_new_page mm/page_alloc.c:1498 [inline]
        get_page_from_freelist+0x2e7e/0x2f40 mm/page_alloc.c:3454
        __alloc_pages_noprof+0x256/0x6c0 mm/page_alloc.c:4712
        __alloc_pages_node_noprof include/linux/gfp.h:244 [inline]
        alloc_pages_node_noprof include/linux/gfp.h:271 [inline]
        alloc_slab_page+0x5f/0x120 mm/slub.c:2249
        allocate_slab+0x5a/0x2e0 mm/slub.c:2412
        new_slab mm/slub.c:2465 [inline]
        ___slab_alloc+0xcd1/0x14b0 mm/slub.c:3615
        __slab_alloc+0x58/0xa0 mm/slub.c:3705
        __slab_alloc_node mm/slub.c:3758 [inline]
        slab_alloc_node mm/slub.c:3936 [inline]
        __do_kmalloc_node mm/slub.c:4068 [inline]
        kmalloc_node_track_caller_noprof+0x286/0x450 mm/slub.c:4089
        kstrdup+0x3a/0x80 mm/util.c:62
        device_rename+0xb5/0x1b0 drivers/base/core.c:4558
        dev_change_name+0x275/0x860 net/core/dev.c:1232
        do_setlink+0xa4b/0x41f0 net/core/rtnetlink.c:2864
        __rtnl_newlink net/core/rtnetlink.c:3680 [inline]
        rtnl_newlink+0x180b/0x20a0 net/core/rtnetlink.c:3727
        rtnetlink_rcv_msg+0x89b/0x10d0 net/core/rtnetlink.c:6594
        netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2559
        netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
        netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
      page last free pid 5146 tgid 5146 stack trace:
        reset_page_owner include/linux/page_owner.h:25 [inline]
        free_pages_prepare mm/page_alloc.c:1110 [inline]
        free_unref_page+0xd3c/0xec0 mm/page_alloc.c:2617
        discard_slab mm/slub.c:2511 [inline]
        __put_partials+0xeb/0x130 mm/slub.c:2980
        put_cpu_partial+0x17c/0x250 mm/slub.c:3055
        __slab_free+0x2ea/0x3d0 mm/slub.c:4254
        qlink_free mm/kasan/quarantine.c:163 [inline]
        qlist_free_all+0x9e/0x140 mm/kasan/quarantine.c:179
        kasan_quarantine_reduce+0x14f/0x170 mm/kasan/quarantine.c:286
        __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:322
        kasan_slab_alloc include/linux/kasan.h:201 [inline]
        slab_post_alloc_hook mm/slub.c:3888 [inline]
        slab_alloc_node mm/slub.c:3948 [inline]
        __do_kmalloc_node mm/slub.c:4068 [inline]
        __kmalloc_node_noprof+0x1d7/0x450 mm/slub.c:4076
        kmalloc_node_noprof include/linux/slab.h:681 [inline]
        kvmalloc_node_noprof+0x72/0x190 mm/util.c:634
        bucket_table_alloc lib/rhashtable.c:186 [inline]
        rhashtable_rehash_alloc+0x9e/0x290 lib/rhashtable.c:367
        rht_deferred_worker+0x4e1/0x2440 lib/rhashtable.c:427
        process_one_work kernel/workqueue.c:3218 [inline]
        process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3299
        worker_thread+0x86d/0xd70 kernel/workqueue.c:3380
        kthread+0x2f0/0x390 kernel/kthread.c:388
        ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
        ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
      
      Memory state around the buggy address:
       ffff88802cd73c80: 07 fc fc fc 05 fc fc fc 05 fc fc fc fa fc fc fc
       ffff88802cd73d00: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
      >ffff88802cd73d80: fa fc fc fc 01 fc fc fc fa fc fc fc fa fc fc fc
                                     ^
       ffff88802cd73e00: fa fc fc fc fa fc fc fc 05 fc fc fc 07 fc fc fc
       ffff88802cd73e80: 07 fc fc fc 07 fc fc fc 07 fc fc fc 07 fc fc fc
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Link: https://lore.kernel.org/r/20240404122051.2303764-1-edumazet@google.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      0c83842d
    • Jakub Kicinski's avatar
      Merge tag 'nf-24-04-04' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · d432f7bd
      Jakub Kicinski authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      Patch #1 unlike early commit path stage which triggers a call to abort,
               an explicit release of the batch is required on abort, otherwise
               mutex is released and commit_list remains in place.
      
      Patch #2 release mutex after nft_gc_seq_end() in commit path, otherwise
               async GC worker could collect expired objects.
      
      Patch #3 flush pending destroy work in module removal path, otherwise UaF
               is possible.
      
      Patch #4 and #6 restrict the table dormant flag with basechain updates
      	 to fix state inconsistency in the hook registration.
      
      Patch #5 adds missing RCU read side lock to flowtable type to avoid races
      	 with module removal.
      
      * tag 'nf-24-04-04' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: nf_tables: discard table flag update with pending basechain deletion
        netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get()
        netfilter: nf_tables: reject new basechain after table flag update
        netfilter: nf_tables: flush pending destroy work before exit_net release
        netfilter: nf_tables: release mutex after nft_gc_seq_end from abort path
        netfilter: nf_tables: release batch on table validation from abort path
      ====================
      
      Link: https://lore.kernel.org/r/20240404104334.1627-1-pablo@netfilter.org
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      d432f7bd
    • Jakub Kicinski's avatar
      Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · a66323e4
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2024-04-03 (ice, idpf)
      
      This series contains updates to ice and idpf drivers.
      
      Dan Carpenter initializes some pointer declarations to NULL as needed for
      resource cleanup on ice driver.
      
      Petr Oros corrects assignment of VLAN operators to fix Rx VLAN filtering
      in legacy mode for ice.
      
      Joshua calls eth_type_trans() on unknown packets to prevent possible
      kernel panic on idpf.
      
      * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        idpf: fix kernel panic on unknown packet types
        ice: fix enabling RX VLAN filtering
        ice: Fix freeing uninitialized pointers
      ====================
      
      Link: https://lore.kernel.org/r/20240403201929.1945116-1-anthony.l.nguyen@intel.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      a66323e4