  1. Jan 26, 2023
    • net: mdio-mux-meson-g12a: force internal PHY off on mux switch · 7083df59
      Jerome Brunet authored
      Force the internal PHY off then on when switching to the internal path.
      This fixes problems where the PHY ID is not properly set.
      
      Fixes: 70904251 ("net: phy: add amlogic g12a mdio mux support")
      Suggested-by: Qi Duan <qi.duan@amlogic.com>
      Co-developed-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
      Link: https://lore.kernel.org/r/20230124101157.232234-1-jbrunet@baylibre.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • docs: networking: Fix bridge documentation URL · aee2770d
      Ivan Vecera authored
      Current documentation URL [1] is no longer valid.
      
      [1] https://www.linuxfoundation.org/collaborate/workgroups/networking/bridge
      
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Link: https://lore.kernel.org/r/20230124145127.189221-1-ivecera@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tsnep: Fix TX queue stop/wake for multiple queues · 3d53aaef
      Gerhard Engleder authored
      netif_stop_queue() and netif_wake_queue() act on TX queue 0. This is OK
      as long as only a single TX queue is supported, but support for multiple
      TX queues was introduced with 76203137 and I failed to adapt the stopping
      and waking of TX queues.
      
      Use netif_stop_subqueue() and netif_tx_wake_queue() to act on the
      specific TX queue.
      
      Fixes: 76203137 ("tsnep: Support multiple TX/RX queue pairs")
      Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
      Link: https://lore.kernel.org/r/20230124191440.56887-1-gerhard@engleder-embedded.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
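      The per-queue stop/wake behavior this fix relies on can be illustrated
      with a small user-space model. It is a sketch under stated assumptions,
      not driver code: struct toy_netdev and the toy_* helpers are hypothetical
      stand-ins that only mimic what netif_stop_queue() (always queue 0) and
      netif_stop_subqueue()/netif_tx_wake_queue() (one specific queue) do to a
      per-queue "stopped" state.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NUM_TX_QUEUES 4

/* Hypothetical stand-in for a multi-queue net device: one "stopped" flag
 * per TX queue. This is a user-space model, not struct net_device. */
struct toy_netdev {
	bool stopped[TOY_NUM_TX_QUEUES];
};

/* Old pattern: like netif_stop_queue(), always acts on TX queue 0. */
void toy_stop_queue(struct toy_netdev *dev)
{
	dev->stopped[0] = true;
}

/* Fixed pattern: like netif_stop_subqueue(), acts on the given queue. */
void toy_stop_subqueue(struct toy_netdev *dev, size_t q)
{
	dev->stopped[q] = true;
}

/* Fixed wake side: re-enable only the queue that has room again. */
void toy_wake_subqueue(struct toy_netdev *dev, size_t q)
{
	dev->stopped[q] = false;
}

/* Hypothetical xmit path: TX ring q just filled up and must stop. */
void toy_ring_full(struct toy_netdev *dev, size_t q, bool use_fix)
{
	if (use_fix)
		toy_stop_subqueue(dev, q);
	else
		toy_stop_queue(dev); /* bug: ring q keeps receiving packets */
}
```

      With the old pattern, a full ring 2 stops queue 0 instead, so ring 2
      keeps being fed while an unrelated queue is throttled.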
    • net/tg3: resolve deadlock in tg3_reset_task() during EEH · 6c4ca03b
      David Christensen authored
      During EEH error injection testing, a deadlock was encountered in the tg3
      driver when tg3_io_error_detected() was attempting to cancel outstanding
      reset tasks:
      
      crash> foreach UN bt
      ...
      PID: 159    TASK: c0000000067c6000  CPU: 8   COMMAND: "eehd"
      ...
       #5 [c00000000681f990] __cancel_work_timer at c00000000019fd18
       #6 [c00000000681fa30] tg3_io_error_detected at c00800000295f098 [tg3]
       #7 [c00000000681faf0] eeh_report_error at c00000000004e25c
      ...
      
      PID: 290    TASK: c000000036e5f800  CPU: 6   COMMAND: "kworker/6:1"
      ...
       #4 [c00000003721fbc0] rtnl_lock at c000000000c940d8
       #5 [c00000003721fbe0] tg3_reset_task at c008000002969358 [tg3]
       #6 [c00000003721fc60] process_one_work at c00000000019e5c4
      ...
      
      PID: 296    TASK: c000000037a65800  CPU: 21  COMMAND: "kworker/21:1"
      ...
       #4 [c000000037247bc0] rtnl_lock at c000000000c940d8
       #5 [c000000037247be0] tg3_reset_task at c008000002969358 [tg3]
       #6 [c000000037247c60] process_one_work at c00000000019e5c4
      ...
      
      PID: 655    TASK: c000000036f49000  CPU: 16  COMMAND: "kworker/16:2"
      ...
       #4 [c0000000373ebbc0] rtnl_lock at c000000000c940d8
       #5 [c0000000373ebbe0] tg3_reset_task at c008000002969358 [tg3]
       #6 [c0000000373ebc60] process_one_work at c00000000019e5c4
      ...
      
      Code inspection shows that both tg3_io_error_detected() and
      tg3_reset_task() attempt to acquire the RTNL lock at the beginning of
      their code blocks.  If tg3_reset_task() should happen to execute between
      the times when tg3_io_error_detected() acquires the RTNL lock and
      tg3_reset_task_cancel() is called, a deadlock will occur.
      
      Moving the tg3_reset_task_cancel() call earlier within the code block, prior
      to acquiring RTNL, prevents this from happening, but also exposes another
      deadlock issue where tg3_reset_task() may execute AFTER
      tg3_io_error_detected() has executed:
      
      crash> foreach UN bt
      PID: 159    TASK: c0000000067d2000  CPU: 9   COMMAND: "eehd"
      ...
       #4 [c000000006867a60] rtnl_lock at c000000000c940d8
       #5 [c000000006867a80] tg3_io_slot_reset at c0080000026c2ea8 [tg3]
       #6 [c000000006867b00] eeh_report_reset at c00000000004de88
      ...
      PID: 363    TASK: c000000037564000  CPU: 6   COMMAND: "kworker/6:1"
      ...
       #3 [c000000036c1bb70] msleep at c000000000259e6c
       #4 [c000000036c1bba0] napi_disable at c000000000c6b848
       #5 [c000000036c1bbe0] tg3_reset_task at c0080000026d942c [tg3]
       #6 [c000000036c1bc60] process_one_work at c00000000019e5c4
      ...
      
      This issue can be avoided by aborting tg3_reset_task() if EEH error
      recovery is already in progress.
      
      Fixes: db84bf43 ("tg3: tg3_reset_task() needs to use rtnl_lock to synchronize")
      Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
      Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Link: https://lore.kernel.org/r/20230124185339.225806-1-drc@linux.vnet.ibm.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
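      The guard described in this fix (abort tg3_reset_task() while EEH error
      recovery is in progress) can be sketched as a single-threaded user-space
      model. The assumptions are loud here: a boolean stands in for the RTNL
      lock, a pcierr_recovery-style flag marks EEH recovery, and all toy_*
      names are hypothetical. The sketch does not reproduce the deadlock; it
      only shows the early-exit pattern that releases the lock on every path.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified device state; field names are
 * illustrative and not taken from the tg3 driver. */
struct toy_dev {
	bool rtnl_held;       /* stand-in for holding the RTNL lock */
	bool pcierr_recovery; /* set while EEH error recovery is in progress */
	int resets_done;
};

void toy_rtnl_lock(struct toy_dev *d)   { d->rtnl_held = true; }
void toy_rtnl_unlock(struct toy_dev *d) { d->rtnl_held = false; }

/* EEH "error detected" callback: mark recovery as in progress. */
void toy_io_error_detected(struct toy_dev *d)
{
	toy_rtnl_lock(d);
	d->pcierr_recovery = true;
	toy_rtnl_unlock(d);
}

/* Reset work item. Returns false if it bailed out because EEH recovery
 * owns the device; the lock is released on every path. */
bool toy_reset_task(struct toy_dev *d)
{
	toy_rtnl_lock(d);
	if (d->pcierr_recovery) {
		toy_rtnl_unlock(d);
		return false;
	}
	d->resets_done++; /* the real reset work would go here */
	toy_rtnl_unlock(d);
	return true;
}
```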
  2. Jan 25, 2023
    • Merge branch 'mptcp-fixes' · ac8d986c
      David S. Miller authored
      Jeremy Kerr says:
      
      ====================
      net: mctp: struct sock lifetime fixes
      
      This series is a set of fixes for the sock lifetime handling in the
      AF_MCTP code, fixing a uaf reported by Noam Rathaus
      <noamr@ssd-disclosure.com>.
      
      The Fixes: tags indicate the original patches affected, but some
      tweaking to backport to those commits may be needed; I have a separate
      branch with backports to 5.15 if that helps with stable trees.
      
      Of course, any comments/queries most welcome.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mctp: mark socks as dead on unhash, prevent re-add · b98e1a04
      Jeremy Kerr authored
      Once a socket has been unhashed, we want to prevent it from being
      re-used in a sk_key entry as part of a routing operation.
      
      This change marks the sk as SOCK_DEAD on unhash, which prevents addition
      into the net's key list.
      
      We need to do this during the key add path, rather than key lookup, as
      we release the net keys_lock between those operations.
      
      Fixes: 4a992bbd ("mctp: Implement message fragmentation & reassembly")
      Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
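      The "mark dead on unhash, check at add time" rule can be modeled in a
      few lines. This is a minimal sketch assuming a simplified socket with a
      dead flag standing in for SOCK_DEAD. The toy_* names and the -EINVAL
      return value are hypothetical, and the keys_lock is only noted in
      comments rather than modeled.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified socket: only the state this sketch needs. */
struct toy_sock {
	bool dead;  /* stand-in for the SOCK_DEAD flag */
	int nkeys;  /* keys currently pointing at this socket */
};

/* Stand-in for unhash: flush existing keys and mark the socket dead.
 * In the kernel this would run under the net's keys_lock. */
void toy_unhash(struct toy_sock *sk)
{
	sk->dead = true;
	sk->nkeys = 0;
}

/* Stand-in for the key-add path. The dead check happens here, at insert
 * time, because the lock may be dropped between lookup and insert.
 * The -EINVAL value is an assumption of this sketch. */
int toy_key_add(struct toy_sock *sk)
{
	if (sk->dead)
		return -EINVAL;
	sk->nkeys++;
	return 0;
}
```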
    • net: mctp: hold key reference when looking up a general key · 6e54ea37
      Paolo Abeni authored
      Currently, we have a race where we look up a sock through a "general"
      (i.e., not directly associated with the (src, dest, tag) tuple) key, then
      drop the key reference while still holding the key's sock.
      
      This change extends the key reference until we've finished using the
      sock, and hence the sock reference too.
      
      Commit message changes from Jeremy Kerr <jk@codeconstruct.com.au>.
      
      Reported-by: Noam Rathaus <noamr@ssd-disclosure.com>
      Fixes: 73c61845 ("mctp: locking, lifetime and validity changes for sk_keys")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mctp: move expiry timer delete to unhash · 5f41ae6f
      Jeremy Kerr authored
      Currently, we delete the key expiry timer (in sk->close) before
      unhashing the sk. This means that another thread may find the sk through
      its presence on the key list, and re-queue the timer.
      
      This change moves the timer deletion to the unhash, after we have made
      the key no longer observable, so the timer cannot be re-queued.
      
      Fixes: 7b14e15a ("mctp: Implement a timeout for tags")
      Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mctp: add an explicit reference from a mctp_sk_key to sock · de8a6b15
      Jeremy Kerr authored
      Currently, we correlate the mctp_sk_key lifetime to the sock lifetime
      through the sock hash/unhash operations, but this is pretty tenuous, and
      there are cases where we may have a temporary reference to an unhashed
      sk.
      
      This change makes the reference more explicit, by adding a hold on the
      sock when it's associated with a mctp_sk_key, released on final key
      unref.
      
      Fixes: 73c61845 ("mctp: locking, lifetime and validity changes for sk_keys")
      Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
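      The explicit key-to-sock reference described in this patch follows a
      common refcounting pattern: take a hold on the sock when the key is
      created and drop it only on the final key unref. A minimal user-space
      sketch, assuming plain integer refcounts (no atomics or locking). The
      toy_* types and functions are hypothetical, and "freed" is recorded in a
      flag instead of releasing memory, so the test can observe it.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted socket; `freed` records when the last ref goes. */
struct toy_sock {
	int refcnt;
	bool freed;
};

/* Hypothetical key that pins the socket it points to. */
struct toy_key {
	int refcnt;
	struct toy_sock *sk;
};

void toy_sock_hold(struct toy_sock *sk) { sk->refcnt++; }

void toy_sock_put(struct toy_sock *sk)
{
	if (--sk->refcnt == 0)
		sk->freed = true; /* a real implementation would free here */
}

struct toy_key *toy_key_alloc(struct toy_sock *sk)
{
	struct toy_key *key = malloc(sizeof(*key));

	if (!key)
		return NULL;
	key->refcnt = 1;
	key->sk = sk;
	toy_sock_hold(sk); /* explicit reference: key -> sock */
	return key;
}

void toy_key_unref(struct toy_key *key)
{
	if (--key->refcnt == 0) {
		toy_sock_put(key->sk); /* released only on final key unref */
		free(key);
	}
}
```

      Because the key pins the sock, the owner closing the socket while a
      lookup still holds the key no longer frees the sock underneath it.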
    • Merge branch 'ravb-fixes' · a9e9b78d
      David S. Miller authored
      Yoshihiro Shimoda says:
      
      ====================
      net: ravb: Fix potential issues
      
      Fix potential issues in the ravb driver.
      
      Changes from v2:
      https://lore.kernel.org/all/20230123131331.1425648-1-yoshihiro.shimoda.uh@renesas.com/
       - Add Reviewed-by in the patch [2/2].
       - Add a commit description in the patch [2/2].
      
      Changes from v1:
      https://lore.kernel.org/all/20230119043920.875280-1-yoshihiro.shimoda.uh@renesas.com/
       - Fix typo in the patch [1/2].
       - Add Reviewed-by in the patch [1/2].
       - Fix "Fixes" tag in the patch [2/2].
       - Fix a comment indentation of the code in the patch [2/2].
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ravb: Fix possible hang if RIS2_QFF1 happen · f3c07758
      Yoshihiro Shimoda authored
      Since this driver enables the interrupt via RIC2_QFE1, it should clear
      the interrupt flag when the interrupt occurs. Otherwise, the interrupt
      causes the system to hang.
      
      Note that this also fixes a minor coding-style issue (a comment's
      indentation) around the fixed code.
      
      Fixes: c156633f ("Renesas Ethernet AVB driver proper")
      Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
      Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ravb: Fix lack of register setting after system resumed for Gen3 · c2b6cdee
      Yoshihiro Shimoda authored
      After the system enters Suspend to RAM, the register settings of this
      hardware are reset because the SoC is turned off. On R-Car Gen3
      (info->ccc_gac), ravb_ptp_init() is called only in ravb_probe(), so
      after the system resumes, the initial PTP settings are missing.
      Therefore, add ravb_ptp_{init,stop}() to ravb_{resume,suspend}().
      
      Fixes: f5d7837f ("ravb: ptp: Add CONFIG mode support")
      Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
      Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/x25: Fix to not accept on connected socket · f2b0b521
      Hyunwoo Kim authored
      When listen() and accept() are called on an x25 socket
      on which connect() has already succeeded, accept() succeeds
      immediately. This is because x25_connect() queues the skb to
      sk->sk_receive_queue, and x25_accept() dequeues it.
      
      This creates a child socket with the sk of the parent
      x25 socket, which can cause confusion.
      
      Fix x25_listen() to return -EINVAL if the socket has
      already been successfully connect()ed to avoid this issue.
      
      Signed-off-by: Hyunwoo Kim <v4bel@theori.io>
      Signed-off-by: David S. Miller <davem@davemloft.net>
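      The guard added to x25_listen() amounts to a state check before entering
      the listening state. The following is a minimal sketch with a
      hypothetical three-state socket. The toy_* names and states are
      illustrative and are not the real X.25 state machine; only the -EINVAL
      return value comes from the description above.

```c
#include <errno.h>

/* Hypothetical socket states, loosely modeled on an X.25 socket. */
enum toy_state { TOY_DISCONNECTED, TOY_CONNECTED, TOY_LISTENING };

struct toy_x25_sock {
	enum toy_state state;
};

/* Sketch of the guard: refuse listen() once connect() has succeeded,
 * so a later accept() cannot dequeue the connect-path skb. */
int toy_listen(struct toy_x25_sock *sk)
{
	if (sk->state == TOY_CONNECTED)
		return -EINVAL;
	sk->state = TOY_LISTENING;
	return 0;
}
```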
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · 2a48216c
      Jakub Kicinski authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      1) Perform SCTP vtag verification for ABORT/SHUTDOWN_COMPLETE according
         to RFC 9260, Sect 8.5.1.
      
      2) Fix an infinite loop in for_each_sctp_chunk() if the SCTP chunk size
         is zero, and remove a useless check in this macro.
      
      3) Revert the DATA_SENT state in the SCTP tracker, which was applied in
         the previous merge window. The next patch in this series provides a
         simpler approach to multihoming support.
      
      4) Unify the HEARTBEAT_ACKED and ESTABLISHED states for SCTP multihoming
         support, and use a default ESTABLISHED timeout of 210 seconds, based on
         heartbeat interval * maximum number of path retransmissions + maximum
         retransmission timeout (RTO.Max). Otherwise, SCTP conntrack entries
         that represent secondary paths remain stale in the table for up to
         5 days.
      
      This is a slightly large batch with fixes for the SCTP connection
      tracking helper, all patches from Sriram Yagnaraman.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: conntrack: unify established states for SCTP paths
        Revert "netfilter: conntrack: add sctp DATA_SENT state"
        netfilter: conntrack: fix bug in for_each_sctp_chunk
        netfilter: conntrack: fix vtag checks for ABORT/SHUTDOWN_COMPLETE
      ====================
      
      Link: https://lore.kernel.org/r/20230124183933.4752-1-pablo@netfilter.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ice: move devlink port creation/deletion · 418e5340
      Paul M Stillwell Jr authored
      Commit a286ba73 ("ice: reorder PF/representor devlink
      port register/unregister flows") moved the code to create
      and destroy the devlink PF port. This was fine, but it created
      a corner case when ice_register_netdev() fails: in that case,
      the driver would end up calling ice_devlink_destroy_pf_port()
      twice.
      
      Additionally, it makes no sense to tie creation of the devlink
      PF port to the creation of the netdev, so separate the code
      that creates/destroys the devlink PF port from the netdev
      code. This makes for a cleaner interface.
      
      Fixes: a286ba73 ("ice: reorder PF/representor devlink port register/unregister flows")
      Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Link: https://lore.kernel.org/r/20230124005714.3996270-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • sctp: fail if no bound addresses can be used for a given scope · 458e279f
      Marcelo Ricardo Leitner authored
      Currently, if you bind the socket to something like:
              servaddr.sin6_family = AF_INET6;
              servaddr.sin6_port = htons(0);
              servaddr.sin6_scope_id = 0;
              inet_pton(AF_INET6, "::1", &servaddr.sin6_addr);
      
      And then request a connect to:
              connaddr.sin6_family = AF_INET6;
              connaddr.sin6_port = htons(20000);
              connaddr.sin6_scope_id = if_nametoindex("lo");
              inet_pton(AF_INET6, "fe88::1", &connaddr.sin6_addr);
      
      What the stack does is:
       - bind the socket
       - create a new asoc
       - to handle the connect
         - copy the addresses that can be used for the given scope
         - try to connect
      
      But the copy returns 0 addresses, and the effect is that the stack ends
      up trying to connect as if the socket weren't bound, which is not the
      desired behavior. This unexpected behavior also allows KASLR leaks
      through the SCTP diag interface.
      
      The fix, then, is to bail out if copying the addresses that can be used
      for the scope used in connect() yields 0 addresses. This is what TCP
      does with a similar reproducer.
      
      Reported-by: Pietro Borrello <borrello@diag.uniroma1.it>
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Reviewed-by: Xin Long <lucien.xin@gmail.com>
      Link: https://lore.kernel.org/r/9fcd182f1099f86c6661f3717f63712ddd1c676c.1674496737.git.marcelo.leitner@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
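      The "bail out if the scoped copy is empty" logic can be sketched in user
      space. This sketch makes several assumptions. Addresses are reduced to
      their scope. The scope rule is simplified: an address is usable if its
      scope is at least as wide as the peer's, so a loopback-only binding has
      nothing usable for a link-local peer, matching the ::1 / fe88::1 example
      above. The toy_* names and the -EADDRNOTAVAIL code are hypothetical and
      are not the kernel's.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified scopes, ordered widest first (an assumption of this sketch). */
enum toy_scope { TOY_SCOPE_GLOBAL, TOY_SCOPE_PRIVATE, TOY_SCOPE_LINK, TOY_SCOPE_LOOPBACK };

/* An address is usable when its scope is at least as wide as the peer's. */
int toy_in_scope(enum toy_scope addr, enum toy_scope peer)
{
	return addr <= peer;
}

/* Copy the usable bound addresses into out[] (capacity >= n). Instead of
 * silently continuing with an empty list, fail (placeholder error code). */
int toy_copy_bound_addrs(const enum toy_scope *bound, size_t n,
			 enum toy_scope peer, enum toy_scope *out, size_t *nout)
{
	size_t k = 0;

	for (size_t i = 0; i < n; i++)
		if (toy_in_scope(bound[i], peer))
			out[k++] = bound[i];
	*nout = k;
	if (k == 0)
		return -EADDRNOTAVAIL;
	return 0;
}
```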
    • net/sched: sch_taprio: do not schedule in taprio_reset() · ea4fdbaa
      Eric Dumazet authored
      As reported by syzbot and hinted by Vinicius, I should not have added
      a qdisc_synchronize() call in taprio_reset().
      
      taprio_reset() can be called with qdisc spinlock held (and BH disabled)
      as shown in included syzbot report [1].
      
      Only taprio_destroy() needed this synchronization, as explained
      in the blamed commit changelog.
      
      [1]
      
      BUG: scheduling while atomic: syz-executor150/5091/0x00000202
      2 locks held by syz-executor150/5091:
      Modules linked in:
      Preemption disabled at:
      [<0000000000000000>] 0x0
      Kernel panic - not syncing: scheduling while atomic: panic_on_warn set ...
      CPU: 1 PID: 5091 Comm: syz-executor150 Not tainted 6.2.0-rc3-syzkaller-00219-g010a74f52203 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023
      Call Trace:
      <TASK>
      __dump_stack lib/dump_stack.c:88 [inline]
      dump_stack_lvl+0xd1/0x138 lib/dump_stack.c:106
      panic+0x2cc/0x626 kernel/panic.c:318
      check_panic_on_warn.cold+0x19/0x35 kernel/panic.c:238
      __schedule_bug.cold+0xd5/0xfe kernel/sched/core.c:5836
      schedule_debug kernel/sched/core.c:5865 [inline]
      __schedule+0x34e4/0x5450 kernel/sched/core.c:6500
      schedule+0xde/0x1b0 kernel/sched/core.c:6682
      schedule_timeout+0x14e/0x2a0 kernel/time/timer.c:2167
      schedule_timeout_uninterruptible kernel/time/timer.c:2201 [inline]
      msleep+0xb6/0x100 kernel/time/timer.c:2322
      qdisc_synchronize include/net/sch_generic.h:1295 [inline]
      taprio_reset+0x93/0x270 net/sched/sch_taprio.c:1703
      qdisc_reset+0x10c/0x770 net/sched/sch_generic.c:1022
      dev_reset_queue+0x92/0x130 net/sched/sch_generic.c:1285
      netdev_for_each_tx_queue include/linux/netdevice.h:2464 [inline]
      dev_deactivate_many+0x36d/0x9f0 net/sched/sch_generic.c:1351
      dev_deactivate+0xed/0x1b0 net/sched/sch_generic.c:1374
      qdisc_graft+0xe4a/0x1380 net/sched/sch_api.c:1080
      tc_modify_qdisc+0xb6b/0x19a0 net/sched/sch_api.c:1689
      rtnetlink_rcv_msg+0x43e/0xca0 net/core/rtnetlink.c:6141
      netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2564
      netlink_unicast_kernel net/netlink/af_netlink.c:1330 [inline]
      netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1356
      netlink_sendmsg+0x91b/0xe10 net/netlink/af_netlink.c:1932
      sock_sendmsg_nosec net/socket.c:714 [inline]
      sock_sendmsg+0xd3/0x120 net/socket.c:734
      ____sys_sendmsg+0x712/0x8c0 net/socket.c:2476
      ___sys_sendmsg+0x110/0x1b0 net/socket.c:2530
      __sys_sendmsg+0xf7/0x1c0 net/socket.c:2559
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      
      Fixes: 3a415d59 ("net/sched: sch_taprio: fix possible use-after-free")
      Link: https://lore.kernel.org/netdev/167387581653.2747.13878941339893288655.git-patchwork-notify@kernel.org/T/
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Link: https://lore.kernel.org/r/20230123084552.574396-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Revert "Merge branch 'ethtool-mac-merge'" · d968117a
      Paolo Abeni authored
      This reverts commit 0ad999c1, reversing
      changes made to e38553bd.
      
      It was not intended for net.
      
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
  3. Jan 24, 2023
    • netrom: Fix use-after-free of a listening socket. · 409db27e
      Kuniyuki Iwashima authored
      syzbot reported a use-after-free in do_accept(), more precisely in
      nr_accept(), on memory that sk_prot_alloc() allocated and sock_put()
      freed. [0]
      
      The issue can happen if the heartbeat timer fires and
      nr_heartbeat_expiry() calls nr_destroy_socket(), which it does when a
      socket has SOCK_DESTROY set or a listening socket has SOCK_DEAD set.
      
      In this case, the first condition cannot be true.  SOCK_DESTROY is
      flagged in nr_release() only when the file descriptor is close()d,
      but accept() is being called for the listening socket, so the second
      condition must be true.
      
      Usually, the AF_NETROM listener neither starts timers nor sets
      SOCK_DEAD.  However, the condition is met if connect() fails before
      listen().  connect() starts the t1 timer and heartbeat timer, and
      t1timer calls nr_disconnect() when timeout happens.  Then, SOCK_DEAD
      is set, and if we call listen(), the heartbeat timer calls
      nr_destroy_socket().
      
        nr_connect
          nr_establish_data_link(sk)
            nr_start_t1timer(sk)
          nr_start_heartbeat(sk)
                                          nr_t1timer_expiry
                                            nr_disconnect(sk, ETIMEDOUT)
                                              nr_sk(sk)->state = NR_STATE_0
                                              sk->sk_state = TCP_CLOSE
                                              sock_set_flag(sk, SOCK_DEAD)
      nr_listen
        if (sk->sk_state != TCP_LISTEN)
          sk->sk_state = TCP_LISTEN
                                          nr_heartbeat_expiry
                                            switch (nr->state)
                                            case NR_STATE_0
                                              if (sk->sk_state == TCP_LISTEN &&
                                                  sock_flag(sk, SOCK_DEAD))
                                                nr_destroy_socket(sk)
      
      This path seems expected, and nr_destroy_socket() is called to clean
      up resources.  Initially, there was a sock_hold() before nr_destroy_socket()
      so that the socket would not be freed, but commit 517a16b1
      ("netrom: Decrease sock refcount when sock timers expire") accidentally
      removed it.
      
      To fix the use-after-free, let's add the sock_hold() back.
      
      [0]:
      BUG: KASAN: use-after-free in do_accept+0x483/0x510 net/socket.c:1848
      Read of size 8 at addr ffff88807978d398 by task syz-executor.3/5315
      
      CPU: 0 PID: 5315 Comm: syz-executor.3 Not tainted 6.2.0-rc3-syzkaller-00165-gd9fc1511728c #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xd1/0x138 lib/dump_stack.c:106
       print_address_description mm/kasan/report.c:306 [inline]
       print_report+0x15e/0x461 mm/kasan/report.c:417
       kasan_report+0xbf/0x1f0 mm/kasan/report.c:517
       do_accept+0x483/0x510 net/socket.c:1848
       __sys_accept4_file net/socket.c:1897 [inline]
       __sys_accept4+0x9a/0x120 net/socket.c:1927
       __do_sys_accept net/socket.c:1944 [inline]
       __se_sys_accept net/socket.c:1941 [inline]
       __x64_sys_accept+0x75/0xb0 net/socket.c:1941
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      RIP: 0033:0x7fa436a8c0c9
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fa437784168 EFLAGS: 00000246 ORIG_RAX: 000000000000002b
      RAX: ffffffffffffffda RBX: 00007fa436bac050 RCX: 00007fa436a8c0c9
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000005
      RBP: 00007fa436ae7ae9 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 00007ffebc6700df R14: 00007fa437784300 R15: 0000000000022000
       </TASK>
      
      Allocated by task 5294:
       kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
       kasan_set_track+0x25/0x30 mm/kasan/common.c:52
       ____kasan_kmalloc mm/kasan/common.c:371 [inline]
       ____kasan_kmalloc mm/kasan/common.c:330 [inline]
       __kasan_kmalloc+0xa3/0xb0 mm/kasan/common.c:380
       kasan_kmalloc include/linux/kasan.h:211 [inline]
       __do_kmalloc_node mm/slab_common.c:968 [inline]
       __kmalloc+0x5a/0xd0 mm/slab_common.c:981
       kmalloc include/linux/slab.h:584 [inline]
       sk_prot_alloc+0x140/0x290 net/core/sock.c:2038
       sk_alloc+0x3a/0x7a0 net/core/sock.c:2091
       nr_create+0xb6/0x5f0 net/netrom/af_netrom.c:433
       __sock_create+0x359/0x790 net/socket.c:1515
       sock_create net/socket.c:1566 [inline]
       __sys_socket_create net/socket.c:1603 [inline]
       __sys_socket_create net/socket.c:1588 [inline]
       __sys_socket+0x133/0x250 net/socket.c:1636
       __do_sys_socket net/socket.c:1649 [inline]
       __se_sys_socket net/socket.c:1647 [inline]
       __x64_sys_socket+0x73/0xb0 net/socket.c:1647
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Freed by task 14:
       kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
       kasan_set_track+0x25/0x30 mm/kasan/common.c:52
       kasan_save_free_info+0x2b/0x40 mm/kasan/generic.c:518
       ____kasan_slab_free mm/kasan/common.c:236 [inline]
       ____kasan_slab_free+0x13b/0x1a0 mm/kasan/common.c:200
       kasan_slab_free include/linux/kasan.h:177 [inline]
       __cache_free mm/slab.c:3394 [inline]
       __do_kmem_cache_free mm/slab.c:3580 [inline]
       __kmem_cache_free+0xcd/0x3b0 mm/slab.c:3587
       sk_prot_free net/core/sock.c:2074 [inline]
       __sk_destruct+0x5df/0x750 net/core/sock.c:2166
       sk_destruct net/core/sock.c:2181 [inline]
       __sk_free+0x175/0x460 net/core/sock.c:2192
       sk_free+0x7c/0xa0 net/core/sock.c:2203
       sock_put include/net/sock.h:1991 [inline]
       nr_heartbeat_expiry+0x1d7/0x460 net/netrom/nr_timer.c:148
       call_timer_fn+0x1da/0x7c0 kernel/time/timer.c:1700
       expire_timers+0x2c6/0x5c0 kernel/time/timer.c:1751
       __run_timers kernel/time/timer.c:2022 [inline]
       __run_timers kernel/time/timer.c:1995 [inline]
       run_timer_softirq+0x326/0x910 kernel/time/timer.c:2035
       __do_softirq+0x1fb/0xadc kernel/softirq.c:571
      
      Fixes: 517a16b1 ("netrom: Decrease sock refcount when sock timers expire")
      Reported-by: syzbot+5fafd5cfe1fc91f6b352@syzkaller.appspotmail.com
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20230120231927.51711-1-kuniyu@amazon.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
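      The refcount imbalance behind this use-after-free can be illustrated
      with a toy model. The reference counts below are assumptions chosen for
      illustration: the process still using the fd holds one reference, the
      timer holds one, and the destroy routine drops one. They are not the
      exact af_netrom accounting, and all toy_* names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical refcounted socket; `freed` records when the count hits 0. */
struct toy_sock {
	int refcnt;
	bool freed;
};

void toy_sock_hold(struct toy_sock *sk) { sk->refcnt++; }

void toy_sock_put(struct toy_sock *sk)
{
	if (--sk->refcnt == 0)
		sk->freed = true;
}

/* Stand-in for a destroy routine that drops one reference on teardown. */
void toy_destroy_socket(struct toy_sock *sk)
{
	toy_sock_put(sk);
}

/* Stand-in for the heartbeat expiry: destroy, then drop the timer's ref. */
void toy_heartbeat_expiry(struct toy_sock *sk, bool hold_before_destroy)
{
	if (hold_before_destroy)
		toy_sock_hold(sk); /* the fix: balance the put done by destroy */
	toy_destroy_socket(sk);
	toy_sock_put(sk);          /* the timer's own reference */
}
```

      Without the extra hold, the two puts consume both references and the
      socket is freed while accept() can still reach it.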
    • netfilter: conntrack: unify established states for SCTP paths · a44b7651
      Sriram Yagnaraman authored
      An SCTP endpoint can start an association through a path and tear it
      down over another one. That means the initial path will not see the
      shutdown sequence, and the conntrack entry will remain in ESTABLISHED
      state for 5 days.
      
      By merging the HEARTBEAT_ACKED and ESTABLISHED states into one
      ESTABLISHED state, there remains no difference between a primary or
      secondary path. The timeout for the merged ESTABLISHED state is set to
      210 seconds (hb_interval * max_path_retrans + rto_max). So, even if a
      path doesn't see the shutdown sequence, it will expire in a reasonable
      amount of time.
      
      With this change in place, there is now more than one state from which
      we can transition to ESTABLISHED (COOKIE_ECHOED and HEARTBEAT_SENT), so
      set the ASSURED bit whenever a state change has happened and the new
      state is ESTABLISHED. Remove the check for dir==REPLY, since the
      transition to ESTABLISHED can happen only in the reply direction.
      
      Fixes: 9fb9cbb1 ("[NETFILTER]: Add nf_conntrack subsystem.")
      Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
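      The 210-second ESTABLISHED timeout above can be checked against the
      formula given in this commit. The worked equation assumes the protocol
      defaults recommended in RFC 9260 (HB.interval = 30 s,
      Path.Max.Retrans = 5, RTO.Max = 60 s). Those values are not stated in
      the commit itself. The previous 5-day timeout is shown for comparison.

```latex
T_{\mathrm{ESTABLISHED}}
  = \mathrm{HB.interval} \times \mathrm{Path.Max.Retrans} + \mathrm{RTO.Max}
  = 30\,\mathrm{s} \times 5 + 60\,\mathrm{s}
  = 210\,\mathrm{s}
\qquad\text{vs.}\qquad
5\,\mathrm{days} = 5 \times 86400\,\mathrm{s} = 432000\,\mathrm{s}
```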
    • Revert "netfilter: conntrack: add sctp DATA_SENT state" · 13bd9b31
      Sriram Yagnaraman authored
      This reverts commit bff3d053 ("netfilter: conntrack: add sctp
      DATA_SENT state").
      
      Using DATA/SACK to detect a new connection on secondary/alternate paths
      works only on new connections, while a HEARTBEAT is required on
      connection re-use. It is probably consistent to wait for HEARTBEAT to
      create a secondary connection in conntrack.
      
      Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: conntrack: fix bug in for_each_sctp_chunk · 98ee0077
      Sriram Yagnaraman authored
      skb_header_pointer() will return NULL if offset + sizeof(_sch) exceeds
      skb->len, so this offset < skb->len test is redundant.
      
      If sch->length == 0, this ends up in an infinite loop, so add a check
      for sch->length > 0.
      
      Fixes: 9fb9cbb1 ("[NETFILTER]: Add nf_conntrack subsystem.")
      Suggested-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
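      The walk that for_each_sctp_chunk() performs, and why a zero chunk
      length stalls it, can be sketched over a plain byte buffer. This is a
      user-space approximation, not the kernel macro. Its assumptions are a
      12-byte common header, 4-byte chunk headers carrying a big-endian
      length, and chunks padded to 4 bytes. The function name is hypothetical.
      The loop only reads a chunk header that fits entirely, as
      skb_header_pointer() does, and it stops on a zero length instead of
      re-reading the same offset forever.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_SCTP_COMMON_HDR_LEN 12 /* ports, verification tag, checksum */
#define TOY_SCTP_CHUNK_HDR_LEN  4  /* type, flags, 16-bit length */

/* Count the chunks in an SCTP packet held in buf[0..len). */
int toy_count_sctp_chunks(const uint8_t *buf, size_t len)
{
	size_t offset = TOY_SCTP_COMMON_HDR_LEN;
	int count = 0;

	/* Only read a chunk header that fits entirely in the buffer. */
	while (offset + TOY_SCTP_CHUNK_HDR_LEN <= len) {
		uint16_t clen = (uint16_t)((buf[offset + 2] << 8) | buf[offset + 3]);

		/* Mirrors the added sch->length > 0 condition: with a zero
		 * length, the offset would never advance. */
		if (clen == 0)
			break;
		count++;
		/* The length excludes padding; chunks are 4-byte aligned. */
		offset += ((size_t)clen + 3) & ~(size_t)3;
	}
	return count;
}
```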
    • netfilter: conntrack: fix vtag checks for ABORT/SHUTDOWN_COMPLETE · a9993591
      Sriram Yagnaraman authored
      RFC 9260, Sec 8.5.1 states that for ABORT/SHUTDOWN_COMPLETE, the chunk
      MUST be accepted if the vtag of the packet matches its own tag and the
      T bit is not set, OR if the vtag matches its peer's vtag and the T bit
      is set in the chunk flags. Otherwise, the packet MUST be silently dropped.
      
      Update vtag verification for ABORT/SHUTDOWN_COMPLETE based on the above
      description.
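      The acceptance rule can be written as a small predicate. This is a
      sketch, not the conntrack code: conntrack keeps a vtag per direction,
      modeled here as two hypothetical parameters, and the caller is assumed
      to drop the packet silently when the predicate returns false. The T bit
      is bit 0 of the chunk flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* T bit in the chunk flags of ABORT and SHUTDOWN_COMPLETE. */
#define SKETCH_CHUNK_FLAG_T 0x01

/*
 * own_vtag:  the tag this endpoint expects to receive.
 * peer_vtag: the tag the peer expects (a reflected tag).
 */
static bool sketch_vtag_ok(uint32_t pkt_vtag, uint8_t chunk_flags,
			   uint32_t own_vtag, uint32_t peer_vtag)
{
	bool t_bit = chunk_flags & SKETCH_CHUNK_FLAG_T;

	if (!t_bit)
		return pkt_vtag == own_vtag;
	return pkt_vtag == peer_vtag;
}
```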
      
      Fixes: 9fb9cbb1 ("[NETFILTER]: Add nf_conntrack subsystem.")
      Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 208a2110
      Jakub Kicinski authored
      
      
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2023-01-20 (iavf)
      
      This series contains updates to iavf driver only.
      
      Michal Schmidt converts the single iavf workqueue to a per-adapter one
      to avoid deadlock issues.
      
      Marcin moves setting of VLAN-related netdev features to the watchdog
      task to avoid an RTNL deadlock.
      
      Stefan Assmann schedules immediate watchdog task execution on changing
      the primary MAC to avoid excessive delay.
      
      * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        iavf: schedule watchdog immediately when changing primary MAC
        iavf: Move netdev_update_features() into watchdog task
        iavf: fix temporary deadlock and failure to set MAC address
      ====================
      
      Link: https://lore.kernel.org/r/20230120211036.430946-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · 571cca79
      Jakub Kicinski authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      1) Fix overlap detection in rbtree set backend: Detect overlap by going
         through the ordered list of valid tree nodes. To reduce the number of
         visited nodes in the list, this algorithm descends the tree to search
         for an existing element that is greater than, and close to, the key
         value of the new element being inserted.
      
      2) Fix for the rbtree set garbage collector: Skip inactive and busy
         elements when checking for expired elements to avoid interference
         with an ongoing transaction from control plane.
      
      This is a rather large fix at this stage of the 6.2-rc cycle. Since
      33c7aba0 ("netfilter: nf_tables: do not set up extensions for end
      interval"), bogus overlap errors in the rbtree set occur more frequently.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: nft_set_rbtree: skip elements in transaction from garbage collection
        netfilter: nft_set_rbtree: Switch to node list walk for overlap detection
      ====================
      
      Link: https://lore.kernel.org/r/20230123211601.292930-1-pablo@netfilter.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • MAINTAINERS: Update MPTCP maintainer list and CREDITS · bce4affe
      Mat Martineau authored
      
      
      My responsibilities at Intel have changed, so I'm handing off exclusive
      MPTCP subsystem maintainer duties to Matthieu. It has been a privilege
      to see MPTCP through its initial upstreaming and first few years in the
      upstream kernel!
      
      Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Link: https://lore.kernel.org/r/20230120231121.36121-1-mathew.j.martineau@linux.intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ethernet: adi: adin1110: Fix multicast offloading · 8a4f6d02
      Alexandru Tachici authored
      The driver incorrectly marked broadcast/multicast frames as offloaded.
      Mark them as offloaded only when HW offloading has been enabled.
      This should happen only for the ADIN2111 when both ports are bridged
      by the software.
      
      Fixes: bc93e19d ("net: ethernet: adi: Add ADIN1110 support")
      Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20230120090846.18172-1-alexandru.tachici@analog.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: microchip: fix probe of I2C-connected KSZ8563 · 360fdc99
      Ahmad Fatoum authored
      Starting with commit eee16b14 ("net: dsa: microchip: perform the
      compatibility check for dev probed"), the KSZ switch driver now bails
      out if it thinks the DT compatible doesn't match the actual chip ID
      read back from the hardware:
      
        ksz9477-switch 1-005f: Device tree specifies chip KSZ9893 but found
        KSZ8563, please fix it!
      
      For the KSZ8563, which used ksz_switch_chips[KSZ9893], this was fine
      at first, because it indeed shares the same chip id as the KSZ9893.
      
      Commit b4490809 ("net: dsa: microchip: add separate struct
      ksz_chip_data for KSZ8563 chip") started differentiating KSZ9893
      compatible chips by consulting the 0x1F register. The resulting breakage
      was fixed for the SPI driver in the same commit by introducing the
      appropriate ksz_switch_chips[KSZ8563], but not for the I2C driver.
      
      Fix this for I2C-connected KSZ8563 now to get it probing again.
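      The probe-time check boils down to looking up the DT compatible in the
      bus driver's match table and comparing the chip it names with the chip
      detected from the hardware. A minimal sketch, assuming a reduced
      hypothetical table and enum (not the driver's real ksz_switch_chips[]
      or of_device_id types): with a dedicated "microchip,ksz8563" entry the
      check passes, whereas an entry pointing at KSZ9893 would fail it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of chip identifiers. */
enum sketch_chip { SKETCH_KSZ9893, SKETCH_KSZ8563 };

struct sketch_of_match {
	const char *compatible;
	enum sketch_chip chip;
};

/* I2C match table with its own KSZ8563 entry. */
static const struct sketch_of_match sketch_i2c_ids[] = {
	{ "microchip,ksz9893", SKETCH_KSZ9893 },
	{ "microchip,ksz8563", SKETCH_KSZ8563 },
	{ NULL, SKETCH_KSZ9893 }, /* sentinel */
};

/* 0 if the DT compatible names the detected chip, -1 otherwise
 * (the "Device tree specifies chip X but found Y" case). */
static int sketch_check_compatible(const struct sketch_of_match *table,
				   const char *compatible,
				   enum sketch_chip detected)
{
	for (; table->compatible; table++) {
		if (strcmp(table->compatible, compatible) == 0)
			return table->chip == detected ? 0 : -1;
	}
	return -1; /* compatible not in the table */
}
```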
      
      Fixes: b4490809 ("net: dsa: microchip: add separate struct ksz_chip_data for KSZ8563 chip").
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
      Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/20230120110933.1151054-1-a.fatoum@pengutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ipv4: prevent potential spectre v1 gadget in fib_metrics_match() · 5e9398a2
      Eric Dumazet authored
          if (!type)
              continue;
          if (type > RTAX_MAX)
              return false;
          ...
          fi_val = fi->fib_metrics->metrics[type - 1];
      
      Because @type is used as an array index, we need to prevent
      CPU speculation, or we risk leaking kernel memory content.
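      The kernel provides array_index_nospec() for this pattern: after the
      bounds check, the index is ANDed with a mask that is all ones when in
      range and zero otherwise, computed without a branch the CPU could
      mispredict. The user-space sketch below models that arithmetic on the
      generic array_index_mask_nospec(); the `sketch_*` names, the
      attribute struct and the RTAX_MAX stand-in value are assumptions. It
      relies on GCC/Clang for the arithmetic right shift and inline asm.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-in for RTAX_MAX; the exact value does not matter here. */
#define SKETCH_RTAX_MAX 17

/* All ones if index < size, zero otherwise, without a branch. */
static unsigned long sketch_index_mask_nospec(unsigned long index,
					      unsigned long size)
{
	/* Hide the value from the optimizer so it cannot fold the mask
	 * to ~0 using the earlier bounds check (GNU C inline asm). */
	__asm__("" : "+r"(index));
	return ~(long)(index | (size - 1UL - index)) >>
	       (sizeof(long) * CHAR_BIT - 1);
}

/* Hypothetical (type, value) pair standing in for a netlink attribute. */
struct sketch_attr {
	uint16_t type;
	uint32_t val;
};

/* Compare requested metrics with stored ones, fib_metrics_match() style. */
static bool sketch_metrics_match(const uint32_t *metrics,
				 const struct sketch_attr *attrs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		unsigned long type = attrs[i].type;

		if (!type)
			continue;
		if (type > SKETCH_RTAX_MAX)
			return false;
		/* If the check above is mispredicted, type is forced to 0,
		 * so the load address no longer follows the input. */
		type &= sketch_index_mask_nospec(type, SKETCH_RTAX_MAX + 1);
		if (metrics[type - 1] != attrs[i].val)
			return false;
	}
	return true;
}
```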
      
      Fixes: 5f9ae3d9 ("ipv4: do metrics match when looking up and deleting a route")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20230120133140.3624204-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ipv4: prevent potential spectre v1 gadget in ip_metrics_convert() · 1d1d63b6
      Eric Dumazet authored
          if (!type)
              continue;
          if (type > RTAX_MAX)
              return -EINVAL;
          ...
          metrics[type - 1] = val;
      
      Because @type is used as an array index, we need to prevent
      CPU speculation, or we risk leaking kernel memory content.
      
      Fixes: 6cf9dfd3 ("net: fib: move metrics parsing to a helper")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20230120133040.3623463-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'netlink-annotate-various-data-races' · d6ab640c
      Jakub Kicinski authored
      
      
      Eric Dumazet says:
      
      ====================
      netlink: annotate various data races
      
      A recent syzbot report came to my attention.
      
      After addressing it, I also fixed other related races.
      ====================
      
      Link: https://lore.kernel.org/r/20230120125955.3453768-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • netlink: annotate data races around sk_state · 9b663b5c
      Eric Dumazet authored
      netlink_getsockbyportid() reads sk_state while a concurrent
      netlink_connect() can change its value.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • netlink: annotate data races around dst_portid and dst_group · 004db64d
      Eric Dumazet authored
      netlink_getname(), netlink_sendmsg() and netlink_getsockbyportid()
      can read nlk->dst_portid and nlk->dst_group while another
      thread is changing them.
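      In the kernel, such intentionally lockless accesses are annotated with
      READ_ONCE()/WRITE_ONCE(), which also tells KCSAN the race is expected.
      The closest portable user-space analog is a relaxed C11 atomic access,
      sketched below; the struct and the connect/getname helpers are
      hypothetical stand-ins, not af_netlink.c code.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical subset of netlink socket state read without a lock. */
struct sketch_nlk {
	_Atomic uint32_t dst_portid;
	_Atomic uint32_t dst_group;
};

/* Writer side (think netlink_connect()): marked stores. */
static void sketch_connect(struct sketch_nlk *nlk, uint32_t portid,
			   uint32_t group)
{
	atomic_store_explicit(&nlk->dst_portid, portid, memory_order_relaxed);
	atomic_store_explicit(&nlk->dst_group, group, memory_order_relaxed);
}

/*
 * Reader side (think netlink_getname()): marked loads. Each field is read
 * without tearing, but the two loads are not one consistent snapshot.
 */
static void sketch_getname(struct sketch_nlk *nlk, uint32_t *portid,
			   uint32_t *group)
{
	*portid = atomic_load_explicit(&nlk->dst_portid, memory_order_relaxed);
	*group = atomic_load_explicit(&nlk->dst_group, memory_order_relaxed);
}
```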
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • netlink: annotate data races around nlk->portid · c1bb9484
      Eric Dumazet authored
      syzbot reminds us netlink_getname() runs locklessly [1]
      
      This first patch annotates the race against nlk->portid.
      
      Following patches take care of the remaining races.
      
      [1]
      BUG: KCSAN: data-race in netlink_getname / netlink_insert
      
      write to 0xffff88814176d310 of 4 bytes by task 2315 on cpu 1:
      netlink_insert+0xf1/0x9a0 net/netlink/af_netlink.c:583
      netlink_autobind+0xae/0x180 net/netlink/af_netlink.c:856
      netlink_sendmsg+0x444/0x760 net/netlink/af_netlink.c:1895
      sock_sendmsg_nosec net/socket.c:714 [inline]
      sock_sendmsg net/socket.c:734 [inline]
      ____sys_sendmsg+0x38f/0x500 net/socket.c:2476
      ___sys_sendmsg net/socket.c:2530 [inline]
      __sys_sendmsg+0x19a/0x230 net/socket.c:2559
      __do_sys_sendmsg net/socket.c:2568 [inline]
      __se_sys_sendmsg net/socket.c:2566 [inline]
      __x64_sys_sendmsg+0x42/0x50 net/socket.c:2566
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      read to 0xffff88814176d310 of 4 bytes by task 2316 on cpu 0:
      netlink_getname+0xcd/0x1a0 net/netlink/af_netlink.c:1144
      __sys_getsockname+0x11d/0x1b0 net/socket.c:2026
      __do_sys_getsockname net/socket.c:2041 [inline]
      __se_sys_getsockname net/socket.c:2038 [inline]
      __x64_sys_getsockname+0x3e/0x50 net/socket.c:2038
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      value changed: 0x00000000 -> 0xc9a49780
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 2316 Comm: syz-executor.2 Not tainted 6.2.0-rc3-syzkaller-00030-ge8f60cd7db24-dirty #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • netfilter: nft_set_rbtree: skip elements in transaction from garbage collection · 5d235d6c
      Pablo Neira Ayuso authored
      To avoid interfering with an ongoing transaction, do not perform garbage
      collection on inactive elements. Reset the annotated previous end
      interval if the expired element is marked as busy (the control plane
      removed the element right before expiration).
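      The garbage-collection rules above can be sketched over a flat array.
      This is an illustration under stated assumptions, not nft_set_rbtree.c:
      the element struct and its flags are hypothetical, and each interval's
      end element is assumed to be visited immediately before its start
      element. Expired intervals are marked rather than freed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified set element. */
struct sketch_elem {
	bool is_end;    /* end-of-interval element */
	bool active;    /* visible in the current generation */
	bool busy;      /* claimed by the control plane */
	bool expired;
	bool collected; /* set by sketch_gc() */
};

/* Collect expired intervals; returns the number of elements collected. */
static size_t sketch_gc(struct sketch_elem *elems, size_t n)
{
	struct sketch_elem *prev_end = NULL;
	size_t freed = 0;

	for (size_t i = 0; i < n; i++) {
		struct sketch_elem *e = &elems[i];

		/* Leave elements of an ongoing transaction alone. */
		if (!e->active)
			continue;
		if (e->is_end) {
			prev_end = e;
			continue;
		}
		/* Start element: consume the annotated end in every case, so
		 * a busy start also resets it. */
		struct sketch_elem *end = prev_end;

		prev_end = NULL;
		if (!e->expired || e->busy)
			continue;
		e->collected = true;
		freed++;
		if (end) {
			end->collected = true;
			freed++;
		}
	}
	return freed;
}
```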
      
      Fixes: 8d8540c4 ("netfilter: nft_set_rbtree: add timeout support")
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nft_set_rbtree: Switch to node list walk for overlap detection · c9e6978e
      Pablo Neira Ayuso authored
      ...instead of a tree descent, which became overly complicated in an
      attempt to cover cases where expired or inactive elements would affect
      comparisons with the new element being inserted.
      
      Further, it turned out that it's probably impossible to cover all those
      cases, as inactive nodes might entirely hide subtrees consisting of a
      complete interval plus a node that makes the current insertion not
      overlap.
      
      To speed up the overlap check, descend the tree to find a greater
      element that is closer to the key value to insert. Then walk down the
      node list for overlap detection. Starting the overlap check from
      rb_first() unconditionally is slow: it takes 10 times longer due to the
      full linear traversal of the list.
      
      Moreover, perform garbage collection of expired elements when walking
      down the node list to avoid bogus overlap reports.
      
      For the insertion operation itself, this essentially reverts back to the
      implementation before commit 7c84d414 ("netfilter: nft_set_rbtree:
      Detect partial overlaps on insertion"), except that cases of complete
      overlap are already handled in the overlap detection phase itself, which
      slightly simplifies the loop to find the insertion point.
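      The descend-then-walk idea can be sketched with a sorted array standing
      in for the rbtree and its ordered node list, and a binary search
      standing in for the tree descent. Assumptions: intervals are half-open
      [start, end), live intervals never overlap each other, and expired
      intervals are counted (as a placeholder for garbage collection) rather
      than freed. All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical half-open interval [start, end), sorted by start. */
struct sketch_ival {
	uint32_t start, end;
	bool expired;
};

/* "Tree descent": index of the first interval with start >= key. */
static size_t sketch_lower_bound(const struct sketch_ival *v, size_t n,
				 uint32_t key)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (v[mid].start >= key)
			hi = mid;
		else
			lo = mid + 1;
	}
	return lo;
}

/*
 * Nothing at or above the new end can overlap, so start there and walk
 * down. Expired intervals are skipped so they cannot cause bogus overlap
 * reports; the first live interval decides the answer.
 */
static bool sketch_overlaps(const struct sketch_ival *v, size_t n,
			    uint32_t start, uint32_t end, size_t *expired_seen)
{
	size_t i = sketch_lower_bound(v, n, end);

	*expired_seen = 0;
	while (i-- > 0) {
		if (v[i].expired) {
			(*expired_seen)++;
			continue;
		}
		return v[i].end > start;
	}
	return false;
}
```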
      
      Based on initial patch from Stefano Brivio, including text from the
      original patch description too.
      
      Fixes: 7c84d414 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion")
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  4. Jan 23, 2023
    • ipv6: fix reachability confirmation with proxy_ndp · 9f535c87
      Gergely Risko authored
      When proxying IPv6 NDP requests, the adverts sent in reply to the
      initial multicast solicits are correct and working. However, when a
      reachability confirmation is later requested (via unicast), no reply
      is sent.
      
      This causes the neighbor entry on the sending node to expire, which is
      mostly a non-issue, as a new multicast request is then sent. There are
      routers where the multicast requests are intentionally delayed, and in
      these environments the current implementation causes periodic packet
      loss for the proxied endpoints.
      
      The root cause is the erroneous decrement of the hop limit: ndisc.c
      checks it and generates no answer when it is 254 instead of the
      correct 255.
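      RFC 4861 requires Neighbor Solicitations and Advertisements to carry a
      hop limit of 255, so a single decrement is enough to make the receiver
      discard them. A minimal sketch of that interaction, with hypothetical
      helper names (not the ip6_forward() or ndisc.c code); the caller is
      assumed to have already dropped packets whose hop limit is <= 1.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NDISC_HOP_LIMIT 255 /* required for NS/NA (RFC 4861) */

/* Receiver-side validation in the spirit of the ndisc.c check. */
static bool sketch_ndisc_hop_limit_ok(uint8_t hop_limit)
{
	return hop_limit == SKETCH_NDISC_HOP_LIMIT;
}

/* A proxied packet is handled locally and keeps its hop limit; only
 * packets that are really forwarded get decremented. */
static uint8_t sketch_forward_hop_limit(uint8_t hop_limit, bool proxied)
{
	return proxied ? hop_limit : (uint8_t)(hop_limit - 1);
}
```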
      
      Cc: stable@vger.kernel.org
      Fixes: 46c7655f ("ipv6: decrease hop limit counter in ip6_forward()")
      Signed-off-by: Gergely Risko <gergely.risko@gmail.com>
      Tested-by: Gergely Risko <gergely.risko@gmail.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ethtool-mac-merge' · 0ad999c1
      David S. Miller authored
      
      
      Vladimir Oltean says:
      
      ====================
      ethtool support for IEEE 802.3 MAC Merge layer
      
      Change log
      ----------
      
      v3->v4:
      - add missing opening bracket in ocelot_port_mm_irq()
      - moved cfg.verify_time range checking so that it actually takes place
        for the updated rather than old value
      v3 at:
      https://patchwork.kernel.org/project/netdevbpf/cover/20230117085947.2176464-1-vladimir.oltean@nxp.com/
      
      v2->v3:
      - made get_mm return int instead of void
      - deleted ETHTOOL_A_MM_SUPPORTED
      - renamed ETHTOOL_A_MM_ADD_FRAG_SIZE to ETHTOOL_A_MM_TX_MIN_FRAG_SIZE
      - introduced ETHTOOL_A_MM_RX_MIN_FRAG_SIZE
      - cleaned up documentation
      - rebased on top of PLCA changes
      - renamed ETHTOOL_STATS_SRC_* to ETHTOOL_MAC_STATS_SRC_*
      v2 at:
      https://patchwork.kernel.org/project/netdevbpf/cover/20230111161706.1465242-1-vladimir.oltean@nxp.com/
      
      v1->v2:
      I've decided to focus just on the MAC Merge layer for now, which is why
      I am able to submit this patch set as non-RFC.
      v1 (RFC) at:
      https://patchwork.kernel.org/project/netdevbpf/cover/20220816222920.1952936-1-vladimir.oltean@nxp.com/
      
      What is being introduced
      ------------------------
      
      TL;DR: a MAC Merge layer as defined by IEEE 802.3-2018, clause 99
      (interspersing of express traffic). This is controlled through ethtool
      netlink (ETHTOOL_MSG_MM_GET, ETHTOOL_MSG_MM_SET). The raw ethtool
      commands are posted here:
      https://patchwork.kernel.org/project/netdevbpf/cover/20230111153638.1454687-1-vladimir.oltean@nxp.com/
      
      The MAC Merge layer has its own statistics counters
      (ethtool --include-statistics --show-mm swp0) as well as two member
      MACs, the statistics of which can be queried individually, through a new
      ethtool netlink attribute, corresponding to:
      
      $ ethtool -I --show-pause eno2 --src aggregate
      $ ethtool -S eno2 --groups eth-mac eth-phy eth-ctrl rmon -- --src pmac
      
      The core properties of the MAC Merge layer are described in great detail
      in patches 02/12 and 03/12. They can be viewed in "make htmldocs" format.
      
      Devices for which the API is supported
      --------------------------------------
      
      I decided to start with the Ethernet switch on NXP LS1028A (Felix)
      because of the smaller patch set. I also have support for the ENETC
      controller pending.
      
      I would like to get confirmation that the UAPI being proposed here will
      not restrict any use cases known by other hardware vendors.
      
      Why is support for preemptible traffic classes not here?
      --------------------------------------------------------
      
      There is legitimate concern whether the 802.1Q portion of the standard
      (which traffic classes go to the eMAC and which to the pMAC) should be
      modeled in Linux using tc or using another UAPI. I think that is
      stalling the entire series, but should be discussed separately instead.
      Removing FP adminStatus support makes me confident enough to submit this
      patch set without an RFC tag (meaning: I wouldn't mind if it was merged
      as is).
      
      What is submitted here is sufficient for an LLDP daemon to do its job.
      I've patched openlldp to advertise and configure frame preemption:
      https://github.com/vladimiroltean/openlldp/tree/frame-preemption-v3
      
      In case someone wants to try it out, here are some commands I've used.
      
       # Configure the interfaces to receive and transmit LLDP Data Units
       lldptool -L -i eno0 adminStatus=rxtx
       lldptool -L -i swp0 adminStatus=rxtx
       # Enable the transmission of certain TLVs on switch's interface
       lldptool -T -i eno0 -V addEthCap enableTx=yes
       lldptool -T -i swp0 -V addEthCap enableTx=yes
       # Query LLDP statistics on switch's interface
       lldptool -S -i swp0
       # Query the received neighbor TLVs
       lldptool -i swp0 -t -n -V addEthCap
       Additional Ethernet Capabilities TLV
               Preemption capability supported
               Preemption capability enabled
               Preemption capability active
               Additional fragment size: 60 octets
      
      So using this patch set, lldpad will be able to advertise and configure
      frame preemption, but still, no data packet will be sent as preemptible
      over the link, because there is no UAPI to control which traffic classes
      are sent as preemptible and which as express.
      
      Preemptable or preemptible?
      ---------------------------
      
      IEEE 802.3 uses "preemptable" throughout. IEEE 802.1Q uses "preemptible"
      throughout. Because the definition of "preemptible" falls under 802.1Q's
      jurisdiction and 802.3 just references it, I went with the 802.1Q naming
      even where supporting an 802.3 feature. Also, checkpatch agrees with this.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethtool: netlink: introduce ethnl_update_bool() · 7c494a77
      Vladimir Oltean authored
      
      
      Due to the fact that the kernel-side data structures have been carried
      over from the ioctl-based ethtool, we are now in the situation where we
      have an ethnl_update_bool32() function, but the plain function that
      operates on a boolean value kept in an actual u8 netlink attribute
      doesn't exist.
      
      With new ethtool features that are exposed solely over netlink, the
      kernel data structures will use the "bool" type, so we will need this
      kind of helper. Introduce it now; it's needed for things like
      verify-disabled for the MAC merge configuration.
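      A user-space sketch of the helper's intended behavior, modeled on the
      existing ethnl_update_bool32() pattern: leave the value alone if the
      attribute is absent, otherwise store it and flag a modification only
      when it changed. The kernel helper takes a struct nlattr and reads it
      with nla_get_u8(); here a nullable pointer to the u8 payload stands in,
      and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* attr: NULL if the u8 attribute was not supplied, else its payload. */
static void sketch_update_bool(bool *dst, const uint8_t *attr, bool *mod)
{
	bool val;

	if (!attr)
		return;
	val = *attr != 0;
	if (*dst == val)
		return;
	*dst = val;
	*mod = true;
}
```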
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: fec: Use page_pool_put_full_page when freeing rx buffers · e38553bd
      Wei Fang authored
      page_pool_release_page() was used when freeing RX buffers, but this
      function only unmaps the page (if mapped) and does not recycle it.
      As a result, after hundreds of down/up cycles of eth0, the system runs
      out of memory. For details, see the reproduction steps and bug logs
      below. To solve this issue, and in line with the page pool
      documentation, replace page_pool_release_page() with
      page_pool_put_full_page(), which tries to recycle the page if its
      refcount equals 1. After 20000 test iterations, the issue can no
      longer be reproduced (before the fix, it occurred after about 391
      iterations on an i.MX8MN-EVK).
      
      Steps to reproduce: create the following test script and run it:
      LOOPS=20000
      i=1
      while [ $i -le $LOOPS ]
      do
          echo "TINFO:ENET $curface up and down test $i times"
          org_macaddr=$(cat /sys/class/net/eth0/address)
          ifconfig eth0 down
          ifconfig eth0  hw ether $org_macaddr up
          i=$(expr $i + 1)
      done
      sleep 5
      if cat /sys/class/net/eth0/operstate | grep 'up';then
          echo "TEST PASS"
      else
          echo "TEST FAIL"
      fi
      
      Bug detail logs:
      TINFO:ENET  up and down test 391 times
      [  850.471205] Qualcomm Atheros AR8031/AR8033 30be0000.ethernet-1:00: attached PHY driver (mii_bus:phy_addr=30be0000.ethernet-1:00, irq=POLL)
      [  853.535318] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
      [  853.541694] fec 30be0000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
      [  870.590531] page_pool_release_retry() stalled pool shutdown 199 inflight 60 sec
      [  931.006557] page_pool_release_retry() stalled pool shutdown 199 inflight 120 sec
      TINFO:ENET  up and down test 392 times
      [  991.426544] page_pool_release_retry() stalled pool shutdown 192 inflight 181 sec
      [ 1051.838531] page_pool_release_retry() stalled pool shutdown 170 inflight 241 sec
      [ 1093.751217] Qualcomm Atheros AR8031/AR8033 30be0000.ethernet-1:00: attached PHY driver (mii_bus:phy_addr=30be0000.ethernet-1:00, irq=POLL)
      [ 1096.446520] page_pool_release_retry() stalled pool shutdown 308 inflight 60 sec
      [ 1096.831245] fec 30be0000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
      [ 1096.839092] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
      [ 1112.254526] page_pool_release_retry() stalled pool shutdown 103 inflight 302 sec
      [ 1156.862533] page_pool_release_retry() stalled pool shutdown 308 inflight 120 sec
      [ 1172.674516] page_pool_release_retry() stalled pool shutdown 103 inflight 362 sec
      [ 1217.278532] page_pool_release_retry() stalled pool shutdown 308 inflight 181 sec
      TINFO:ENET  up and down test 393 times
      [ 1233.086535] page_pool_release_retry() stalled pool shutdown 103 inflight 422 sec
      [ 1277.698513] page_pool_release_retry() stalled pool shutdown 308 inflight 241 sec
      [ 1293.502525] page_pool_release_retry() stalled pool shutdown 86 inflight 483 sec
      [ 1338.110518] page_pool_release_retry() stalled pool shutdown 308 inflight 302 sec
      [ 1353.918540] page_pool_release_retry() stalled pool shutdown 32 inflight 543 sec
      [ 1361.179205] Qualcomm Atheros AR8031/AR8033 30be0000.ethernet-1:00: attached PHY driver (mii_bus:phy_addr=30be0000.ethernet-1:00, irq=POLL)
      [ 1364.255298] fec 30be0000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
      [ 1364.263189] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
      [ 1371.998532] page_pool_release_retry() stalled pool shutdown 310 inflight 60 sec
      [ 1398.530542] page_pool_release_retry() stalled pool shutdown 308 inflight 362 sec
      [ 1414.334539] page_pool_release_retry() stalled pool shutdown 16 inflight 604 sec
      [ 1432.414520] page_pool_release_retry() stalled pool shutdown 310 inflight 120 sec
      [ 1458.942523] page_pool_release_retry() stalled pool shutdown 308 inflight 422 sec
      [ 1474.750521] page_pool_release_retry() stalled pool shutdown 16 inflight 664 sec
      TINFO:ENET  up and down test 394 times
      [ 1492.830522] page_pool_release_retry() stalled pool shutdown 310 inflight 181 sec
      [ 1519.358519] page_pool_release_retry() stalled pool shutdown 308 inflight 483 sec
      [ 1535.166545] page_pool_release_retry() stalled pool shutdown 2 inflight 724 sec
      [ 1537.090278] eth_test2.sh invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=0, oom_score_adj=0
      [ 1537.101192] CPU: 3 PID: 2379 Comm: eth_test2.sh Tainted: G         C         6.1.1+g56321e101aca #1
      [ 1537.110249] Hardware name: NXP i.MX8MNano EVK board (DT)
      [ 1537.115561] Call trace:
      [ 1537.118005]  dump_backtrace.part.0+0xe0/0xf0
      [ 1537.122289]  show_stack+0x18/0x40
      [ 1537.125608]  dump_stack_lvl+0x64/0x80
      [ 1537.129276]  dump_stack+0x18/0x34
      [ 1537.132592]  dump_header+0x44/0x208
      [ 1537.136083]  oom_kill_process+0x2b4/0x2c0
      [ 1537.140097]  out_of_memory+0xe4/0x594
      [ 1537.143766]  __alloc_pages+0xb68/0xd00
      [ 1537.147521]  alloc_pages+0xac/0x160
      [ 1537.151013]  __get_free_pages+0x14/0x40
      [ 1537.154851]  pgd_alloc+0x1c/0x30
      [ 1537.158082]  mm_init+0xf8/0x1d0
      [ 1537.161228]  mm_alloc+0x48/0x60
      [ 1537.164368]  alloc_bprm+0x7c/0x240
      [ 1537.167777]  do_execveat_common.isra.0+0x70/0x240
      [ 1537.172486]  __arm64_sys_execve+0x40/0x54
      [ 1537.176502]  invoke_syscall+0x48/0x114
      [ 1537.180255]  el0_svc_common.constprop.0+0xcc/0xec
      [ 1537.184964]  do_el0_svc+0x2c/0xd0
      [ 1537.188280]  el0_svc+0x2c/0x84
      [ 1537.191340]  el0t_64_sync_handler+0xf4/0x120
      [ 1537.195613]  el0t_64_sync+0x18c/0x190
      [ 1537.199334] Mem-Info:
      [ 1537.201620] active_anon:342 inactive_anon:10343 isolated_anon:0
      [ 1537.201620]  active_file:54 inactive_file:112 isolated_file:0
      [ 1537.201620]  unevictable:0 dirty:0 writeback:0
      [ 1537.201620]  slab_reclaimable:2620 slab_unreclaimable:7076
      [ 1537.201620]  mapped:1489 shmem:2473 pagetables:466
      [ 1537.201620]  sec_pagetables:0 bounce:0
      [ 1537.201620]  kernel_misc_reclaimable:0
      [ 1537.201620]  free:136672 free_pcp:96 free_cma:129241
      [ 1537.240419] Node 0 active_anon:1368kB inactive_anon:41372kB active_file:216kB inactive_file:5052kB unevictable:0kB isolated(anon):0kB isolated(file):0kB s
      [ 1537.271422] Node 0 DMA free:541636kB boost:0kB min:30000kB low:37500kB high:45000kB reserved_highatomic:0KB active_anon:1368kB inactive_anon:41372kB actiB
      [ 1537.300219] lowmem_reserve[]: 0 0 0 0
      [ 1537.303929] Node 0 DMA: 1015*4kB (UMEC) 743*8kB (UMEC) 417*16kB (UMEC) 235*32kB (UMEC) 116*64kB (UMEC) 25*128kB (UMEC) 4*256kB (UC) 2*512kB (UC) 0*1024kBB
      [ 1537.323938] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
      [ 1537.332708] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=32768kB
      [ 1537.341292] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
      [ 1537.349776] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=64kB
      [ 1537.358087] 2939 total pagecache pages
      [ 1537.361876] 0 pages in swap cache
      [ 1537.365229] Free swap  = 0kB
      [ 1537.368147] Total swap = 0kB
      [ 1537.371065] 516096 pages RAM
      [ 1537.373959] 0 pages HighMem/MovableOnly
      [ 1537.377834] 17302 pages reserved
      [ 1537.381103] 163840 pages cma reserved
      [ 1537.384809] 0 pages hwpoisoned
      [ 1537.387902] Tasks state (memory values in pages):
      [ 1537.392652] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
      [ 1537.401356] [    201]   993   201     1130       72    45056        0             0 rpcbind
      [ 1537.409772] [    202]     0   202     4529     1640    77824        0          -250 systemd-journal
      [ 1537.418861] [    222]     0   222     4691      801    69632        0         -1000 systemd-udevd
      [ 1537.427787] [    248]   994   248    20914      130    65536        0             0 systemd-timesyn
      [ 1537.436884] [    497]     0   497      620       31    49152        0             0 atd
      [ 1537.444938] [    500]     0   500      854       77    53248        0             0 crond
      [ 1537.453165] [    503]   997   503     1470      160    49152        0          -900 dbus-daemon
      [ 1537.461908] [    505]     0   505      633       24    40960        0             0 firmwared
      [ 1537.470491] [    513]     0   513     2507      180    61440        0             0 ofonod
      [ 1537.478800] [    514]   990   514    69640      137    81920        0             0 parsec
      [ 1537.487120] [    533]     0   533      599       39    40960        0             0 syslogd
      [ 1537.495518] [    534]     0   534     4546      148    65536        0             0 systemd-logind
      [ 1537.504560] [    535]     0   535      690       24    45056        0             0 tee-supplicant
      [ 1537.513564] [    540]   996   540     2769      168    61440        0             0 systemd-network
      [ 1537.522680] [    566]     0   566     3878      228    77824        0             0 connmand
      [ 1537.531168] [    645]   998   645     1538      133    57344        0             0 avahi-daemon
      [ 1537.540004] [    646]   998   646     1461       64    57344        0             0 avahi-daemon
      [ 1537.548846] [    648]   992   648      781       41    45056        0             0 rpc.statd
      [ 1537.557415] [    650] 64371   650      590       23    45056        0             0 ninfod
      [ 1537.565754] [    653] 61563   653      555       24    45056        0             0 rdisc
      [ 1537.573971] [    655]     0   655   374569     2999   290816        0          -999 containerd
      [ 1537.582621] [    658]     0   658     1311       20    49152        0             0 agetty
      [ 1537.590922] [    663]     0   663     1529       97    49152        0             0 login
      [ 1537.599138] [    666]     0   666     3430      202    69632        0             0 wpa_supplicant
      [ 1537.608147] [    667]     0   667     2344       96    61440        0             0 systemd-userdbd
      [ 1537.617240] [    677]     0   677     2964      314    65536        0           100 systemd
      [ 1537.625651] [    679]     0   679     3720      646    73728        0           100 (sd-pam)
      [ 1537.634138] [    687]     0   687     1289      403    45056        0             0 sh
      [ 1537.642108] [    789]     0   789      970       93    45056        0             0 eth_test2.sh
      [ 1537.650955] [   2355]     0  2355     2346       94    61440        0             0 systemd-userwor
      [ 1537.660046] [   2356]     0  2356     2346       94    61440        0             0 systemd-userwor
      [ 1537.669137] [   2358]     0  2358     2346       95    57344        0             0 systemd-userwor
      [ 1537.678258] [   2379]     0  2379      970       93    45056        0             0 eth_test2.sh
      [ 1537.687098] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-0.slice/user@0.service,tas0
      [ 1537.703009] Out of memory: Killed process 679 ((sd-pam)) total-vm:14880kB, anon-rss:2584kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:72kB oom_score_ad0
      [ 1553.246526] page_pool_release_retry() stalled pool shutdown 310 inflight 241 sec
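      The OOM killer chose (sd-pam) even though containerd and systemd-journal have larger RSS. A simplified model of the kernel's badness heuristic (modelled on oom_badness() in mm/oom_kill.c) shows why: the score is rss + swapents + pgtables_bytes / PAGE_SIZE, plus oom_score_adj scaled by totalpages / 1000, and tasks at oom_score_adj -1000 are never picked. This is a sketch, not the kernel code. It assumes 4 KiB pages, which is consistent with anon-rss:2584kB = 646 pages x 4 KiB. TOTAL_PAGES is a hypothetical value, because the log does not show totalpages. The kernel also skips some tasks (already reaped, in vfork) that this model ignores.

```python
# Simplified model of the OOM "badness" heuristic applied to rows of the
# task dump above. ASSUMPTIONS: 4 KiB pages; TOTAL_PAGES is hypothetical
# (the real totalpages = RAM + swap is not printed in the log).
from typing import NamedTuple, Optional

PAGE_SIZE = 4096
OOM_SCORE_ADJ_MIN = -1000
TOTAL_PAGES = 262144  # hypothetical: 1 GiB of RAM, no swap


class Task(NamedTuple):
    pid: int
    rss: int             # resident pages
    pgtables_bytes: int  # page-table memory in bytes
    swapents: int        # swapped-out pages
    oom_score_adj: int
    name: str


def badness(t: Task, totalpages: int = TOTAL_PAGES) -> Optional[int]:
    """Return the badness score, or None if the task is never selected."""
    if t.oom_score_adj == OOM_SCORE_ADJ_MIN:
        return None  # oom_score_adj -1000 means "never kill"
    points = t.rss + t.swapents + t.pgtables_bytes // PAGE_SIZE
    # Each oom_score_adj unit is worth totalpages / 1000 pages.
    return points + t.oom_score_adj * (totalpages // 1000)


# A subset of rows copied from the task dump above.
TASKS = [
    Task(202, 1640, 77824, 0, -250, "systemd-journal"),
    Task(222, 801, 69632, 0, -1000, "systemd-udevd"),
    Task(655, 2999, 290816, 0, -999, "containerd"),
    Task(677, 314, 65536, 0, 100, "systemd"),
    Task(679, 646, 73728, 0, 100, "(sd-pam)"),
    Task(687, 403, 45056, 0, 0, "sh"),
]


def pick_victim(tasks):
    """Return the killable task with the highest badness score."""
    scored = [(badness(t), t) for t in tasks]
    scored = [(s, t) for s, t in scored if s is not None]
    return max(scored, key=lambda st: st[0])[1]


if __name__ == "__main__":
    for t in TASKS:
        print(t.pid, t.name, badness(t))
    victim = pick_victim(TASKS)
    print("victim:", victim.pid, victim.name)
```

      In this model, the positive oom_score_adj of 100 on (sd-pam) and systemd outweighs the raw RSS of the other tasks. The large negative adjustments on systemd-journal and containerd push their scores far below zero. Between the two +100 tasks, (sd-pam) wins on RSS. The same ordering holds for any realistic totalpages, not only the hypothetical value used here.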
      
      Fixes: 95698ff6 ("net: fec: using page pool to manage RX buffers")
      Signed-off-by: Wei Fang <wei.fang@nxp.com>
      Reviewed-by: shenwei wang <Shenwei.wang@nxp.com>
      Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e38553bd