  1. Jul 20, 2023
    • netfilter: nf_tables: fix spurious set element insertion failure · ddbd8be6
      Florian Westphal authored
      On some platforms there is a padding hole in the nft_verdict
      structure, between the verdict code and the chain pointer.
      
      On element insertion, if the new element clashes with an existing one and
      NLM_F_EXCL flag isn't set, we want to ignore the -EEXIST error as long as
      the data associated with duplicated element is the same as the existing
      one.  The data equality check uses memcmp.
      
      For normal data (NFT_DATA_VALUE) this works fine, but for NFT_DATA_VERDICT
      the padding area leads to a spurious failure even if the verdict data is
      the same.
      
      This then makes the insertion fail with 'already exists' error, even
      though the new "key : data" matches an existing entry and userspace
      told the kernel that it doesn't want to receive an error indication.
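The padding problem described above can be illustrated with a small user-space sketch. `struct demo_verdict` and the `demo_*` helpers are hypothetical stand-ins, not the kernel's nft_verdict code, and the layout comment assumes a typical 64-bit ABI. The sketch only contrasts a bytewise comparison with a field-wise one; it does not claim to show how the actual patch resolves the issue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a verdict: a 32-bit code followed by a pointer.
 * On typical 64-bit ABIs the compiler places padding bytes after `code`. */
struct demo_verdict {
    unsigned int code;
    const void *chain;
};

/* Fill the whole object with a byte pattern, then assign the fields.
 * Any padding keeps whatever the fill left behind (in practice). */
void demo_verdict_init(struct demo_verdict *v, unsigned char fill,
                       unsigned int code, const void *chain)
{
    assert(v != NULL);
    memset(v, fill, sizeof(*v));
    v->code = code;
    v->chain = chain;
}

/* Bytewise comparison: also compares padding bytes, so two logically
 * identical verdicts can compare as different. */
bool demo_verdict_equal_bytes(const struct demo_verdict *a,
                              const struct demo_verdict *b)
{
    return memcmp(a, b, sizeof(*a)) == 0;
}

/* Field-wise comparison: ignores padding entirely. */
bool demo_verdict_equal_fields(const struct demo_verdict *a,
                               const struct demo_verdict *b)
{
    return a->code == b->code && a->chain == b->chain;
}
```

Two objects initialised with different fill bytes but the same fields always compare equal field-wise, while the bytewise result depends on the padding contents.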
      
      Fixes: c016c7e4 ("netfilter: nf_tables: honor NLM_F_EXCL flag in set element insertion")
      Signed-off-by: Florian Westphal <fw@strlen.de>
    • Merge branch 'net-support-stp-on-bridge-in-non-root-netns' · ac528649
      Paolo Abeni authored
      
      
      Kuniyuki Iwashima says:
      
      ====================
      net: Support STP on bridge in non-root netns.
      
      Currently, STP does not work in non-root netns as llc_rcv() drops
      packets from non-root netns.
      
      This series fixes it by making some protocol handlers netns-aware,
      which are called from llc_rcv() as follows:
      
        llc_rcv()
        |
        |- sap->rcv_func : registered by llc_sap_open()
        |
        |  * functions : registered by register_8022_client()
        |    -> No in-kernel user calls register_8022_client()
        |
        |  * snap_rcv()
        |    |
        |    `- proto->rcvfunc() : registered by register_snap_client()
        |
        |       * aarp_rcv()  : drop packets from non-root netns
        |       * atalk_rcv() : drop packets from non-root netns
        |
        |  * stp_pdu_rcv()
        |    |
        |    `- garp_protos[]->rcv() : registered by stp_proto_register()
        |
        |       * garp_pdu_rcv() : netns-aware
        |       * br_stp_rcv()   : netns-aware
        |
        |- llc_type_handlers[llc_pdu_type(skb) - 1]
        |
        |  * llc_sap_handler()  : NOT netns-aware (Patch 1)
        |  * llc_conn_handler() : NOT netns-aware (Patch 2)
        |
        `- llc_station_handler
      
           * llc_station_rcv() : netns-aware
      
      Patches 1 and 2 convert the functions that are not netns-aware, and
      Patch 3 removes the netns restriction in llc_rcv().
      
      Note this series does not namespacify AF_LLC so that these patches
      can be backported to stable without conflicts (at least to 4.14.y).
      
      Another series that adds netns support for AF_LLC will be targeted
      to net-next later.
      ====================
      
      Link: https://lore.kernel.org/r/20230718174152.57408-1-kuniyu@amazon.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Revert "bridge: Add extack warning when enabling STP in netns." · 7ebd00a5
      Kuniyuki Iwashima authored
      This reverts commit 56a16035.
      
      Since the previous commit, STP works on bridge in netns.
      
        # unshare -n
        # ip link add br0 type bridge
        # ip link add veth0 type veth peer name veth1
      
        # ip link set veth0 master br0 up
        [   50.558135] br0: port 1(veth0) entered blocking state
        [   50.558366] br0: port 1(veth0) entered disabled state
        [   50.558798] veth0: entered allmulticast mode
        [   50.564401] veth0: entered promiscuous mode
      
        # ip link set veth1 master br0 up
        [   54.215487] br0: port 2(veth1) entered blocking state
        [   54.215657] br0: port 2(veth1) entered disabled state
        [   54.215848] veth1: entered allmulticast mode
        [   54.219577] veth1: entered promiscuous mode
      
        # ip link set br0 type bridge stp_state 1
        # ip link set br0 up
        [   61.960726] br0: port 2(veth1) entered blocking state
        [   61.961097] br0: port 2(veth1) entered listening state
        [   61.961495] br0: port 1(veth0) entered blocking state
        [   61.961653] br0: port 1(veth0) entered listening state
        [   63.998835] br0: port 2(veth1) entered blocking state
        [   77.437113] br0: port 1(veth0) entered learning state
        [   86.653501] br0: received packet on veth0 with own address as source address (addr:6e:0f:e7:6f:5f:5f, vlan:0)
        [   92.797095] br0: port 1(veth0) entered forwarding state
        [   92.797398] br0: topology change detected, propagating
      
      Let's remove the warning.
      
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • llc: Don't drop packet from non-root netns. · 6631463b
      Kuniyuki Iwashima authored
      Now these upper layer protocol handlers can be called from llc_rcv()
      as sap->rcv_func(), which is registered by llc_sap_open().
      
        * function which is passed to register_8022_client()
          -> no in-kernel user calls register_8022_client().
      
        * snap_rcv()
          `- proto->rcvfunc() : registered by register_snap_client()
             -> aarp_rcv() and atalk_rcv() drop packets from non-root netns
      
        * stp_pdu_rcv()
          `- garp_protos[]->rcv() : registered by stp_proto_register()
             -> garp_pdu_rcv() and br_stp_rcv() are netns-aware
      
      So, we can safely remove the netns restriction in llc_rcv().
      
      Fixes: e730c155 ("[NET]: Make packet reception network namespace safe")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • llc: Check netns in llc_estab_match() and llc_listener_match(). · 97b1d320
      Kuniyuki Iwashima authored
      
      
      We will remove this restriction in llc_rcv() in the following patch,
      which means that the protocol handler must be aware of netns.
      
              if (!net_eq(dev_net(dev), &init_net))
                      goto drop;
      
      llc_rcv() fetches llc_type_handlers[llc_pdu_type(skb) - 1] and calls it
      if not NULL.
      
      If the PDU type is LLC_DEST_CONN, llc_conn_handler() is called to pass
      skb to corresponding sockets.  Then, we must look up a proper socket in
      the same netns with skb->dev.
      
      llc_conn_handler() calls __llc_lookup() to look up an established or
      listening socket by __llc_lookup_established() and llc_lookup_listener().
      
      Both functions iterate on a list and call llc_estab_match() or
      llc_listener_match() to check if the socket is the correct destination.
      However, these functions do not check netns.
      
      Also, bind() and connect() call llc_establish_connection(), which
      finally calls __llc_lookup_established(), to check if there is a
      conflicting socket.
      
      Let's test netns in llc_estab_match() and llc_listener_match().
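The idea of testing the namespace inside the match helper can be sketched in user space as follows. `struct demo_net`, `struct demo_sock`, and the `demo_*` functions are hypothetical, simplified stand-ins and not the kernel's LLC data structures; the sketch only shows that two sockets bound to the same SAP in different namespaces stay separate once the match compares namespaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for kernel objects. */
struct demo_net {
    int id;
};

struct demo_sock {
    const struct demo_net *net; /* namespace the socket lives in */
    unsigned char lsap;         /* local SAP the socket is bound to */
    bool listening;
};

/* A listener matches only if it lives in the packet's namespace. */
bool demo_listener_match(const struct demo_sock *sk,
                         const struct demo_net *net, unsigned char dsap)
{
    assert(sk != NULL && net != NULL);
    return sk->net == net && sk->listening && sk->lsap == dsap;
}

/* Walk a socket table and return the first matching listener, or NULL. */
const struct demo_sock *demo_lookup_listener(const struct demo_sock *tbl,
                                             size_t n,
                                             const struct demo_net *net,
                                             unsigned char dsap)
{
    for (size_t i = 0; i < n; i++)
        if (demo_listener_match(&tbl[i], net, dsap))
            return &tbl[i];
    return NULL;
}
```

Without the `sk->net == net` test, a lookup for a packet from the second namespace would return the first table entry, which belongs to a different namespace.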
      
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • llc: Check netns in llc_dgram_match(). · 9b64e93e
      Kuniyuki Iwashima authored
      
      
      We will remove this restriction in llc_rcv() soon, which means that the
      protocol handler must be aware of netns.
      
      	if (!net_eq(dev_net(dev), &init_net))
      		goto drop;
      
      llc_rcv() fetches llc_type_handlers[llc_pdu_type(skb) - 1] and calls it
      if not NULL.
      
      If the PDU type is LLC_DEST_SAP, llc_sap_handler() is called to pass skb
      to corresponding sockets.  Then, we must look up a proper socket in the
      same netns with skb->dev.
      
      If the destination is a multicast address, llc_sap_handler() calls
      llc_sap_mcast().  It calculates a hash based on DSAP and skb->dev->ifindex,
      iterates on a socket list, and calls llc_mcast_match() to check if the
      socket is the correct destination.  Then, llc_mcast_match() checks if
      skb->dev matches with llc_sk(sk)->dev.  So, we need not check netns here.
      
      OTOH, if the destination is a unicast address, llc_sap_handler() calls
      llc_lookup_dgram() to look up a socket, but it does not check the netns.
      
      Therefore, we need to add netns check in llc_lookup_dgram().
      
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ethernet: mtk_eth_soc: always mtk_get_ib1_pkt_type · 9f9d4c1a
      Daniel Golle authored
      The entries and bind debugfs files would display wrong data on NETSYS_V2
      and later because the driver used MTK_FOE_IB1_PACKET_TYPE, which
      corresponds to NETSYS_V1(.x) SoCs, instead of mtk_get_ib1_pkt_type.
      Use mtk_get_ib1_pkt_type so entries and bind records display correctly.
      
      Fixes: 03a3180e ("net: ethernet: mtk_eth_soc: introduce flow offloading support for mt7986")
      Signed-off-by: Daniel Golle <daniel@makrotopia.org>
      Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Link: https://lore.kernel.org/r/c0ae03d0182f4d27b874cbdf0059bc972c317f3c.1689727134.git.daniel@makrotopia.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'r8169-revert-two-changes-that-caused-regressions' · 88f2e009
      Jakub Kicinski authored
      
      
      Heiner Kallweit says:
      
      ====================
      r8169: revert two changes that caused regressions
      
      This reverts two changes that caused regressions.
      ====================
      
      Link: https://lore.kernel.org/r/ddadceae-19c9-81b8-47b5-a4ff85e2563a@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Revert "r8169: disable ASPM during NAPI poll" · e31a9fed
      Heiner Kallweit authored
      This reverts commit e1ed3e4d.
      
      It turned out that the change causes a performance regression.
      
      Link: https://lore.kernel.org/netdev/20230713124914.GA12924@green245/T/
      Cc: stable@vger.kernel.org
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Link: https://lore.kernel.org/r/055c6bc2-74fa-8c67-9897-3f658abb5ae7@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • r8169: revert 2ab19de6 ("r8169: remove ASPM restrictions now that ASPM is... · cf2ffdea
      Heiner Kallweit authored
      r8169: revert 2ab19de6 ("r8169: remove ASPM restrictions now that ASPM is disabled during NAPI poll")
      
      There have been reports that on a number of systems this change breaks
      network connectivity. Therefore effectively revert it. Mainly affected
      seem to be systems where BIOS denies ASPM access to OS.
      Due to later changes we can't do a direct revert.
      
      Fixes: 2ab19de6 ("r8169: remove ASPM restrictions now that ASPM is disabled during NAPI poll")
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/netdev/e47bac0d-e802-65e1-b311-6acb26d5cf10@freenet.de/T/
      Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217596
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Link: https://lore.kernel.org/r/57f13ec0-b216-d5d8-363d-5b05528ec5fb@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Revert "tcp: avoid the lookup process failing to get sk in ehash table" · 81b3ade5
      Kuniyuki Iwashima authored
      This reverts commit 3f4ca5fa.
      
      Commit 3f4ca5fa ("tcp: avoid the lookup process failing to get sk in
      ehash table") reversed the order in how a socket is inserted into ehash
      to fix an issue that ehash-lookup could fail when reqsk/full sk/twsk are
      swapped.  However, it introduced another lookup failure.
      
      The full socket in ehash is allocated from a slab with SLAB_TYPESAFE_BY_RCU
      and does not have SOCK_RCU_FREE, so the socket could be reused even while
      it is being referenced on another CPU doing RCU lookup.
      
      Let's say a socket is reused and inserted into the same hash bucket during
      lookup.  After the blamed commit, a new socket is inserted at the end of
      the list.  If that happens, we will skip sockets placed after the previous
      position of the reused socket, resulting in ehash lookup failure.
      
      As described in Documentation/RCU/rculist_nulls.rst, we should insert a
      new socket at the head of the list to avoid such an issue.
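The difference between tail and head insertion can be modelled with a deliberately simplified, single-threaded sketch. It is not the kernel's hlist_nulls code and it omits RCU, key rechecks, and nulls markers; all `demo_*` names are hypothetical. A reader is parked on a node, that node is recycled into the same bucket, and the reader then resumes from the node's new `next` pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_node {
    int key;
    struct demo_node *next;
};

struct demo_bucket {
    struct demo_node *head;
};

void demo_add_head(struct demo_bucket *b, struct demo_node *n)
{
    assert(b != NULL && n != NULL);
    n->next = b->head;
    b->head = n;
}

void demo_add_tail(struct demo_bucket *b, struct demo_node *n)
{
    struct demo_node **pp = &b->head;

    while (*pp)
        pp = &(*pp)->next;
    n->next = NULL;
    *pp = n;
}

/* Unlink n; n->next is left untouched, as a parked reader may still use it. */
void demo_del(struct demo_bucket *b, struct demo_node *n)
{
    struct demo_node **pp = &b->head;

    while (*pp && *pp != n)
        pp = &(*pp)->next;
    if (*pp)
        *pp = n->next;
}

/* Resume a traversal at cursor and report whether key is still reachable. */
bool demo_resume_lookup(const struct demo_node *cursor, int key)
{
    for (; cursor; cursor = cursor->next)
        if (cursor->key == key)
            return true;
    return false;
}

/* Build 1 -> 2 -> 3 -> 4, park a reader on node 2, recycle node 2 into the
 * same bucket under a new key, then resume the lookup for key 4. */
bool demo_lookup_after_reuse(bool insert_at_head)
{
    struct demo_node na = { 1, NULL }, nx = { 2, NULL };
    struct demo_node nb = { 3, NULL }, nc = { 4, NULL };
    struct demo_bucket bucket = { NULL };
    const struct demo_node *reader;

    demo_add_tail(&bucket, &na);
    demo_add_tail(&bucket, &nx);
    demo_add_tail(&bucket, &nb);
    demo_add_tail(&bucket, &nc);

    reader = &nx;            /* reader is paused on this node */
    demo_del(&bucket, &nx);
    nx.key = 5;              /* object reused for another connection */
    if (insert_at_head)
        demo_add_head(&bucket, &nx);
    else
        demo_add_tail(&bucket, &nx);

    return demo_resume_lookup(reader, 4);
}
```

In this model, `demo_lookup_after_reuse(false)` misses key 4 because the recycled node now terminates the list, while `demo_lookup_after_reuse(true)` still reaches it because the reader walks the whole bucket again from the head.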
      
      This issue, the swap-lookup-failure, and another variant reported in [0]
      can all be handled properly by adding a locked ehash lookup suggested by
      Eric Dumazet [1].
      
      However, this issue could occur for every packet, thus more likely than
      the other two races, so let's revert the change for now.
      
      Link: https://lore.kernel.org/netdev/20230606064306.9192-1-duanmuquan@baidu.com/ [0]
      Link: https://lore.kernel.org/netdev/CANn89iK8snOz8TYOhhwfimC7ykYA78GA3Nyv8x06SZYa1nKdyA@mail.gmail.com/ [1]
      Fixes: 3f4ca5fa ("tcp: avoid the lookup process failing to get sk in ehash table")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20230717215918.15723-1-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · e80698b7
      Jakub Kicinski authored
      
      
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf 2023-07-19
      
      We've added 4 non-merge commits during the last 1 day(s) which contain
      a total of 3 files changed, 55 insertions(+), 10 deletions(-).
      
      The main changes are:
      
      1) Fix stack depth check in presence of async callbacks,
         from Kumar Kartikeya Dwivedi.
      
      2) Fix BTI type used for freplace attached functions,
         from Alexander Duyck.
      
      * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        bpf, arm64: Fix BTI type used for freplace attached functions
        selftests/bpf: Add more tests for check_max_stack_depth bug
        bpf: Repeat check_max_stack_depth for async callbacks
        bpf: Fix subprog idx logic in check_max_stack_depth
      ====================
      
      Link: https://lore.kernel.org/r/20230719174502.74023-1-alexei.starovoitov@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. Jul 19, 2023
    • ipv4: ip_gre: fix return value check in erspan_xmit() · aa7cb378
      Yuanjun Gong authored
      
      
      goto free_skb if an unexpected result is returned by pskb_trim()
      in erspan_xmit().
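The pattern of checking a trim helper's return value and bailing out to a cleanup label can be sketched in user space. `struct demo_buf`, `demo_trim()`, and `demo_xmit()` are hypothetical stand-ins, not the kernel's sk_buff API, and the `trim_fails` flag exists only to force the error path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_buf {
    size_t len;
    bool trim_fails; /* test knob: force the trim helper to fail */
    bool freed;      /* models the buffer having been freed */
};

/* Hypothetical stand-in for a trim helper that can fail. */
int demo_trim(struct demo_buf *buf, size_t len)
{
    assert(buf != NULL);
    if (buf->trim_fails)
        return -ENOMEM;
    if (len < buf->len)
        buf->len = len;
    return 0;
}

/* Transmit path: returns 0 if sent, a negative errno if dropped. */
int demo_xmit(struct demo_buf *buf, size_t max_len)
{
    int err;

    if (buf->len > max_len) {
        err = demo_trim(buf, max_len);
        if (err)
            goto free_buf; /* do not continue with an untrimmed buffer */
    }
    /* ... the frame would be handed to the device here ... */
    return 0;

free_buf:
    buf->freed = true;
    return err;
}
```

Ignoring the return value would let the transmit path continue with a buffer that is still longer than `max_len`.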
      
      Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: ip_gre: fix return value check in erspan_fb_xmit() · 02d84f3e
      Yuanjun Gong authored
      
      
      goto err_free_skb if an unexpected result is returned by pskb_trim()
      in erspan_fb_xmit().
      
      Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drivers:net: fix return value check in ocelot_fdma_receive_skb · bce56033
      Yuanjun Gong authored
      
      
      ocelot_fdma_receive_skb should return false if an unexpected
      value is returned by pskb_trim.
      
      Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drivers: net: fix return value check in emac_tso_csum() · 78a93c31
      Yuanjun Gong authored
      
      
      In emac_tso_csum(), return an error code if an unexpected value
      is returned by pskb_trim().
      
      Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net:ipv6: check return value of pskb_trim() · 4258faa1
      Yuanjun Gong authored
      goto tx_err if an unexpected result is returned by pskb_trim()
      in ip6erspan_tunnel_xmit().
      
      Fixes: 5a963eb6 ("ip6_gre: Add ERSPAN native tunnel support")
      Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv4: Use kfree_sensitive instead of kfree · daa75144
      Wang Ming authored
      key might contain the private part of the key, so it is better to use
      kfree_sensitive() to free it.
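The motivation for wiping key material before releasing it can be sketched in user space. `demo_zeroize()` and `demo_free_sensitive()` are hypothetical helpers, not the kernel API; unlike a kernel allocator helper, this version takes an explicit length, which is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Overwrite a buffer with zeros through a volatile pointer so the
 * compiler cannot discard the stores as dead writes. */
void demo_zeroize(void *p, size_t len)
{
    volatile unsigned char *vp = p;

    assert(p != NULL || len == 0);
    while (len--)
        *vp++ = 0;
}

/* Wipe, then free. A NULL pointer is accepted and ignored, like free(). */
void demo_free_sensitive(void *p, size_t len)
{
    if (!p)
        return;
    demo_zeroize(p, len);
    free(p);
}
```

A plain free() would leave the key bytes in the allocator's memory until something else overwrites them.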
      
      Fixes: 38320c70 ("[IPSEC]: Use crypto_aead and authenc in ESP")
      Signed-off-by: Wang Ming <machel@vivo.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 7f5acea7
      Jakub Kicinski authored
      
      
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2023-07-17 (iavf)
      
      This series contains updates to iavf driver only.
      
      Ding Hui fixes a use-after-free issue by calling netif_napi_del() for all
      allocated q_vectors. He also resolves an out-of-bounds issue by not
      updating to new values when a timeout is encountered.
      
      Marcin and Ahmed change the way resets are handled so that the callback
      operating under the RTNL lock waits for the reset to finish. The
      rtnl_lock-sensitive functions in the reset flow now schedule the netdev
      update for later, which removes the circular dependency with the critical lock.
      
      * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        iavf: fix reset task race with iavf_remove()
        iavf: fix a deadlock caused by rtnl and driver's lock circular dependencies
        Revert "iavf: Do not restart Tx queues after reset task failure"
        Revert "iavf: Detach device during reset task"
        iavf: Wait for reset in callbacks which trigger it
        iavf: use internal state to free traffic IRQs
        iavf: Fix out-of-bounds when setting channels on remove
        iavf: Fix use-after-free in free_netdev
      ====================
      
      Link: https://lore.kernel.org/r/20230717175205.3217774-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'tcp-annotate-data-races-in-tcp_rsk-req' · e9b2bd96
      Jakub Kicinski authored
      
      
      Eric Dumazet says:
      
      ====================
      tcp: annotate data-races in tcp_rsk(req)
      
      Small series addressing two syzbot reports around tcp_rsk(req)
      ====================
      
      Link: https://lore.kernel.org/r/20230717144445.653164-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tcp: annotate data-races around tcp_rsk(req)->ts_recent · eba20811
      Eric Dumazet authored
      TCP request sockets are lockless, tcp_rsk(req)->ts_recent
      can change while being read by another cpu as syzbot noticed.
      
      This is harmless, but we should annotate the known races.
      
      Note that tcp_check_req() changes req->ts_recent a bit early,
      we might change this in the future.
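What annotating a lockless field looks like can be approximated in user space. The text above does not say which annotation the patch uses; in kernel code such accesses are typically marked with READ_ONCE()/WRITE_ONCE(). The sketch below swaps in C11 relaxed atomics as a user-space analogue, and `struct demo_req` with its helpers is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request object with one field shared without a lock. */
struct demo_req {
    _Atomic uint32_t ts_recent;
};

/* Writer side: a marked store, so the access is explicit to the
 * compiler and to race-detection tooling. */
void demo_req_update_ts(struct demo_req *req, uint32_t tsval)
{
    assert(req != NULL);
    atomic_store_explicit(&req->ts_recent, tsval, memory_order_relaxed);
}

/* Reader side: a marked load that may observe the old or the new value,
 * but never a torn one. */
uint32_t demo_req_ts(struct demo_req *req)
{
    assert(req != NULL);
    return atomic_load_explicit(&req->ts_recent, memory_order_relaxed);
}
```

Relaxed ordering is chosen here because the race is described as harmless: only the single field needs to be read and written atomically, with no ordering against other memory.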
      
      BUG: KCSAN: data-race in tcp_check_req / tcp_check_req
      
      write to 0xffff88813c8afb84 of 4 bytes by interrupt on cpu 1:
      tcp_check_req+0x694/0xc70 net/ipv4/tcp_minisocks.c:762
      tcp_v4_rcv+0x12db/0x1b70 net/ipv4/tcp_ipv4.c:2071
      ip_protocol_deliver_rcu+0x356/0x6d0 net/ipv4/ip_input.c:205
      ip_local_deliver_finish+0x13c/0x1a0 net/ipv4/ip_input.c:233
      NF_HOOK include/linux/netfilter.h:303 [inline]
      ip_local_deliver+0xec/0x1c0 net/ipv4/ip_input.c:254
      dst_input include/net/dst.h:468 [inline]
      ip_rcv_finish net/ipv4/ip_input.c:449 [inline]
      NF_HOOK include/linux/netfilter.h:303 [inline]
      ip_rcv+0x197/0x270 net/ipv4/ip_input.c:569
      __netif_receive_skb_one_core net/core/dev.c:5493 [inline]
      __netif_receive_skb+0x90/0x1b0 net/core/dev.c:5607
      process_backlog+0x21f/0x380 net/core/dev.c:5935
      __napi_poll+0x60/0x3b0 net/core/dev.c:6498
      napi_poll net/core/dev.c:6565 [inline]
      net_rx_action+0x32b/0x750 net/core/dev.c:6698
      __do_softirq+0xc1/0x265 kernel/softirq.c:571
      do_softirq+0x7e/0xb0 kernel/softirq.c:472
      __local_bh_enable_ip+0x64/0x70 kernel/softirq.c:396
      local_bh_enable+0x1f/0x20 include/linux/bottom_half.h:33
      rcu_read_unlock_bh include/linux/rcupdate.h:843 [inline]
      __dev_queue_xmit+0xabb/0x1d10 net/core/dev.c:4271
      dev_queue_xmit include/linux/netdevice.h:3088 [inline]
      neigh_hh_output include/net/neighbour.h:528 [inline]
      neigh_output include/net/neighbour.h:542 [inline]
      ip_finish_output2+0x700/0x840 net/ipv4/ip_output.c:229
      ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:317
      NF_HOOK_COND include/linux/netfilter.h:292 [inline]
      ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:431
      dst_output include/net/dst.h:458 [inline]
      ip_local_out net/ipv4/ip_output.c:126 [inline]
      __ip_queue_xmit+0xa4d/0xa70 net/ipv4/ip_output.c:533
      ip_queue_xmit+0x38/0x40 net/ipv4/ip_output.c:547
      __tcp_transmit_skb+0x1194/0x16e0 net/ipv4/tcp_output.c:1399
      tcp_transmit_skb net/ipv4/tcp_output.c:1417 [inline]
      tcp_write_xmit+0x13ff/0x2fd0 net/ipv4/tcp_output.c:2693
      __tcp_push_pending_frames+0x6a/0x1a0 net/ipv4/tcp_output.c:2877
      tcp_push_pending_frames include/net/tcp.h:1952 [inline]
      __tcp_sock_set_cork net/ipv4/tcp.c:3336 [inline]
      tcp_sock_set_cork+0xe8/0x100 net/ipv4/tcp.c:3343
      rds_tcp_xmit_path_complete+0x3b/0x40 net/rds/tcp_send.c:52
      rds_send_xmit+0xf8d/0x1420 net/rds/send.c:422
      rds_send_worker+0x42/0x1d0 net/rds/threads.c:200
      process_one_work+0x3e6/0x750 kernel/workqueue.c:2408
      worker_thread+0x5f2/0xa10 kernel/workqueue.c:2555
      kthread+0x1d7/0x210 kernel/kthread.c:379
      ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
      
      read to 0xffff88813c8afb84 of 4 bytes by interrupt on cpu 0:
      tcp_check_req+0x32a/0xc70 net/ipv4/tcp_minisocks.c:622
      tcp_v4_rcv+0x12db/0x1b70 net/ipv4/tcp_ipv4.c:2071
      ip_protocol_deliver_rcu+0x356/0x6d0 net/ipv4/ip_input.c:205
      ip_local_deliver_finish+0x13c/0x1a0 net/ipv4/ip_input.c:233
      NF_HOOK include/linux/netfilter.h:303 [inline]
      ip_local_deliver+0xec/0x1c0 net/ipv4/ip_input.c:254
      dst_input include/net/dst.h:468 [inline]
      ip_rcv_finish net/ipv4/ip_input.c:449 [inline]
      NF_HOOK include/linux/netfilter.h:303 [inline]
      ip_rcv+0x197/0x270 net/ipv4/ip_input.c:569
      __netif_receive_skb_one_core net/core/dev.c:5493 [inline]
      __netif_receive_skb+0x90/0x1b0 net/core/dev.c:5607
      process_backlog+0x21f/0x380 net/core/dev.c:5935
      __napi_poll+0x60/0x3b0 net/core/dev.c:6498
      napi_poll net/core/dev.c:6565 [inline]
      net_rx_action+0x32b/0x750 net/core/dev.c:6698
      __do_softirq+0xc1/0x265 kernel/softirq.c:571
      run_ksoftirqd+0x17/0x20 kernel/softirq.c:939
      smpboot_thread_fn+0x30a/0x4a0 kernel/smpboot.c:164
      kthread+0x1d7/0x210 kernel/kthread.c:379
      ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
      
      value changed: 0x1cd237f1 -> 0x1cd237f2
      
      Fixes: 079096f1 ("tcp/dccp: install syn_recv requests into ehash table")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20230717144445.653164-3-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tcp: annotate data-races around tcp_rsk(req)->txhash · 5e526552
      Eric Dumazet authored
      TCP request sockets are lockless, some of their fields
      can change while being read by another cpu as syzbot noticed.
      
      This is usually harmless, but we should annotate the known
      races.
      
      This patch takes care of tcp_rsk(req)->txhash,
      a separate one is needed for tcp_rsk(req)->ts_recent.
      
      BUG: KCSAN: data-race in tcp_make_synack / tcp_rtx_synack
      
      write to 0xffff8881362304bc of 4 bytes by task 32083 on cpu 1:
      tcp_rtx_synack+0x9d/0x2a0 net/ipv4/tcp_output.c:4213
      inet_rtx_syn_ack+0x38/0x80 net/ipv4/inet_connection_sock.c:880
      tcp_check_req+0x379/0xc70 net/ipv4/tcp_minisocks.c:665
      tcp_v6_rcv+0x125b/0x1b20 net/ipv6/tcp_ipv6.c:1673
      ip6_protocol_deliver_rcu+0x92f/0xf30 net/ipv6/ip6_input.c:437
      ip6_input_finish net/ipv6/ip6_input.c:482 [inline]
      NF_HOOK include/linux/netfilter.h:303 [inline]
      ip6_input+0xbd/0x1b0 net/ipv6/ip6_input.c:491
      dst_input include/net/dst.h:468 [inline]
      ip6_rcv_finish+0x1e2/0x2e0 net/ipv6/ip6_input.c:79
      NF_HOOK include/linux/netfilter.h:303 [inline]
      ipv6_rcv+0x74/0x150 net/ipv6/ip6_input.c:309
      __netif_receive_skb_one_core net/core/dev.c:5452 [inline]
      __netif_receive_skb+0x90/0x1b0 net/core/dev.c:5566
      netif_receive_skb_internal net/core/dev.c:5652 [inline]
      netif_receive_skb+0x4a/0x310 net/core/dev.c:5711
      tun_rx_batched+0x3bf/0x400
      tun_get_user+0x1d24/0x22b0 drivers/net/tun.c:1997
      tun_chr_write_iter+0x18e/0x240 drivers/net/tun.c:2043
      call_write_iter include/linux/fs.h:1871 [inline]
      new_sync_write fs/read_write.c:491 [inline]
      vfs_write+0x4ab/0x7d0 fs/read_write.c:584
      ksys_write+0xeb/0x1a0 fs/read_write.c:637
      __do_sys_write fs/read_write.c:649 [inline]
      __se_sys_write fs/read_write.c:646 [inline]
      __x64_sys_write+0x42/0x50 fs/read_write.c:646
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      read to 0xffff8881362304bc of 4 bytes by task 32078 on cpu 0:
      tcp_make_synack+0x367/0xb40 net/ipv4/tcp_output.c:3663
      tcp_v6_send_synack+0x72/0x420 net/ipv6/tcp_ipv6.c:544
      tcp_conn_request+0x11a8/0x1560 net/ipv4/tcp_input.c:7059
      tcp_v6_conn_request+0x13f/0x180 net/ipv6/tcp_ipv6.c:1175
      tcp_rcv_state_process+0x156/0x1de0 net/ipv4/tcp_input.c:6494
      tcp_v6_do_rcv+0x98a/0xb70 net/ipv6/tcp_ipv6.c:1509
      tcp_v6_rcv+0x17b8/0x1b20 net/ipv6/tcp_ipv6.c:1735
      ip6_protocol_deliver_rcu+0x92f/0xf30 net/ipv6/ip6_input.c:437
      ip6_input_finish net/ipv6/ip6_input.c:482 [inline]
      NF_HOOK include/linux/netfilter.h:303 [inline]
      ip6_input+0xbd/0x1b0 net/ipv6/ip6_input.c:491
      dst_input include/net/dst.h:468 [inline]
      ip6_rcv_finish+0x1e2/0x2e0 net/ipv6/ip6_input.c:79
      NF_HOOK include/linux/netfilter.h:303 [inline]
      ipv6_rcv+0x74/0x150 net/ipv6/ip6_input.c:309
      __netif_receive_skb_one_core net/core/dev.c:5452 [inline]
      __netif_receive_skb+0x90/0x1b0 net/core/dev.c:5566
      netif_receive_skb_internal net/core/dev.c:5652 [inline]
      netif_receive_skb+0x4a/0x310 net/core/dev.c:5711
      tun_rx_batched+0x3bf/0x400
      tun_get_user+0x1d24/0x22b0 drivers/net/tun.c:1997
      tun_chr_write_iter+0x18e/0x240 drivers/net/tun.c:2043
      call_write_iter include/linux/fs.h:1871 [inline]
      new_sync_write fs/read_write.c:491 [inline]
      vfs_write+0x4ab/0x7d0 fs/read_write.c:584
      ksys_write+0xeb/0x1a0 fs/read_write.c:637
      __do_sys_write fs/read_write.c:649 [inline]
      __se_sys_write fs/read_write.c:646 [inline]
      __x64_sys_write+0x42/0x50 fs/read_write.c:646
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      value changed: 0x91d25731 -> 0xe79325cd
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 32078 Comm: syz-executor.4 Not tainted 6.5.0-rc1-syzkaller-00033-geb26cbb1a754 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/03/2023
      
      Fixes: 58d607d3 ("tcp: provide skb->hash to synack packets")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20230717144445.653164-2-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • octeontx2-pf: mcs: Generate hash key using ecb(aes) · e7002b3b
      Subbaraya Sundeep authored
      Hardware generated encryption and ICV tags are found to
      be wrong when tested with IEEE MACSEC test vectors.
      This is because as per the HRM, the hash key (derived by
      AES-ECB block encryption of an all 0s block with the SAK)
      has to be programmed by the software in
      MCSX_RS_MCS_CPM_TX_SLAVE_SA_PLCY_MEM_4X register.
      Hence fix this by generating hash key in software and
      configuring in hardware.
      
      Fixes: c54ffc73 ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading")
      Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
      Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
      Link: https://lore.kernel.org/r/1689574603-28093-1-git-send-email-sbhatta@marvell.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • igc: Prevent garbled TX queue with XDP ZEROCOPY · 78adb4bc
      Florian Kauer authored
      In normal operation, each populated queue item has
      next_to_watch pointing to the last TX desc of the packet,
      while each cleaned item has it set to 0. In particular,
      next_to_use that points to the next (necessarily clean)
      item to use has next_to_watch set to 0.
      
      When the TX queue is used both by an application using
      AF_XDP with ZEROCOPY as well as a second non-XDP application
      generating high traffic, the queue pointers can get in
      an invalid state where next_to_use points to an item
      where next_to_watch is NOT set to 0.
      
      However, the implementation assumes in several places
      that this never happens, so when it does, things break
      in two ways. First, within the loop inside
      igc_clean_tx_irq(), next_to_clean can overtake next_to_use,
      which prevents any further transmission via
      this queue; it never gets unblocked or signaled.
      Second, while the queue is in this garbled state,
      the inner loop of igc_clean_tx_ring() never terminates,
      completely hogging a CPU core.
      
      The reason is that igc_xdp_xmit_zc() reads next_to_use
      before acquiring the lock and writes it back
      (potentially unmodified) later. If next_to_use was modified
      by another sender before the lock was taken, the outdated
      value is written back, pointing to an item that was already
      used elsewhere (and whose next_to_watch has thus been set).
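
      The stale write-back can be reproduced in a small deterministic model.
      The sketch below is Python, not driver code: the ring layout, the
      function names and the non-zero marker standing in for next_to_watch
      are all assumptions, and the lock is modeled only by the order of
      operations. The second variant, which reads next_to_use after the
      point where the lock would be taken, is one plausible remedy and is
      not taken from the text.

```python
from dataclasses import dataclass, field

RING_SIZE = 8  # hypothetical ring size, chosen only for the illustration


@dataclass
class TxRing:
    next_to_use: int = 0
    # One next_to_watch value per slot; 0 means "clean", as in the text.
    next_to_watch: list = field(default_factory=lambda: [0] * RING_SIZE)


def invariant_holds(ring):
    """The slot at next_to_use must be clean (next_to_watch == 0)."""
    return ring.next_to_watch[ring.next_to_use] == 0


def regular_xmit(ring):
    """Non-XDP sender: populate the slot at next_to_use, then advance."""
    slot = ring.next_to_use
    ring.next_to_watch[slot] = 1  # any non-zero marker: slot is populated
    ring.next_to_use = (slot + 1) % RING_SIZE


def zc_xmit_stale(ring, other_sender):
    """Reads next_to_use before the lock, as described in the text."""
    ntu = ring.next_to_use   # read happens before the lock is taken
    other_sender(ring)       # another sender runs in the gap
    # (lock would be held from here; nothing is sent in this model)
    ring.next_to_use = ntu   # stale value written back


def zc_xmit_locked_read(ring, other_sender):
    """Assumed remedy: read next_to_use only once the lock is held."""
    other_sender(ring)       # the other sender finishes first
    ntu = ring.next_to_use   # read happens inside the locked region
    ring.next_to_use = ntu


if __name__ == "__main__":
    for xmit in (zc_xmit_stale, zc_xmit_locked_read):
        ring = TxRing()
        xmit(ring, regular_xmit)
        print(xmit.__name__, "invariant holds:", invariant_holds(ring))
```

      In the stale variant next_to_use ends up pointing at a populated
      slot, which is exactly the state the cleanup loops are not prepared for.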
      
      Fixes: 9acf59a7 ("igc: Enable TX via AF_XDP zero-copy")
      Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
      Tested-by: Kurt Kanzenbach <kurt@linutronix.de>
      Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Tested-by: Naama Meir <naamax.meir@linux.intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Link: https://lore.kernel.org/r/20230717175444.3217831-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      78adb4bc
    • Jakub Kicinski's avatar
      Merge tag 'linux-can-fixes-for-6.5-20230717' of... · 936fd2c5
      Jakub Kicinski authored
      
      Merge tag 'linux-can-fixes-for-6.5-20230717' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can 2023-07-17
      
      The 1st patch is by Ziyang Xuan and fixes a possible memory leak in
      the receiver handling in the CAN RAW protocol.
      
      YueHaibing contributes a fix for a use after free in bcm_proc_show() of
      the Broadcast Manager (BCM) CAN protocol.
      
      The next 2 patches are by me and fix a possible null pointer
      dereference in the RX path of the gs_usb driver with activated
      hardware timestamps and the candlelight firmware.
      
      The last patch is by Fedor Ross, Marek Vasut and me and targets the
      mcp251xfd driver. The polling timeout of __mcp251xfd_chip_set_mode()
      is increased to fix bus joining on busy CAN buses and very low bit
      rate.
      
      * tag 'linux-can-fixes-for-6.5-20230717' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
        can: mcp251xfd: __mcp251xfd_chip_set_mode(): increase poll timeout
        can: gs_usb: fix time stamp counter initialization
        can: gs_usb: gs_can_open(): improve error handling
        can: bcm: Fix UAF in bcm_proc_show()
        can: raw: fix receiver memory leak
      ====================
      
      Link: https://lore.kernel.org/r/20230717180938.230816-1-mkl@pengutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      936fd2c5
    • John Fastabend's avatar
      mailmap: Add entry for old intel email · 195e903b
      John Fastabend authored
      
      
      Fix old email to avoid bouncing email from net/drivers and older
      netdev work. Anyways my @intel email hasn't been active for years.
      
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/r/20230717173306.38407-1-john.fastabend@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      195e903b
    • Shannon Nelson's avatar
      mailmap: add entries for past lives · d1998e50
      Shannon Nelson authored
      
      
      Update old emails for my current work email.
      
      Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
      Link: https://lore.kernel.org/r/20230717193242.43670-1-shannon.nelson@amd.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      d1998e50
    • Jakub Kicinski's avatar
      Merge branch 'selftests-tc-increase-timeout-and-add-missing-kconfig' · 2187d6ca
      Jakub Kicinski authored
      
      
      Matthieu Baerts says:
      
      ====================
      selftests: tc: increase timeout and add missing kconfig
      
      When looking for something else in LKFT reports [1], I noticed that the
      TC selftest ended with a timeout error:
      
        not ok 1 selftests: tc-testing: tdc.sh # TIMEOUT 45 seconds
      
      I also noticed most of the tests were skipped because the "teardown
      stage" did not complete successfully. It was due to missing kconfig.
      
      These patches fix these two errors plus an extra one because this
      selftest reads info from "/proc/net/nf_conntrack". Thank you Pedro for
      having helped me fixing these issues [2].
      
      Link: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230711/testrun/18267241/suite/kselftest-tc-testing/test/tc-testing_tdc_sh/log [1]
      Link: https://lore.kernel.org/netdev/0e061d4a-9a23-9f58-3b35-d8919de332d7@tessares.net/T/ [2]
      ====================
      
      Link: https://lore.kernel.org/r/20230713-tc-selftests-lkft-v1-0-1eb4fd3a96e7@tessares.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      2187d6ca
    • Matthieu Baerts's avatar
      selftests: tc: add ConnTrack procfs kconfig · 031c99e7
      Matthieu Baerts authored
      When looking at the TC selftest reports, I noticed one test was failing
      because /proc/net/nf_conntrack was not available.
      
        not ok 373 3992 - Add ct action triggering DNAT tuple conflict
        	Could not match regex pattern. Verify command output:
        cat: /proc/net/nf_conntrack: No such file or directory
      
      It is only available if NF_CONNTRACK_PROCFS kconfig is set. So the issue
      can be fixed simply by adding it to the list of required kconfig.
      
      Fixes: e4690564 ("tc-testing: add test for ct DNAT tuple collision")
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/netdev/0e061d4a-9a23-9f58-3b35-d8919de332d7@tessares.net/T/ [1]
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Tested-by: Zhengchao Shao <shaozhengchao@huawei.com>
      Link: https://lore.kernel.org/r/20230713-tc-selftests-lkft-v1-3-1eb4fd3a96e7@tessares.net
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      031c99e7
    • Matthieu Baerts's avatar
      selftests: tc: add 'ct' action kconfig dep · 719b4774
      Matthieu Baerts authored
      When looking for something else in LKFT reports [1], I noticed most of
      the tests were skipped because the "teardown stage" did not complete
      successfully.
      
      Pedro found out this is due to the fact CONFIG_NF_FLOW_TABLE is required
      but not listed in the 'config' file. Adding it to the list fixes the
      issues on LKFT side. CONFIG_NET_ACT_CT is now set to 'm' in the final
      kconfig.
      
      Fixes: c34b961a ("net/sched: act_ct: Create nf flow table per zone")
      Cc: stable@vger.kernel.org
      Link: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230711/testrun/18267241/suite/kselftest-tc-testing/test/tc-testing_tdc_sh/log [1]
      Link: https://lore.kernel.org/netdev/0e061d4a-9a23-9f58-3b35-d8919de332d7@tessares.net/T/ [2]
      Suggested-by: Pedro Tammela <pctammela@mojatatu.com>
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Tested-by: Zhengchao Shao <shaozhengchao@huawei.com>
      Link: https://lore.kernel.org/r/20230713-tc-selftests-lkft-v1-2-1eb4fd3a96e7@tessares.net
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      719b4774
    • Matthieu Baerts's avatar
      selftests: tc: set timeout to 15 minutes · fda05798
      Matthieu Baerts authored
      When looking for something else in LKFT reports [1], I noticed that the
      TC selftest ended with a timeout error:
      
        not ok 1 selftests: tc-testing: tdc.sh # TIMEOUT 45 seconds
      
      The timeout had been introduced 3 years ago, see the Fixes commit below.
      
      This timeout is only in place when executing the selftests via the
      kselftests runner scripts. I guess this is not what most TC devs are
      using and nobody noticed the issue before.
      
      The new timeout is set to 15 minutes as suggested by Pedro [2]. It looks
      like it is plenty more time than what it takes in "normal" conditions.
      
      Fixes: 852c8cbf ("selftests/kselftest/runner.sh: Add 45 second timeout per test")
      Cc: stable@vger.kernel.org
      Link: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230711/testrun/18267241/suite/kselftest-tc-testing/test/tc-testing_tdc_sh/log [1]
      Link: https://lore.kernel.org/netdev/0e061d4a-9a23-9f58-3b35-d8919de332d7@tessares.net/T/ [2]
      Suggested-by: Pedro Tammela <pctammela@mojatatu.com>
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Reviewed-by: Zhengchao Shao <shaozhengchao@huawei.com>
      Link: https://lore.kernel.org/r/20230713-tc-selftests-lkft-v1-1-1eb4fd3a96e7@tessares.net
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      fda05798
    • Alexander Duyck's avatar
      bpf, arm64: Fix BTI type used for freplace attached functions · a3f25d61
      Alexander Duyck authored
      When running an freplace attached bpf program on an arm64 system we were
      seeing the following issue:
        Unhandled 64-bit el1h sync exception on CPU47, ESR 0x0000000036000003 -- BTI
      
      After a bit of work to track it down I determined that what appeared to be
      happening is that the 'bti c' at the start of the program was somehow being
      reached after a 'br' instruction. Further digging pointed me toward the
      fact that the function was attached via freplace. This in turn led me to
      build_plt which I believe is invoking the long jump which is triggering
      this error.
      
      To resolve it we can replace the 'bti c' with 'bti jc' and add a comment
      explaining why this has to be modified as such.
      
      Fixes: b2ad54e1 ("bpf, arm64: Implement bpf_arch_text_poke() for arm64")
      Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
      Acked-by: Xu Kuohai <xukuohai@huawei.com>
      Link: https://lore.kernel.org/r/168926677665.316237.9953845318337455525.stgit@ahduyck-xeon-server.home.arpa
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      a3f25d61
    • Alexei Starovoitov's avatar
      Merge branch 'two-more-fixes-for-check_max_stack_depth' · a8237cc8
      Alexei Starovoitov authored
      
      
      Kumar Kartikeya Dwivedi says:
      
      ====================
      Two more fixes for check_max_stack_depth
      
      I noticed two more bugs while reviewing the code, description and
      examples available in the patches.
      
      One leads to incorrect subprog index to be stored in the frame stack
      maintained by the function (leading to incorrect tail_call_reachable
      marks, among other things).
      
      The other problem is missing exploration pass of other async callbacks
      when they are not called from the main prog. Call chains rooted at them
      can thus bypass the stack limits (32 call frames * max permitted stack
      depth per function).
      
      Changelog:
      ----------
      v1 -> v2
      v1: https://lore.kernel.org/bpf/20230713003118.1327943-1-memxor@gmail.com
      
       * Fix commit message for patch 2 (Alexei)
      ====================
      
      Link: https://lore.kernel.org/r/20230717161530.1238-1-memxor@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      a8237cc8
    • Kumar Kartikeya Dwivedi's avatar
      selftests/bpf: Add more tests for check_max_stack_depth bug · 824adae4
      Kumar Kartikeya Dwivedi authored
      
      
      Another test which now exercises the path of the verifier where it will
      explore call chains rooted at the async callback. Without the prior
      fixes, this program loads successfully, which is incorrect.
      
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20230717161530.1238-4-memxor@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      824adae4
    • Kumar Kartikeya Dwivedi's avatar
      bpf: Repeat check_max_stack_depth for async callbacks · b5e9ad52
      Kumar Kartikeya Dwivedi authored
      While the check_max_stack_depth function explores call chains emanating
      from the main prog, which is typically enough to cover all possible call
      chains, it doesn't explore those rooted at async callbacks unless the
      async callback is also called directly. Unlike for non-async
      callbacks, it skips their instruction exploration, as they don't
      contribute to the stack depth of the caller.
      
      It could be the case that the async callback leads to a callchain which
      exceeds the stack depth, but this is never reachable while only
      exploring the entry point from main subprog. Hence, repeat the check for
      the main subprog *and* all async callbacks marked by the symbolic
      execution pass of the verifier, as execution of the program may begin at
      any of them.
      
      Consider functions with following stack depths:
      main: 256
      async: 256
      foo: 256
      
      main:
          rX = async
          bpf_timer_set_callback(...)
      
      async:
          foo()
      
      Here, async is not descended as it does not contribute to stack depth of
      main (since it is referenced using bpf_pseudo_func and not
      bpf_pseudo_call). However, when async is invoked asynchronously, it will
      end up breaching the MAX_BPF_STACK limit by calling foo.
      
      Hence, in addition to main, we also need to explore call chains
      beginning at all async callback subprogs in a program.
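
      The example above can be replayed with a toy call graph. This is a
      Python sketch and not verifier code: the dictionaries, the name
      async_cb and the helper functions are assumptions made for
      illustration, and the recursion assumes an acyclic call graph. Only
      direct calls (bpf_pseudo_call) are recorded as edges; the reference
      main takes to the callback (bpf_pseudo_func) is deliberately not one.

```python
# Stack depth of each function, taken from the example in the text.
STACK_DEPTH = {"main": 256, "async_cb": 256, "foo": 256}

# Direct calls only. main merely references async_cb, so there is no edge.
CALLS = {"main": [], "async_cb": ["foo"], "foo": []}

# Callbacks that may be entered asynchronously (hypothetical bookkeeping).
ASYNC_CALLBACKS = ["async_cb"]


def deepest_chain(root):
    """Largest cumulative stack depth over call chains starting at root."""
    below = max((deepest_chain(callee) for callee in CALLS[root]), default=0)
    return STACK_DEPTH[root] + below


def depths_by_root(roots):
    """Run the depth walk once per entry point."""
    return {root: deepest_chain(root) for root in roots}


if __name__ == "__main__":
    # Walking from main only never visits the async_cb -> foo chain.
    print(depths_by_root(["main"]))
    # Repeating the walk for every async callback exposes the deeper chain.
    print(depths_by_root(["main"] + ASYNC_CALLBACKS))
```

      Starting from main alone the deepest chain seen is 256 bytes, while the
      chain rooted at the callback reaches 512 bytes. How that figure
      compares with MAX_BPF_STACK depends on the limit's value and on the
      exact comparison used, neither of which is stated here.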
      
      Fixes: 7ddc80a4 ("bpf: Teach stack depth check about async callbacks.")
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20230717161530.1238-3-memxor@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      b5e9ad52
    • Kumar Kartikeya Dwivedi's avatar
      bpf: Fix subprog idx logic in check_max_stack_depth · ba7b3e7d
      Kumar Kartikeya Dwivedi authored
      The assignment to idx in check_max_stack_depth happens once we see a
      bpf_pseudo_call or bpf_pseudo_func. This is not an issue as the rest of
      the code performs a few checks and then pushes the frame to the frame
      stack, except the case of async callbacks. If the async callback case
      causes the loop iteration to be skipped, the idx assignment will be
      incorrect on the next iteration of the loop. The value stored in the
      frame stack (as the subprogno of the current subprog) will be incorrect.
      
      This leads to incorrect checks and incorrect tail_call_reachable
      marking. Save the target subprog in a new variable and only assign to
      idx once we are done with the is_async_cb check which may skip pushing
      of frame to the frame stack and subsequent stack depth checks and tail
      call markings.
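
      The ordering problem can be shown with a reduced model of the loop.
      The following Python sketch is an assumption-laden simplification, not
      the verifier's code: walk_caller, refs and the fixed flag are
      hypothetical names, every callee is treated as a leaf that returns
      immediately, and subprog numbers 0, 1 and 2 stand for main, an async
      callback and an ordinary function.

```python
def walk_caller(caller, refs, is_async_cb, fixed):
    """Walk the call/func references found in one caller.

    Returns one (return_subprog, callee) pair per pushed frame.
    """
    idx = caller
    pushed = []
    for target in refs:
        ret_prog = idx            # subprog to return to, saved up front
        if not fixed:
            idx = target          # old order: idx assigned before the check
        if is_async_cb[target]:
            continue              # frame not pushed; old order leaves idx stale
        idx = target              # new order: idx assigned only after the check
        pushed.append((ret_prog, idx))
        idx = ret_prog            # leaf callee returns; the frame is popped
    return pushed


if __name__ == "__main__":
    is_async_cb = {1: True, 2: False}
    refs = [1, 2]  # main references the async callback, then calls subprog 2
    print("old order:", walk_caller(0, refs, is_async_cb, fixed=False))
    print("new order:", walk_caller(0, refs, is_async_cb, fixed=True))
```

      With the old order the frame pushed for subprog 2 records subprog 1
      (the skipped callback) as the function to return to; with the new
      order it records subprog 0.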
      
      Fixes: 7ddc80a4 ("bpf: Teach stack depth check about async callbacks.")
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20230717161530.1238-2-memxor@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      ba7b3e7d
  3. Jul 18, 2023
    • Geetha sowjanya's avatar
      octeontx2-pf: Dont allocate BPIDs for LBK interfaces · 8fcd7c7b
      Geetha sowjanya authored
      Current driver enables backpressure for LBK interfaces.
      But these interfaces do not support this feature.
      Hence, this patch fixes the issue by skipping the
      backpressure configuration for these interfaces.
      
      Fixes: 75f36270 ("octeontx2-pf: Support to enable/disable pause frames via ethtool")
      Signed-off-by: Geetha sowjanya <gakula@marvell.com>
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Link: https://lore.kernel.org/r/20230716093741.28063-1-gakula@marvell.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      8fcd7c7b
    • Ido Schimmel's avatar
      vrf: Fix lockdep splat in output path · 2033ab90
      Ido Schimmel authored
      Cited commit converted the neighbour code to use the standard RCU
      variant instead of the RCU-bh variant, but the VRF code still uses
      rcu_read_lock_bh() / rcu_read_unlock_bh() around the neighbour lookup
      code in its IPv4 and IPv6 output paths, resulting in lockdep splats
      [1][2]. Can be reproduced using [3].
      
      Fix by switching to rcu_read_lock() / rcu_read_unlock().
      
      [1]
      =============================
      WARNING: suspicious RCU usage
      6.5.0-rc1-custom-g9c099e6dbf98 #403 Not tainted
      -----------------------------
      include/net/neighbour.h:302 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      2 locks held by ping/183:
       #0: ffff888105ea1d80 (sk_lock-AF_INET){+.+.}-{0:0}, at: raw_sendmsg+0xc6c/0x33c0
       #1: ffffffff85b46820 (rcu_read_lock_bh){....}-{1:2}, at: vrf_output+0x2e3/0x2030
      
      stack backtrace:
      CPU: 0 PID: 183 Comm: ping Not tainted 6.5.0-rc1-custom-g9c099e6dbf98 #403
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014
      Call Trace:
       <TASK>
       dump_stack_lvl+0xc1/0xf0
       lockdep_rcu_suspicious+0x211/0x3b0
       vrf_output+0x1380/0x2030
       ip_push_pending_frames+0x125/0x2a0
       raw_sendmsg+0x200d/0x33c0
       inet_sendmsg+0xa2/0xe0
       __sys_sendto+0x2aa/0x420
       __x64_sys_sendto+0xe5/0x1c0
       do_syscall_64+0x38/0x80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      [2]
      =============================
      WARNING: suspicious RCU usage
      6.5.0-rc1-custom-g9c099e6dbf98 #403 Not tainted
      -----------------------------
      include/net/neighbour.h:302 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      2 locks held by ping6/182:
       #0: ffff888114b63000 (sk_lock-AF_INET6){+.+.}-{0:0}, at: rawv6_sendmsg+0x1602/0x3e50
       #1: ffffffff85b46820 (rcu_read_lock_bh){....}-{1:2}, at: vrf_output6+0xe9/0x1310
      
      stack backtrace:
      CPU: 0 PID: 182 Comm: ping6 Not tainted 6.5.0-rc1-custom-g9c099e6dbf98 #403
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014
      Call Trace:
       <TASK>
       dump_stack_lvl+0xc1/0xf0
       lockdep_rcu_suspicious+0x211/0x3b0
       vrf_output6+0xd32/0x1310
       ip6_local_out+0xb4/0x1a0
       ip6_send_skb+0xbc/0x340
       ip6_push_pending_frames+0xe5/0x110
       rawv6_sendmsg+0x2e6e/0x3e50
       inet_sendmsg+0xa2/0xe0
       __sys_sendto+0x2aa/0x420
       __x64_sys_sendto+0xe5/0x1c0
       do_syscall_64+0x38/0x80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      [3]
      #!/bin/bash
      
      ip link add name vrf-red up numtxqueues 2 type vrf table 10
      ip link add name swp1 up master vrf-red type dummy
      ip address add 192.0.2.1/24 dev swp1
      ip address add 2001:db8:1::1/64 dev swp1
      ip neigh add 192.0.2.2 lladdr 00:11:22:33:44:55 nud perm dev swp1
      ip neigh add 2001:db8:1::2 lladdr 00:11:22:33:44:55 nud perm dev swp1
      ip vrf exec vrf-red ping 192.0.2.2 -c 1 &> /dev/null
      ip vrf exec vrf-red ping6 2001:db8:1::2 -c 1 &> /dev/null
      
      Fixes: 09eed119 ("neighbour: switch to standard rcu, instead of rcu_bh")
      Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Link: https://lore.kernel.org/netdev/CA+G9fYtEr-=GbcXNDYo3XOkwR+uYgehVoDjsP0pFLUpZ_AZcyg@mail.gmail.com/
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20230715153605.4068066-1-idosch@nvidia.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      2033ab90
    • Paolo Abeni's avatar
      Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 03803083
      Paolo Abeni authored
      
      
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2023-07-14 (ice)
      
      This series contains updates to ice driver only.
      
      Petr Oros removes multiple calls made to unregister netdev and
      devlink_port.
      
      Michal fixes null pointer dereference that can occur during reload.
      ====================
      
      Link: https://lore.kernel.org/r/20230714201041.1717834-1-anthony.l.nguyen@intel.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      03803083
    • Fedor Ross's avatar
      can: mcp251xfd: __mcp251xfd_chip_set_mode(): increase poll timeout · 9efa1a54
      Fedor Ross authored
      The mcp251xfd controller needs an idle bus to enter 'Normal CAN 2.0
      mode'. The maximum length of a CAN frame is 736 bits (64 data
      bytes, CAN-FD, EFF mode, worst case bit stuffing and interframe
      spacing). For low bit rates like 10 kbit/s the arbitrarily chosen
      MCP251XFD_POLL_TIMEOUT_US of 1 ms is too small.
      
      At such bit rates, while polling for the CAN controller to enter
      'Normal CAN 2.0 mode', the timeout limit is exceeded and the
      configuration fails with:
      
      | $ ip link set dev can1 up type can bitrate 10000
      | [  731.911072] mcp251xfd spi2.1 can1: Controller failed to enter mode CAN 2.0 Mode (6) and stays in Configuration Mode (4) (con=0x068b0760, osc=0x00000468).
      | [  731.927192] mcp251xfd spi2.1 can1: CRC read error at address 0x0e0c (length=4, data=00 00 00 00, CRC=0x0000) retrying.
      | [  731.938101] A link change request failed with some changes committed already. Interface can1 may have been left with an inconsistent configuration, please check.
      | RTNETLINK answers: Connection timed out
      
      Make the MCP251XFD_POLL_TIMEOUT_US timeout calculation dynamic. Use the
      maximum of 1 ms and the time needed to transmit one full CAN-FD frame
      (64 data bytes, EFF mode, worst case bit stuffing and interframe
      spacing) at the current bit rate.
      
      For easier backporting define the macro MCP251XFD_FRAME_LEN_MAX_BITS
      that holds the max frame length in bits, which is 736. This can be
      replaced by can_frame_bits(true, true, true, true, CANFD_MAX_DLEN) in
      a cleanup patch later.
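
      The arithmetic behind the dynamic timeout can be checked with a short
      Python sketch. It is not the driver code: the function name is
      hypothetical and rounding up to whole microseconds is an assumption;
      only the 736 bit frame length and the 1 ms floor come from the text.

```python
FRAME_LEN_MAX_BITS = 736      # worst-case CAN-FD frame length, from the text
MIN_POLL_TIMEOUT_US = 1_000   # the former fixed 1 ms timeout, kept as a floor
USEC_PER_SEC = 1_000_000


def poll_timeout_us(bitrate_bps):
    """Timeout in microseconds: at least 1 ms, at least one full frame time."""
    if bitrate_bps <= 0:
        raise ValueError("bit rate must be positive")
    # Ceiling division, so the timeout never undershoots the frame time.
    frame_time_us = -((-FRAME_LEN_MAX_BITS * USEC_PER_SEC) // bitrate_bps)
    return max(MIN_POLL_TIMEOUT_US, frame_time_us)


if __name__ == "__main__":
    for bitrate in (10_000, 125_000, 1_000_000):
        print(bitrate, "bit/s ->", poll_timeout_us(bitrate), "us")
```

      At 10 kbit/s a worst-case frame occupies the bus for 73.6 ms, far above
      the former 1 ms limit, while at 1 Mbit/s the frame time of 736 us falls
      below the floor and the timeout stays at 1 ms.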
      
      Fixes: 55e5b97f ("can: mcp25xxfd: add driver for Microchip MCP25xxFD SPI CAN")
      Signed-off-by: Fedor Ross <fedor.ross@ifm.com>
      Signed-off-by: Marek Vasut <marex@denx.de>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/all/20230717-mcp251xfd-fix-increase-poll-timeout-v5-1-06600f34c684@pengutronix.de
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      9efa1a54