Skip to content
  1. Nov 07, 2022
    • Alexander Potapenko's avatar
      ipv6: addrlabel: fix infoleak when sending struct ifaddrlblmsg to network · c23fb2c8
      Alexander Potapenko authored
      
      
      When copying a `struct ifaddrlblmsg` to the network, __ifal_reserved
      remained uninitialized, resulting in a 1-byte infoleak:
      
        BUG: KMSAN: kernel-network-infoleak in __netdev_start_xmit ./include/linux/netdevice.h:4841
         __netdev_start_xmit ./include/linux/netdevice.h:4841
         netdev_start_xmit ./include/linux/netdevice.h:4857
         xmit_one net/core/dev.c:3590
         dev_hard_start_xmit+0x1dc/0x800 net/core/dev.c:3606
         __dev_queue_xmit+0x17e8/0x4350 net/core/dev.c:4256
         dev_queue_xmit ./include/linux/netdevice.h:3009
         __netlink_deliver_tap_skb net/netlink/af_netlink.c:307
         __netlink_deliver_tap+0x728/0xad0 net/netlink/af_netlink.c:325
         netlink_deliver_tap net/netlink/af_netlink.c:338
         __netlink_sendskb net/netlink/af_netlink.c:1263
         netlink_sendskb+0x1d9/0x200 net/netlink/af_netlink.c:1272
         netlink_unicast+0x56d/0xf50 net/netlink/af_netlink.c:1360
         nlmsg_unicast ./include/net/netlink.h:1061
         rtnl_unicast+0x5a/0x80 net/core/rtnetlink.c:758
         ip6addrlbl_get+0xfad/0x10f0 net/ipv6/addrlabel.c:628
         rtnetlink_rcv_msg+0xb33/0x1570 net/core/rtnetlink.c:6082
        ...
        Uninit was created at:
         slab_post_alloc_hook+0x118/0xb00 mm/slab.h:742
         slab_alloc_node mm/slub.c:3398
         __kmem_cache_alloc_node+0x4f2/0x930 mm/slub.c:3437
         __do_kmalloc_node mm/slab_common.c:954
         __kmalloc_node_track_caller+0x117/0x3d0 mm/slab_common.c:975
         kmalloc_reserve net/core/skbuff.c:437
         __alloc_skb+0x27a/0xab0 net/core/skbuff.c:509
         alloc_skb ./include/linux/skbuff.h:1267
         nlmsg_new ./include/net/netlink.h:964
         ip6addrlbl_get+0x490/0x10f0 net/ipv6/addrlabel.c:608
         rtnetlink_rcv_msg+0xb33/0x1570 net/core/rtnetlink.c:6082
         netlink_rcv_skb+0x299/0x550 net/netlink/af_netlink.c:2540
         rtnetlink_rcv+0x26/0x30 net/core/rtnetlink.c:6109
         netlink_unicast_kernel net/netlink/af_netlink.c:1319
         netlink_unicast+0x9ab/0xf50 net/netlink/af_netlink.c:1345
         netlink_sendmsg+0xebc/0x10f0 net/netlink/af_netlink.c:1921
        ...
      
      This patch ensures that the reserved field is always initialized.
      
      Reported-by: default avatar <syzbot+3553517af6020c4f2813f1003fe76ef3cbffe98d@syzkaller.appspotmail.com>
      Fixes: 2a8cc6c8
      
       ("[IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table.")
      Signed-off-by: default avatarAlexander Potapenko <glider@google.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c23fb2c8
    • Lu Wei's avatar
      tcp: prohibit TCP_REPAIR_OPTIONS if data was already sent · 0c175da7
      Lu Wei authored
      If setsockopt with option name of TCP_REPAIR_OPTIONS and opt_code
      of TCPOPT_SACK_PERM is called to enable sack after data is sent
      and dupacks are received , it will trigger a warning in function
      tcp_verify_left_out() as follows:
      
      ============================================
      WARNING: CPU: 8 PID: 0 at net/ipv4/tcp_input.c:2132
      tcp_timeout_mark_lost+0x154/0x160
      tcp_enter_loss+0x2b/0x290
      tcp_retransmit_timer+0x50b/0x640
      tcp_write_timer_handler+0x1c8/0x340
      tcp_write_timer+0xe5/0x140
      call_timer_fn+0x3a/0x1b0
      __run_timers.part.0+0x1bf/0x2d0
      run_timer_softirq+0x43/0xb0
      __do_softirq+0xfd/0x373
      __irq_exit_rcu+0xf6/0x140
      
      The warning is caused in the following steps:
      1. a socket named socketA is created
      2. socketA enters repair mode without build a connection
      3. socketA calls connect() and its state is changed to TCP_ESTABLISHED
         directly
      4. socketA leaves repair mode
      5. socketA calls sendmsg() to send data, packets_out and sack_outs(dup
         ack receives) increase
      6. socketA enters repair mode again
      7. socketA calls setsockopt with TCPOPT_SACK_PERM to enable sack
      8. retransmit timer expires, it calls tcp_timeout_mark_lost(), lost_out
         increases
      9. sack_outs + lost_out > packets_out triggers since lost_out and
         sack_outs increase repeatly
      
      In function tcp_timeout_mark_lost(), tp->sacked_out will be cleared if
      Step7 not happen and the warning will not be triggered. As suggested by
      Denis and Eric, TCP_REPAIR_OPTIONS should be prohibited if data was
      already sent.
      
      socket-tcp tests in CRIU has been tested as follows:
      $ sudo ./test/zdtm.py run -t zdtm/static/socket-tcp*  --keep-going \
             --ignore-taint
      
      socket-tcp* represent all socket-tcp tests in test/zdtm/static/.
      
      Fixes: b139ba4e
      
       ("tcp: Repair connection-time negotiated parameters")
      Signed-off-by: default avatarLu Wei <luwei32@huawei.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0c175da7
    • Zhaoping Shu's avatar
      net: wwan: iosm: Remove unnecessary if_mutex lock · f9027f88
      Zhaoping Shu authored
      
      
      These WWAN network interface operations (create/delete/open/close)
      are already protected by RTNL lock, i.e., wwan_ops.newlink(),
      wwan_ops.dellink(), net_device_ops.ndo_open() and
      net_device.ndo_stop() are called with RTNL lock held.
      Therefore, this patch removes the unnecessary if_mutex.
      
      Signed-off-by: default avatarZhaoping Shu <zhaoping.shu@mediatek.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f9027f88
    • HW He's avatar
      net: wwan: mhi: fix memory leak in mhi_mbim_dellink · 668205b9
      HW He authored
      MHI driver registers network device without setting the
      needs_free_netdev flag, and does NOT call free_netdev() when
      unregisters network device, which causes a memory leak.
      
      This patch sets needs_free_netdev to true when registers
      network device, which makes netdev subsystem call free_netdev()
      automatically after unregister_netdevice().
      
      Fixes: aa730a99
      
       ("net: wwan: Add MHI MBIM network driver")
      Signed-off-by: default avatarHW He <hw.he@mediatek.com>
      Signed-off-by: default avatarZhaoping Shu <zhaoping.shu@mediatek.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      668205b9
    • HW He's avatar
      net: wwan: iosm: fix memory leak in ipc_wwan_dellink · f25caaca
      HW He authored
      IOSM driver registers network device without setting the
      needs_free_netdev flag, and does NOT call free_netdev() when
      unregisters network device, which causes a memory leak.
      
      This patch sets needs_free_netdev to true when registers
      network device, which makes netdev subsystem call free_netdev()
      automatically after unregister_netdevice().
      
      Fixes: 2a54f2c7
      
       ("net: iosm: net driver")
      Signed-off-by: default avatarHW He <hw.he@mediatek.com>
      Reviewed-by: default avatarLoic Poulain <loic.poulain@linaro.org>
      Signed-off-by: default avatarZhaoping Shu <zhaoping.shu@mediatek.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f25caaca
    • Zhengchao Shao's avatar
      hamradio: fix issue of dev reference count leakage in bpq_device_event() · 85cbaf03
      Zhengchao Shao authored
      When following tests are performed, it will cause dev reference counting
      leakage.
      a)ip link add bond2 type bond mode balance-rr
      b)ip link set bond2 up
      c)ifenslave -f bond2 rose1
      d)ip link del bond2
      
      When new bond device is created, the default type of the bond device is
      ether. And the bond device is up, bpq_device_event() receives the message
      and creates a new bpq device. In this case, the reference count value of
      dev is hold once. But after "ifenslave -f bond2 rose1" command is
      executed, the type of the bond device is changed to rose. When the bond
      device is unregistered, bpq_device_event() will not put the dev reference
      count.
      
      Fixes: 1da177e4
      
       ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarZhengchao Shao <shaozhengchao@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      85cbaf03
    • Zhengchao Shao's avatar
      net: lapbether: fix issue of dev reference count leakage in lapbeth_device_event() · 531705a7
      Zhengchao Shao authored
      When following tests are performed, it will cause dev reference counting
      leakage.
      a)ip link add bond2 type bond mode balance-rr
      b)ip link set bond2 up
      c)ifenslave -f bond2 rose1
      d)ip link del bond2
      
      When new bond device is created, the default type of the bond device is
      ether. And the bond device is up, lapbeth_device_event() receives the
      message and creates a new lapbeth device. In this case, the reference
      count value of dev is hold once. But after "ifenslave -f bond2 rose1"
      command is executed, the type of the bond device is changed to rose. When
      the bond device is unregistered, lapbeth_device_event() will not put the
      dev reference count.
      
      Fixes: 1da177e4
      
       ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarZhengchao Shao <shaozhengchao@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      531705a7
  2. Nov 05, 2022
    • Sean Anderson's avatar
      net: fman: Unregister ethernet device on removal · b7cbc674
      Sean Anderson authored
      When the mac device gets removed, it leaves behind the ethernet device.
      This will result in a segfault next time the ethernet device accesses
      mac_dev. Remove the ethernet device when we get removed to prevent
      this. This is not completely reversible, since some resources aren't
      cleaned up properly, but that can be addressed later.
      
      Fixes: 39339616
      
       ("fsl/fman: Add FMan MAC driver")
      Signed-off-by: default avatarSean Anderson <sean.anderson@seco.com>
      Link: https://lore.kernel.org/r/20221103182831.2248833-1-sean.anderson@seco.com
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      b7cbc674
    • Jakub Kicinski's avatar
      Merge branch 'bnxt_en-bug-fixes' · c3812c95
      Jakub Kicinski authored
      
      
      Michael Chan says:
      
      ====================
      bnxt_en: Bug fixes
      
      This bug fix series includes fixes for PCIE AER, a crash that may occur
      when doing ethtool -C in the middle of error recovery, and aRFS.
      ====================
      
      Link: https://lore.kernel.org/r/1667518407-15761-1-git-send-email-michael.chan@broadcom.com
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      c3812c95
    • Alex Barba's avatar
      bnxt_en: fix potentially incorrect return value for ndo_rx_flow_steer · 02597d39
      Alex Barba authored
      In the bnxt_en driver ndo_rx_flow_steer returns '0' whenever an entry
      that we are attempting to steer is already found.  This is not the
      correct behavior.  The return code should be the value/index that
      corresponds to the entry.  Returning zero all the time causes the
      RFS records to be incorrect unless entry '0' is the correct one.  As
      flows migrate to different cores this can create entries that are not
      correct.
      
      Fixes: c0c050c5
      
       ("bnxt_en: New Broadcom ethernet driver.")
      Reported-by: default avatarAkshay Navgire <anavgire@purestorage.com>
      Signed-off-by: default avatarAlex Barba <alex.barba@broadcom.com>
      Signed-off-by: default avatarAndy Gospodarek <gospo@broadcom.com>
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      02597d39
    • Michael Chan's avatar
      bnxt_en: Fix possible crash in bnxt_hwrm_set_coal() · 6d81ea37
      Michael Chan authored
      During the error recovery sequence, the rtnl_lock is not held for the
      entire duration and some datastructures may be freed during the sequence.
      Check for the BNXT_STATE_OPEN flag instead of netif_running() to ensure
      that the device is fully operational before proceeding to reconfigure
      the coalescing settings.
      
      This will fix a possible crash like this:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
      PGD 0 P4D 0
      Oops: 0000 [#1] SMP NOPTI
      CPU: 10 PID: 181276 Comm: ethtool Kdump: loaded Tainted: G          IOE    --------- -  - 4.18.0-348.el8.x86_64 #1
      Hardware name: Dell Inc. PowerEdge R740/0F9N89, BIOS 2.3.10 08/15/2019
      RIP: 0010:bnxt_hwrm_set_coal+0x1fb/0x2a0 [bnxt_en]
      Code: c2 66 83 4e 22 08 66 89 46 1c e8 10 cb 00 00 41 83 c6 01 44 39 b3 68 01 00 00 0f 8e a3 00 00 00 48 8b 93 c8 00 00 00 49 63 c6 <48> 8b 2c c2 48 8b 85 b8 02 00 00 48 85 c0 74 2e 48 8b 74 24 08 f6
      RSP: 0018:ffffb11c8dcaba50 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff8d168a8b0ac0 RCX: 00000000000000c5
      RDX: 0000000000000000 RSI: ffff8d162f72c000 RDI: ffff8d168a8b0b28
      RBP: 0000000000000000 R08: b6e1f68a12e9a7eb R09: 0000000000000000
      R10: 0000000000000001 R11: 0000000000000037 R12: ffff8d168a8b109c
      R13: ffff8d168a8b10aa R14: 0000000000000000 R15: ffffffffc01ac4e0
      FS:  00007f3852e4c740(0000) GS:ffff8d24c0080000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 000000041b3ee003 CR4: 00000000007706e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
       ethnl_set_coalesce+0x3ce/0x4c0
       genl_family_rcv_msg_doit.isra.15+0x10f/0x150
       genl_family_rcv_msg+0xb3/0x160
       ? coalesce_fill_reply+0x480/0x480
       genl_rcv_msg+0x47/0x90
       ? genl_family_rcv_msg+0x160/0x160
       netlink_rcv_skb+0x4c/0x120
       genl_rcv+0x24/0x40
       netlink_unicast+0x196/0x230
       netlink_sendmsg+0x204/0x3d0
       sock_sendmsg+0x4c/0x50
       __sys_sendto+0xee/0x160
       ? syscall_trace_enter+0x1d3/0x2c0
       ? __audit_syscall_exit+0x249/0x2a0
       __x64_sys_sendto+0x24/0x30
       do_syscall_64+0x5b/0x1a0
       entry_SYSCALL_64_after_hwframe+0x65/0xca
      RIP: 0033:0x7f38524163bb
      
      Fixes: 2151fe08
      
       ("bnxt_en: Handle RESET_NOTIFY async event from firmware.")
      Reviewed-by: default avatarSomnath Kotur <somnath.kotur@broadcom.com>
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      6d81ea37
    • Vikas Gupta's avatar
      bnxt_en: fix the handling of PCIE-AER · 0cf736a1
      Vikas Gupta authored
      Fix the sequence required for PCIE-AER. While slot reset occurs, firmware
      might not be ready and the driver needs to check for its recovery.  We
      also need to remap the health registers for some chips and clear the
      resource reservations.  The resources will be allocated again during
      bnxt_io_resume().
      
      Fixes: fb1e6e56
      
       ("bnxt_en: Fix AER recovery.")
      Signed-off-by: default avatarVikas Gupta <vikas.gupta@broadcom.com>
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      0cf736a1
    • Vikas Gupta's avatar
      bnxt_en: refactor bnxt_cancel_reservations() · b4c66425
      Vikas Gupta authored
      
      
      Introduce bnxt_clear_reservations() to clear the reserved attributes only.
      This will be used in the next patch to fix PCI AER handling.
      
      Signed-off-by: default avatarVikas Gupta <vikas.gupta@broadcom.com>
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      b4c66425
  3. Nov 04, 2022
    • Wang Yufen's avatar
      net: tun: Fix memory leaks of napi_get_frags · 1118b204
      Wang Yufen authored
      kmemleak reports after running test_progs:
      
      unreferenced object 0xffff8881b1672dc0 (size 232):
        comm "test_progs", pid 394388, jiffies 4354712116 (age 841.975s)
        hex dump (first 32 bytes):
          e0 84 d7 a8 81 88 ff ff 80 2c 67 b1 81 88 ff ff  .........,g.....
          00 40 c5 9b 81 88 ff ff 00 00 00 00 00 00 00 00  .@..............
        backtrace:
          [<00000000c8f01748>] napi_skb_cache_get+0xd4/0x150
          [<0000000041c7fc09>] __napi_build_skb+0x15/0x50
          [<00000000431c7079>] __napi_alloc_skb+0x26e/0x540
          [<000000003ecfa30e>] napi_get_frags+0x59/0x140
          [<0000000099b2199e>] tun_get_user+0x183d/0x3bb0 [tun]
          [<000000008a5adef0>] tun_chr_write_iter+0xc0/0x1b1 [tun]
          [<0000000049993ff4>] do_iter_readv_writev+0x19f/0x320
          [<000000008f338ea2>] do_iter_write+0x135/0x630
          [<000000008a3377a4>] vfs_writev+0x12e/0x440
          [<00000000a6b5639a>] do_writev+0x104/0x280
          [<00000000ccf065d8>] do_syscall_64+0x3b/0x90
          [<00000000d776e329>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      The issue occurs in the following scenarios:
      tun_get_user()
        napi_gro_frags()
          napi_frags_finish()
            case GRO_NORMAL:
              gro_normal_one()
                list_add_tail(&skb->list, &napi->rx_list);
                <-- While napi->rx_count < READ_ONCE(gro_normal_batch),
                <-- gro_normal_list() is not called, napi->rx_list is not empty
        <-- not ask to complete the gro work, will cause memory leaks in
        <-- following tun_napi_del()
      ...
      tun_napi_del()
        netif_napi_del()
          __netif_napi_del()
          <-- &napi->rx_list is not empty, which caused memory leaks
      
      To fix, add napi_complete() after napi_gro_frags().
      
      Fixes: 90e33d45
      
       ("tun: enable napi_gro_frags() for TUN/TAP driver")
      Signed-off-by: default avatarWang Yufen <wangyufen@huawei.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1118b204
    • Ratheesh Kannoth's avatar
      octeontx2-pf: NIX TX overwrites SQ_CTX_HW_S[SQ_INT] · 51afe902
      Ratheesh Kannoth authored
      In scenarios where multiple errors have occurred
      for a SQ before SW starts handling error interrupt,
      SQ_CTX[OP_INT] may get overwritten leading to
      NIX_LF_SQ_OP_INT returning incorrect value.
      To workaround this read LMT, MNQ and SQ individual
      error status registers to determine the cause of error.
      
      Fixes: 4ff7d148
      
       ("octeontx2-pf: Error handling support")
      Signed-off-by: default avatarRatheesh Kannoth <rkannoth@marvell.com>
      Reviewed-by: default avatarSunil Kovvuri Goutham <sgoutham@marvell.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      51afe902
    • Roger Quadros's avatar
      net: ethernet: ti: am65-cpsw: Fix segmentation fault at module unload · 1a0c016a
      Roger Quadros authored
      Move am65_cpsw_nuss_phylink_cleanup() call to after
      am65_cpsw_nuss_cleanup_ndev() so phylink is still valid
      to prevent the below Segmentation fault on module remove when
      first slave link is up.
      
      [   31.652944] Unable to handle kernel paging request at virtual address 00040008000005f4
      [   31.684627] Mem abort info:
      [   31.687446]   ESR = 0x0000000096000004
      [   31.704614]   EC = 0x25: DABT (current EL), IL = 32 bits
      [   31.720663]   SET = 0, FnV = 0
      [   31.723729]   EA = 0, S1PTW = 0
      [   31.740617]   FSC = 0x04: level 0 translation fault
      [   31.756624] Data abort info:
      [   31.759508]   ISV = 0, ISS = 0x00000004
      [   31.776705]   CM = 0, WnR = 0
      [   31.779695] [00040008000005f4] address between user and kernel address ranges
      [   31.808644] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
      [   31.814928] Modules linked in: wlcore_sdio wl18xx wlcore mac80211 libarc4 cfg80211 rfkill crct10dif_ce phy_gmii_sel ti_am65_cpsw_nuss(-) sch_fq_codel ipv6
      [   31.828776] CPU: 0 PID: 1026 Comm: modprobe Not tainted 6.1.0-rc2-00012-gfabfcf7dafdb-dirty #160
      [   31.837547] Hardware name: Texas Instruments AM625 (DT)
      [   31.842760] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      [   31.849709] pc : phy_stop+0x18/0xf8
      [   31.853202] lr : phylink_stop+0x38/0xf8
      [   31.857031] sp : ffff80000a0839f0
      [   31.860335] x29: ffff80000a0839f0 x28: ffff000000de1c80 x27: 0000000000000000
      [   31.867462] x26: 0000000000000000 x25: 0000000000000000 x24: ffff80000a083b98
      [   31.874589] x23: 0000000000000800 x22: 0000000000000001 x21: ffff000001bfba90
      [   31.881715] x20: ffff0000015ee000 x19: 0004000800000200 x18: 0000000000000000
      [   31.888842] x17: ffff800076c45000 x16: ffff800008004000 x15: 000058e39660b106
      [   31.895969] x14: 0000000000000144 x13: 0000000000000144 x12: 0000000000000000
      [   31.903095] x11: 000000000000275f x10: 00000000000009e0 x9 : ffff80000a0837d0
      [   31.910222] x8 : ffff000000de26c0 x7 : ffff00007fbd6540 x6 : ffff00007fbd64c0
      [   31.917349] x5 : ffff00007fbd0b10 x4 : ffff00007fbd0b10 x3 : ffff00007fbd3920
      [   31.924476] x2 : d0a07fcff8b8d500 x1 : 0000000000000000 x0 : 0004000800000200
      [   31.931603] Call trace:
      [   31.934042]  phy_stop+0x18/0xf8
      [   31.937177]  phylink_stop+0x38/0xf8
      [   31.940657]  am65_cpsw_nuss_ndo_slave_stop+0x28/0x1e0 [ti_am65_cpsw_nuss]
      [   31.947452]  __dev_close_many+0xa4/0x140
      [   31.951371]  dev_close_many+0x84/0x128
      [   31.955115]  unregister_netdevice_many+0x130/0x6d0
      [   31.959897]  unregister_netdevice_queue+0x94/0xd8
      [   31.964591]  unregister_netdev+0x24/0x38
      [   31.968504]  am65_cpsw_nuss_cleanup_ndev.isra.0+0x48/0x70 [ti_am65_cpsw_nuss]
      [   31.975637]  am65_cpsw_nuss_remove+0x58/0xf8 [ti_am65_cpsw_nuss]
      
      Cc: <Stable@vger.kernel.org> # v5.18+
      Fixes: e8609e69
      
       ("net: ethernet: ti: am65-cpsw: Convert to PHYLINK")
      Signed-off-by: default avatarRoger Quadros <rogerq@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1a0c016a
    • David S. Miller's avatar
      Merge branch 'macsec-offload-fixes' · f4e405f5
      David S. Miller authored
      Sabrina Dubroca says:
      
      ====================
      macsec: offload-related fixes
      
      I'm working on a dummy offload for macsec on netdevsim. It just has a
      small SecY and RXSC table so I can trigger failures easily on the
      ndo_* side. It has exposed a couple of issues.
      
      The first patch is a revert of commit c850240b ("net: macsec:
      report real_dev features when HW offloading is enabled"). That commit
      tried to improve the performance of macsec offload by taking advantage
      of some of the NIC's features, but in doing so, broke macsec offload
      when the lower device supports both macsec and ipsec offload, as the
      ipsec offload feature flags were copied from the real device. Since
      the macsec device doesn't provide xdo_* ops, the XFRM core rejects the
      registration of the new macsec device in xfrm_api_check.
      
      I'm working on re-adding those feature flags when offload is
      available, but I haven't fully solved that yet. I think it would be
      safer to do that second part in net-next considering how complex
      feature interactions tend to be.
      
      v2:
       - better describe the issue introduced by commit c850240b
      
       (Leon
         Romanovsky)
       - patch #3: drop unnecessary !! (Leon Romanovsky)
      
      v3:
       - patch #3: drop extra newline (Jakub Kicinski)
      ====================
      
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f4e405f5
    • Sabrina Dubroca's avatar
      macsec: clear encryption keys from the stack after setting up offload · aaab73f8
      Sabrina Dubroca authored
      macsec_add_rxsa and macsec_add_txsa copy the key to an on-stack
      offloading context to pass it to the drivers, but leaves it there when
      it's done. Clear it with memzero_explicit as soon as it's not needed
      anymore.
      
      Fixes: 3cf3227a
      
       ("net: macsec: hardware offloading infrastructure")
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: default avatarAntoine Tenart <atenart@kernel.org>
      Reviewed-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      aaab73f8
    • Sabrina Dubroca's avatar
      macsec: fix detection of RXSCs when toggling offloading · 80df4706
      Sabrina Dubroca authored
      macsec_is_configured incorrectly uses secy->n_rx_sc to check if some
      RXSCs exist. secy->n_rx_sc only counts the number of active RXSCs, but
      there can also be inactive SCs as well, which may be stored in the
      driver (in case we're disabling offloading), or would have to be
      pushed to the device (in case we're trying to enable offloading).
      
      As long as RXSCs active on creation and never turned off, the issue is
      not visible.
      
      Fixes: dcb780fb
      
       ("net: macsec: add nla support for changing the offloading selection")
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: default avatarAntoine Tenart <atenart@kernel.org>
      Reviewed-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      80df4706
    • Sabrina Dubroca's avatar
      macsec: fix secy->n_rx_sc accounting · 73a4b31c
      Sabrina Dubroca authored
      secy->n_rx_sc is supposed to be the number of _active_ rxsc's within a
      secy. This is then used by macsec_send_sci to help decide if we should
      add the SCI to the header or not.
      
      This logic is currently broken when we create a new RXSC and turn it
      off at creation, as create_rx_sc always sets ->active to true (and
      immediately uses that to increment n_rx_sc), and only later
      macsec_add_rxsc sets rx_sc->active.
      
      Fixes: c09440f7
      
       ("macsec: introduce IEEE 802.1AE driver")
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: default avatarAntoine Tenart <atenart@kernel.org>
      Reviewed-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      73a4b31c
    • Sabrina Dubroca's avatar
      macsec: delete new rxsc when offload fails · 93a30947
      Sabrina Dubroca authored
      Currently we get an inconsistent state:
       - netlink returns the error to userspace
       - the RXSC is installed but not offloaded
      
      Then the device could get confused when we try to add an RXSA, because
      the RXSC isn't supposed to exist.
      
      Fixes: 3cf3227a
      
       ("net: macsec: hardware offloading infrastructure")
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: default avatarAntoine Tenart <atenart@kernel.org>
      Reviewed-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      93a30947
    • Sabrina Dubroca's avatar
      Revert "net: macsec: report real_dev features when HW offloading is enabled" · 8bcd560a
      Sabrina Dubroca authored
      This reverts commit c850240b
      
      .
      
      That commit tried to improve the performance of macsec offload by
      taking advantage of some of the NIC's features, but in doing so, broke
      macsec offload when the lower device supports both macsec and ipsec
      offload, as the ipsec offload feature flags (mainly NETIF_F_HW_ESP)
      were copied from the real device. Since the macsec device doesn't
      provide xdo_* ops, the XFRM core rejects the registration of the new
      macsec device in xfrm_api_check.
      
      Example perf trace when running
        ip link add link eni1np1 type macsec port 4 offload mac
      
          ip   737 [003]   795.477676: probe:xfrm_dev_event__REGISTER      name="macsec0" features=0x1c000080014869
                    xfrm_dev_event+0x3a
                    notifier_call_chain+0x47
                    register_netdevice+0x846
                    macsec_newlink+0x25a
      
          ip   737 [003]   795.477687:   probe:xfrm_dev_event__return      ret=0x8002 (NOTIFY_BAD)
                   notifier_call_chain+0x47
                   register_netdevice+0x846
                   macsec_newlink+0x25a
      
      dev->features includes NETIF_F_HW_ESP (0x04000000000000), so
      xfrm_api_check returns NOTIFY_BAD because we don't have
      dev->xfrmdev_ops on the macsec device.
      
      We could probably propagate GSO and a few other features from the
      lower device, similar to macvlan. This will be done in a future patch.
      
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: default avatarAntoine Tenart <atenart@kernel.org>
      Reviewed-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8bcd560a
    • Adrien Thierry's avatar
      selftests/net: give more time to udpgro bg processes to complete startup · cdb525ca
      Adrien Thierry authored
      
      
      In some conditions, background processes in udpgro don't have enough
      time to set up the sockets. When foreground processes start, this
      results in the test failing with "./udpgso_bench_tx: sendmsg: Connection
      refused". For instance, this happens from time to time on a Qualcomm
      SA8540P SoC running CentOS Stream 9.
      
      To fix this, increase the time given to background processes to
      complete the startup before foreground processes start.
      
      Signed-off-by: default avatarAdrien Thierry <athierry@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cdb525ca
    • Guangbin Huang's avatar
      net: hns3: fix get wrong value of function hclge_get_dscp_prio() · cfdcb075
      Guangbin Huang authored
      As the argument struct hnae3_handle *h of function hclge_get_dscp_prio()
      can be other client registered in hnae3 layer, we need to transform it
      into hnae3_handle of local nic client to get right dscp settings for
      other clients.
      
      Fixes: dfea275e
      
       ("net: hns3: optimize converting dscp to priority process of hns3_nic_select_queue()")
      Signed-off-by: default avatarGuangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cfdcb075
    • Randy Dunlap's avatar
      net: octeontx2-pf: mcs: consider MACSEC setting · 4581dd48
      Randy Dunlap authored
      
      
      Fix build errors when MACSEC=m and OCTEONTX2_PF=y by having
      OCTEONTX2_PF depend on MACSEC if it is enabled. By adding
      "|| !MACSEC", this means that MACSEC is not required -- it can
      be disabled for this driver.
      
      drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.o: in function `otx2_remove':
      ../drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c:(.text+0x2fd0): undefined reference to `cn10k_mcs_free'
      mips64-linux-ld: drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.o: in function `otx2_mbox_up_handler_mcs_intr_notify':
      ../drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c:(.text+0x4610): undefined reference to `cn10k_handle_mcs_event'
      
      Reported-by: default avatarkernel test robot <lkp@intel.com>
      Fixes: c54ffc73
      
       ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading")
      Signed-off-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Cc: Subbaraya Sundeep <sbhatta@marvell.com>
      Cc: Sunil Goutham <sgoutham@marvell.com>
      Cc: Geetha sowjanya <gakula@marvell.com>
      Cc: hariprasad <hkelam@marvell.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4581dd48
    • Jakub Kicinski's avatar
      Merge tag 'wireless-2022-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless · 91018bbc
      Jakub Kicinski authored
      
      
      Kalle Valo says:
      
      ====================
      wireless fixes for v6.1
      
      Second set of fixes for v6.1. Some fixes to char type usage in
      drivers, memory leaks in the stack and also functionality fixes. The
      rt2x00 char type fix is a larger (but still simple) commit, otherwise
      the fixes are small in size.
      
      * tag 'wireless-2022-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
        wifi: ath11k: avoid deadlock during regulatory update in ath11k_regd_update()
        wifi: ath11k: Fix QCN9074 firmware boot on x86
        wifi: mac80211: Set TWT Information Frame Disabled bit as 1
        wifi: mac80211: Fix ack frame idr leak when mesh has no route
        wifi: mac80211: fix general-protection-fault in ieee80211_subif_start_xmit()
        wifi: brcmfmac: Fix potential buffer overflow in brcmf_fweh_event_worker()
        wifi: airo: do not assign -1 to unsigned char
        wifi: mac80211_hwsim: fix debugfs attribute ps with rc table support
        wifi: cfg80211: Fix bitrates overflow issue
        wifi: cfg80211: fix memory leak in query_regdb_file()
        wifi: mac80211: fix memory free error when registering wiphy fail
        wifi: cfg80211: silence a sparse RCU warning
        wifi: rt2x00: use explicitly signed or unsigned types
      ====================
      
      Link: https://lore.kernel.org/r/20221103125315.04E57C433C1@smtp.kernel.org
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      91018bbc
    • Jiri Benc's avatar
      net: gso: fix panic on frag_list with mixed head alloc types · 9e4b7a99
      Jiri Benc authored
      Since commit 3dcbdb13 ("net: gso: Fix skb_segment splat when
      splitting gso_size mangled skb having linear-headed frag_list"), it is
      allowed to change gso_size of a GRO packet. However, that commit assumes
      that "checking the first list_skb member suffices; i.e if either of the
      list_skb members have non head_frag head, then the first one has too".
      
      It turns out this assumption does not hold. We've seen BUG_ON being hit
      in skb_segment when skbs on the frag_list had differing head_frag with
      the vmxnet3 driver. This happens because __netdev_alloc_skb and
      __napi_alloc_skb can return a skb that is page backed or kmalloced
      depending on the requested size. As the result, the last small skb in
      the GRO packet can be kmalloced.
      
      There are three different locations where this can be fixed:
      
      (1) We could check head_frag in GRO and not allow GROing skbs with
          different head_frag. However, that would lead to performance
          regression on normal forward paths with unmodified gso_size, where
          !head_frag in the last packet is not a problem.
      
      (2) Set a flag in bpf_skb_net_grow and bpf_skb_net_shrink indicating
          that NETIF_F_SG is undesirable. That would need to eat a bit in
          sk_buff. Furthermore, that flag can be unset when all skbs on the
          frag_list are page backed. To retain good performance,
          bpf_skb_net_grow/shrink would have to walk the frag_list.
      
      (3) Walk the frag_list in skb_segment when determining whether
          NETIF_F_SG should be cleared. This of course slows things down.
      
      This patch implements (3). To limit the performance impact in
      skb_segment, the list is walked only for skbs with SKB_GSO_DODGY set
      that have gso_size changed. Normal paths thus will not hit it.
      
      We could check only the last skb but since we need to walk the whole
      list anyway, let's stay on the safe side.
      
      Fixes: 3dcbdb13
      
       ("net: gso: Fix skb_segment splat when splitting gso_size mangled skb having linear-headed frag_list")
      Signed-off-by: default avatarJiri Benc <jbenc@redhat.com>
      Reviewed-by: default avatarWillem de Bruijn <willemb@google.com>
      Link: https://lore.kernel.org/r/e04426a6a91baf4d1081e1b478c82b5de25fdf21.1667407944.git.jbenc@redhat.com
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      9e4b7a99
    • Jakub Kicinski's avatar
      Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · f2c24be5
      Jakub Kicinski authored
      
      
      Daniel Borkmann says:
      
      ====================
      bpf 2022-11-04
      
      We've added 8 non-merge commits during the last 3 day(s) which contain
      a total of 10 files changed, 113 insertions(+), 16 deletions(-).
      
      The main changes are:
      
      1) Fix memory leak upon allocation failure in BPF verifier's stack state
         tracking, from Kees Cook.
      
      2) Fix address leakage when BPF progs release reference to an object,
         from Youlin Li.
      
      3) Fix BPF CI breakage from buggy in.h uapi header dependency,
         from Andrii Nakryiko.
      
      4) Fix bpftool pin sub-command's argument parsing, from Pu Lehui.
      
      5) Fix BPF sockmap lockdep warning by cancelling psock work outside
         of socket lock, from Cong Wang.
      
      6) Follow-up for BPF sockmap to fix sk_forward_alloc accounting,
         from Wang Yufen.
      
      bpf-for-netdev
      
      * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        selftests/bpf: Add verifier test for release_reference()
        bpf: Fix wrong reg type conversion in release_reference()
        bpf, sock_map: Move cancel_work_sync() out of sock lock
        tools/headers: Pull in stddef.h to uapi to fix BPF selftests build in CI
        net/ipv4: Fix linux/in.h header dependencies
        bpftool: Fix NULL pointer dereference when pin {PROG, MAP, LINK} without FILE
        bpf, sockmap: Fix the sk->sk_forward_alloc warning of sk_stream_kill_queues
        bpf, verifier: Fix memory leak in array reallocation for stack state
      ====================
      
      Link: https://lore.kernel.org/r/20221104000445.30761-1-daniel@iogearbox.net
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      f2c24be5
    • Youlin Li's avatar
      selftests/bpf: Add verifier test for release_reference() · 475244f5
      Youlin Li authored
      
      
      Add a test case to ensure that released pointer registers will not be
      leaked into the map.
      
      Before fix:
      
        ./test_verifier 984
          984/u reference tracking: try to leak released ptr reg FAIL
          Unexpected success to load!
          verification time 67 usec
          stack depth 4
          processed 23 insns (limit 1000000) max_states_per_insn 0 total_states 2
          peak_states 2 mark_read 1
          984/p reference tracking: try to leak released ptr reg OK
          Summary: 1 PASSED, 0 SKIPPED, 1 FAILED
      
      After fix:
      
        ./test_verifier 984
          984/u reference tracking: try to leak released ptr reg OK
          984/p reference tracking: try to leak released ptr reg OK
          Summary: 2 PASSED, 0 SKIPPED, 0 FAILED
      
      Signed-off-by: default avatarYoulin Li <liulin063@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20221103093440.3161-2-liulin063@gmail.com
      475244f5
    • Youlin Li's avatar
      bpf: Fix wrong reg type conversion in release_reference() · f1db2081
      Youlin Li authored
      Some helper functions will allocate memory. To avoid memory leaks, the
      verifier requires the eBPF program to release these memories by calling
      the corresponding helper functions.
      
      When a resource is released, all pointer registers corresponding to the
      resource should be invalidated. The verifier use release_references() to
      do this job, by apply  __mark_reg_unknown() to each relevant register.
      
      It will give these registers the type of SCALAR_VALUE. A register that
      will contain a pointer value at runtime, but of type SCALAR_VALUE, which
      may allow the unprivileged user to get a kernel pointer by storing this
      register into a map.
      
      Using __mark_reg_not_init() while NOT allow_ptr_leaks can mitigate this
      problem.
      
      Fixes: fd978bf7
      
       ("bpf: Add reference tracking to verifier")
      Signed-off-by: default avatarYoulin Li <liulin063@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20221103093440.3161-1-liulin063@gmail.com
      f1db2081
    • Linus Torvalds's avatar
      Merge tag 'net-6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 9521c9d6
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from bluetooth and netfilter.
      
        Current release - regressions:
      
         - net: several zerocopy flags fixes
      
         - netfilter: fix possible memory leak in nf_nat_init()
      
         - openvswitch: add missing .resv_start_op
      
        Previous releases - regressions:
      
         - neigh: fix null-ptr-deref in neigh_table_clear()
      
         - sched: fix use after free in red_enqueue()
      
         - dsa: fall back to default tagger if we can't load the one from DT
      
         - bluetooth: fix use-after-free in l2cap_conn_del()
      
        Previous releases - always broken:
      
         - netfilter: netlink notifier might race to release objects
      
         - nfc: fix potential memory leak of skb
      
         - bluetooth: fix use-after-free caused by l2cap_reassemble_sdu
      
         - bluetooth: use skb_put to set length
      
         - eth: tun: fix bugs for oversize packet when napi frags enabled
      
         - eth: lan966x: fixes for when MTU is changed
      
         - eth: dwmac-loongson: fix invalid mdio_node"
      
      * tag 'net-6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits)
        vsock: fix possible infinite sleep in vsock_connectible_wait_data()
        vsock: remove the unused 'wait' in vsock_connectible_recvmsg()
        ipv6: fix WARNING in ip6_route_net_exit_late()
        bridge: Fix flushing of dynamic FDB entries
        net, neigh: Fix null-ptr-deref in neigh_table_clear()
        net/smc: Fix possible leaked pernet namespace in smc_init()
        stmmac: dwmac-loongson: fix invalid mdio_node
        ibmvnic: Free rwi on reset success
        net: mdio: fix undefined behavior in bit shift for __mdiobus_register
        Bluetooth: L2CAP: Fix attempting to access uninitialized memory
        Bluetooth: L2CAP: Fix l2cap_global_chan_by_psm
        Bluetooth: L2CAP: Fix accepting connection request for invalid SPSM
        Bluetooth: hci_conn: Fix not restoring ISO buffer count on disconnect
        Bluetooth: L2CAP: Fix memory leak in vhci_write
        Bluetooth: L2CAP: fix use-after-free in l2cap_conn_del()
        Bluetooth: virtio_bt: Use skb_put to set length
        Bluetooth: hci_conn: Fix CIS connection dst_type handling
        Bluetooth: L2CAP: Fix use-after-free caused by l2cap_reassemble_sdu
        netfilter: ipset: enforce documented limit to prevent allocating huge memory
        isdn: mISDN: netjet: fix wrong check of device registration
        ...
      9521c9d6
    • Linus Torvalds's avatar
      Merge tag 'powerpc-6.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 4d740391
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
      
       - Fix an endian thinko in the asm-generic compat_arg_u64() which led to
         syscall arguments being swapped for some compat syscalls.
      
       - Fix syscall wrapper handling of syscalls with 64-bit arguments on
         32-bit kernels, which led to syscall arguments being misplaced.
      
       - A build fix for amdgpu on Book3E with AltiVec disabled.
      
      Thanks to Andreas Schwab, Christian Zigotzky, and Arnd Bergmann.
      
      * tag 'powerpc-6.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/32: Select ARCH_SPLIT_ARG64
        powerpc/32: fix syscall wrappers with 64-bit arguments
        asm-generic: compat: fix compat_arg_u64() and compat_arg_u64_dual()
        powerpc/64e: Fix amdgpu build on Book3E w/o AltiVec
      4d740391
  4. Nov 03, 2022
    • Cong Wang's avatar
      bpf, sock_map: Move cancel_work_sync() out of sock lock · 8bbabb3f
      Cong Wang authored
      Stanislav reported a lockdep warning, which is caused by the
      cancel_work_sync() called inside sock_map_close(), as analyzed
      below by Jakub:
      
      psock->work.func = sk_psock_backlog()
        ACQUIRE psock->work_mutex
          sk_psock_handle_skb()
            skb_send_sock()
              __skb_send_sock()
                sendpage_unlocked()
                  kernel_sendpage()
                    sock->ops->sendpage = inet_sendpage()
                      sk->sk_prot->sendpage = tcp_sendpage()
                        ACQUIRE sk->sk_lock
                          tcp_sendpage_locked()
                        RELEASE sk->sk_lock
        RELEASE psock->work_mutex
      
      sock_map_close()
        ACQUIRE sk->sk_lock
        sk_psock_stop()
          sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED)
          cancel_work_sync()
            __cancel_work_timer()
              __flush_work()
                // wait for psock->work to finish
        RELEASE sk->sk_lock
      
      We can move the cancel_work_sync() out of the sock lock protection,
      but still before saved_close() was called.
      
      Fixes: 799aa7f9
      
       ("skmsg: Avoid lock_sock() in sk_psock_backlog()")
      Reported-by: default avatarStanislav Fomichev <sdf@google.com>
      Signed-off-by: default avatarCong Wang <cong.wang@bytedance.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Tested-by: default avatarJakub Sitnicki <jakub@cloudflare.com>
      Acked-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Acked-by: default avatarJakub Sitnicki <jakub@cloudflare.com>
      Link: https://lore.kernel.org/bpf/20221102043417.279409-1-xiyou.wangcong@gmail.com
      8bbabb3f
    • Andrii Nakryiko's avatar
      tools/headers: Pull in stddef.h to uapi to fix BPF selftests build in CI · a778f5d4
      Andrii Nakryiko authored
      With recent sync of linux/in.h tools/include headers are now relying on
      __DECLARE_FLEX_ARRAY macro, which isn't itself defined inside
      tools/include headers anywhere and is instead assumed to be present in
      system-wide UAPI header. This breaks isolated environments that don't
      have kernel UAPI headers installed system-wide, like BPF CI ([0]).
      
      To fix this, bring in include/uapi/linux/stddef.h into tools/include.
      We can't just copy/paste it, though, it has to be processed with
      scripts/headers_install.sh, which has a dependency on scripts/unifdef.
      So the full command to (re-)generate stddef.h for inclusion into
      tools/include directory is:
      
        $ make scripts_unifdef && \
          cp $KBUILD_OUTPUT/scripts/unifdef scripts/ && \
          scripts/headers_install.sh include/uapi/linux/stddef.h tools/include/uapi/linux/stddef.h
      
      This assumes KBUILD_OUTPUT envvar is set and used for out-of-tree builds.
      
        [0] https://github.com/kernel-patches/bpf/actions/runs/3379432493/jobs/5610982609
      
      Fixes: 036b8f5b
      
       ("tools headers uapi: Update linux/in.h copy")
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Link: https://lore.kernel.org/bpf/20221102182517.2675301-2-andrii@kernel.org
      a778f5d4
    • Andrii Nakryiko's avatar
      net/ipv4: Fix linux/in.h header dependencies · aec1dc97
      Andrii Nakryiko authored
      __DECLARE_FLEX_ARRAY is defined in include/uapi/linux/stddef.h but
      doesn't seem to be explicitly included from include/uapi/linux/in.h,
      which breaks BPF selftests builds (once we sync linux/stddef.h into
      tools/include directory in the next patch). Fix this by explicitly
      including linux/stddef.h.
      
      Given this affects BPF CI and bpf tree, targeting this for bpf tree.
      
      Fixes: 5854a09b
      
       ("net/ipv4: Use __DECLARE_FLEX_ARRAY() helper")
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Link: https://lore.kernel.org/bpf/20221102182517.2675301-1-andrii@kernel.org
      aec1dc97
    • Paolo Abeni's avatar
      Merge branch 'vsock-remove-an-unused-variable-and-fix-infinite-sleep' · 715aee0f
      Paolo Abeni authored
      
      
      Dexuan Cui says:
      
      ====================
      vsock: remove an unused variable and fix infinite sleep
      
      Patch 1 removes the unused 'wait' variable.
      Patch 2 fixes an infinite sleep issue reported by a hv_sock user.
      ====================
      
      Link: https://lore.kernel.org/r/20221101021706.26152-1-decui@microsoft.com
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      715aee0f
    • Dexuan Cui's avatar
      vsock: fix possible infinite sleep in vsock_connectible_wait_data() · 466a8533
      Dexuan Cui authored
      Currently vsock_connectible_has_data() may miss a wakeup operation
      between vsock_connectible_has_data() == 0 and the prepare_to_wait().
      
      Fix the race by adding the process to the wait queue before checking
      vsock_connectible_has_data().
      
      Fixes: b3f7fd54
      
       ("af_vsock: separate wait data loop")
      Signed-off-by: default avatarDexuan Cui <decui@microsoft.com>
      Reviewed-by: default avatarStefano Garzarella <sgarzare@redhat.com>
      Reported-by: default avatarFrédéric Dalleau <frederic.dalleau@docker.com>
      Tested-by: default avatarFrédéric Dalleau <frederic.dalleau@docker.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      466a8533
    • Dexuan Cui's avatar
      vsock: remove the unused 'wait' in vsock_connectible_recvmsg() · cf6ff0df
      Dexuan Cui authored
      Remove the unused variable introduced by 19c1b90e.
      
      Fixes: 19c1b90e
      
       ("af_vsock: separate receive data loop")
      Signed-off-by: default avatarDexuan Cui <decui@microsoft.com>
      Reviewed-by: default avatarStefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      cf6ff0df
    • Zhengchao Shao's avatar
      ipv6: fix WARNING in ip6_route_net_exit_late() · 768b3c74
      Zhengchao Shao authored
      During the initialization of ip6_route_net_init_late(), if file
      ipv6_route or rt6_stats fails to be created, the initialization is
      successful by default. Therefore, the ipv6_route or rt6_stats file
      doesn't be found during the remove in ip6_route_net_exit_late(). It
      will cause WRNING.
      
      The following is the stack information:
      name 'rt6_stats'
      WARNING: CPU: 0 PID: 9 at fs/proc/generic.c:712 remove_proc_entry+0x389/0x460
      Modules linked in:
      Workqueue: netns cleanup_net
      RIP: 0010:remove_proc_entry+0x389/0x460
      PKRU: 55555554
      Call Trace:
      <TASK>
      ops_exit_list+0xb0/0x170
      cleanup_net+0x4ea/0xb00
      process_one_work+0x9bf/0x1710
      worker_thread+0x665/0x1080
      kthread+0x2e4/0x3a0
      ret_from_fork+0x1f/0x30
      </TASK>
      
      Fixes: cdb18761
      
       ("[NETNS][IPV6] route6 - create route6 proc files for the namespace")
      Signed-off-by: default avatarZhengchao Shao <shaozhengchao@huawei.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20221102020610.351330-1-shaozhengchao@huawei.com
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      768b3c74
    • Ido Schimmel's avatar
      bridge: Fix flushing of dynamic FDB entries · 628ac04a
      Ido Schimmel authored
      The following commands should result in all the dynamic FDB entries
      being flushed, but instead all the non-local (non-permanent) entries are
      flushed:
      
       # bridge fdb add 00:aa:bb:cc:dd:ee dev dummy1 master static
       # bridge fdb add 00:11:22:33:44:55 dev dummy1 master dynamic
       # ip link set dev br0 type bridge fdb_flush
       # bridge fdb show brport dummy1
       00:00:00:00:00:01 master br0 permanent
       33:33:00:00:00:01 self permanent
       01:00:5e:00:00:01 self permanent
      
      This is because br_fdb_flush() works with FDB flags and not the
      corresponding enumerator values. Fix by passing the FDB flag instead.
      
      After the fix:
      
       # bridge fdb add 00:aa:bb:cc:dd:ee dev dummy1 master static
       # bridge fdb add 00:11:22:33:44:55 dev dummy1 master dynamic
       # ip link set dev br0 type bridge fdb_flush
       # bridge fdb show brport dummy1
       00:aa:bb:cc:dd:ee master br0 static
       00:00:00:00:00:01 master br0 permanent
       33:33:00:00:00:01 self permanent
       01:00:5e:00:00:01 self permanent
      
      Fixes: 1f78ee14
      
       ("net: bridge: fdb: add support for fine-grained flushing")
      Signed-off-by: default avatarIdo Schimmel <idosch@nvidia.com>
      Acked-by: default avatarNikolay Aleksandrov <razor@blackwall.org>
      Link: https://lore.kernel.org/r/20221101185753.2120691-1-idosch@nvidia.com
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      628ac04a