  1. Aug 09, 2023
  2. Aug 08, 2023
    • net: marvell: prestera: fix handling IPv4 routes with nhid · 2aa71b4b
      Jonas Gorski authored
      Fix handling IPv4 routes referencing a nexthop via its id by replacing
      calls to fib_info_nh() with fib_info_nhc().
      
      Trying to add an IPv4 route referencing a nexthop via nhid:
      
          $ ip link set up swp5
          $ ip a a 10.0.0.1/24 dev swp5
          $ ip nexthop add dev swp5 id 20 via 10.0.0.2
          $ ip route add 10.0.1.0/24 nhid 20
      
      triggers warnings when trying to handle the route:
      
      [  528.805763] ------------[ cut here ]------------
      [  528.810437] WARNING: CPU: 3 PID: 53 at include/net/nexthop.h:468 __prestera_fi_is_direct+0x2c/0x68 [prestera]
      [  528.820434] Modules linked in: prestera_pci act_gact act_police sch_ingress cls_u32 cls_flower prestera arm64_delta_tn48m_dn_led(O) arm64_delta_tn48m_dn_cpld(O) [last unloaded: prestera_pci]
      [  528.837485] CPU: 3 PID: 53 Comm: kworker/u8:3 Tainted: G           O       6.4.5 #1
      [  528.845178] Hardware name: delta,tn48m-dn (DT)
      [  528.849641] Workqueue: prestera_ordered __prestera_router_fib_event_work [prestera]
      [  528.857352] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      [  528.864347] pc : __prestera_fi_is_direct+0x2c/0x68 [prestera]
      [  528.870135] lr : prestera_k_arb_fib_evt+0xb20/0xd50 [prestera]
      [  528.876007] sp : ffff80000b20bc90
      [  528.879336] x29: ffff80000b20bc90 x28: 0000000000000000 x27: ffff0001374d3a48
      [  528.886510] x26: ffff000105604000 x25: ffff000134af8a28 x24: ffff0001374d3800
      [  528.893683] x23: ffff000101c89148 x22: ffff000101c89000 x21: ffff000101c89200
      [  528.900855] x20: ffff00013641fda0 x19: ffff800009d01088 x18: 0000000000000059
      [  528.908027] x17: 0000000000000277 x16: 0000000000000000 x15: 0000000000000000
      [  528.915198] x14: 0000000000000003 x13: 00000000000fe400 x12: 0000000000000000
      [  528.922371] x11: 0000000000000002 x10: 0000000000000aa0 x9 : ffff8000013d2020
      [  528.929543] x8 : 0000000000000018 x7 : 000000007b1703f8 x6 : 000000001ca72f86
      [  528.936715] x5 : 0000000033399ea7 x4 : 0000000000000000 x3 : ffff0001374d3acc
      [  528.943886] x2 : 0000000000000000 x1 : ffff00010200de00 x0 : ffff000134ae3f80
      [  528.951058] Call trace:
      [  528.953516]  __prestera_fi_is_direct+0x2c/0x68 [prestera]
      [  528.958952]  __prestera_router_fib_event_work+0x100/0x158 [prestera]
      [  528.965348]  process_one_work+0x208/0x488
      [  528.969387]  worker_thread+0x4c/0x430
      [  528.973068]  kthread+0x120/0x138
      [  528.976313]  ret_from_fork+0x10/0x20
      [  528.979909] ---[ end trace 0000000000000000 ]---
      [  528.984998] ------------[ cut here ]------------
      [  528.989645] WARNING: CPU: 3 PID: 53 at include/net/nexthop.h:468 __prestera_fi_is_direct+0x2c/0x68 [prestera]
      [  528.999628] Modules linked in: prestera_pci act_gact act_police sch_ingress cls_u32 cls_flower prestera arm64_delta_tn48m_dn_led(O) arm64_delta_tn48m_dn_cpld(O) [last unloaded: prestera_pci]
      [  529.016676] CPU: 3 PID: 53 Comm: kworker/u8:3 Tainted: G        W  O       6.4.5 #1
      [  529.024368] Hardware name: delta,tn48m-dn (DT)
      [  529.028830] Workqueue: prestera_ordered __prestera_router_fib_event_work [prestera]
      [  529.036539] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      [  529.043533] pc : __prestera_fi_is_direct+0x2c/0x68 [prestera]
      [  529.049318] lr : __prestera_k_arb_fc_apply+0x280/0x2f8 [prestera]
      [  529.055452] sp : ffff80000b20bc60
      [  529.058781] x29: ffff80000b20bc60 x28: 0000000000000000 x27: ffff0001374d3a48
      [  529.065953] x26: ffff000105604000 x25: ffff000134af8a28 x24: ffff0001374d3800
      [  529.073126] x23: ffff000101c89148 x22: ffff000101c89148 x21: ffff00013641fda0
      [  529.080299] x20: ffff000101c89000 x19: ffff000101c89020 x18: 0000000000000059
      [  529.087471] x17: 0000000000000277 x16: 0000000000000000 x15: 0000000000000000
      [  529.094642] x14: 0000000000000003 x13: 00000000000fe400 x12: 0000000000000000
      [  529.101814] x11: 0000000000000002 x10: 0000000000000aa0 x9 : ffff8000013cee80
      [  529.108985] x8 : 0000000000000018 x7 : 000000007b1703f8 x6 : 0000000000000018
      [  529.116157] x5 : 00000000d3497eb6 x4 : ffff000105604081 x3 : 000000008e979557
      [  529.123329] x2 : 0000000000000000 x1 : ffff00010200de00 x0 : ffff000134ae3f80
      [  529.130501] Call trace:
      [  529.132958]  __prestera_fi_is_direct+0x2c/0x68 [prestera]
      [  529.138394]  prestera_k_arb_fib_evt+0x6b8/0xd50 [prestera]
      [  529.143918]  __prestera_router_fib_event_work+0x100/0x158 [prestera]
      [  529.150313]  process_one_work+0x208/0x488
      [  529.154348]  worker_thread+0x4c/0x430
      [  529.158030]  kthread+0x120/0x138
      [  529.161274]  ret_from_fork+0x10/0x20
      [  529.164867] ---[ end trace 0000000000000000 ]---
      
      and results in a non-offloaded route:
      
          $ ip route
          10.0.0.0/24 dev swp5 proto kernel scope link src 10.0.0.1 rt_trap
          10.0.1.0/24 nhid 20 via 10.0.0.2 dev swp5 rt_trap
      
      When creating a route referencing a nexthop via its ID, the nexthop will
      be stored in a separate nh pointer instead of the array of nexthops in
      the fib_info struct. This causes issues since fib_info_nh() only handles
      the nexthops array, but not the separate nh pointer, and will loudly
      WARN about it.
      
      In contrast, fib_info_nhc() handles both, but returns a fib_nh_common
      pointer instead of a fib_nh pointer. Luckily we only ever access fields
      from the fib_nh_common parts, so we can just replace all instances of
      fib_info_nh() with fib_info_nhc() and access the fields via their
      fib_nh_common names.
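The difference between the two accessors can be modelled in a few lines of user-space C. This is a simplified sketch, not the kernel code: every `model_*` name is a hypothetical stand-in (for `fib_info`, `fib_nh`, `fib_nh_common`, `fib_info_nh()` and `fib_info_nhc()`), and the array-only accessor returns NULL where the commit message says the kernel helper WARNs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct model_nh_common {
	int oif;          /* output interface index */
	unsigned int gw4; /* IPv4 gateway */
};

struct model_fib_nh {
	struct model_nh_common common; /* part shared with nexthop objects */
	int legacy_only;               /* field that exists only in the full struct */
};

struct model_nexthop {
	struct model_nh_common common;
};

#define MODEL_MAX_NHS 2

struct model_fib_info {
	struct model_nexthop *nh;                    /* set for routes added with "nhid" */
	struct model_fib_nh nh_array[MODEL_MAX_NHS]; /* used otherwise */
};

/* Array-only accessor: cannot serve routes that use the separate nh pointer.
 * The model returns NULL to make that case visible. */
static struct model_fib_nh *model_fib_info_nh(struct model_fib_info *fi, int n)
{
	assert(n >= 0 && n < MODEL_MAX_NHS);
	if (fi->nh)
		return NULL;
	return &fi->nh_array[n];
}

/* Accessor that handles both cases, but only exposes the common part. */
static struct model_nh_common *model_fib_info_nhc(struct model_fib_info *fi, int n)
{
	assert(n >= 0 && n < MODEL_MAX_NHS);
	if (fi->nh)
		return &fi->nh->common;
	return &fi->nh_array[n].common;
}
```

A caller that only reads fields of the common part, as the driver is said to do, can switch to the second accessor without losing information.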
      
      This allows handling IPv4 routes with an external nexthop, and they now
      get offloaded as expected:
      
          $ ip route
          10.0.0.0/24 dev swp5 proto kernel scope link src 10.0.0.1 rt_trap
          10.0.1.0/24 nhid 20 via 10.0.0.2 dev swp5 offload rt_offload
      
      Fixes: 396b80cb ("net: marvell: prestera: Add neighbour cache accounting")
      Signed-off-by: Jonas Gorski <jonas.gorski@bisdn.de>
      Acked-by: Elad Nachman <enachman@marvell.com>
      Link: https://lore.kernel.org/r/20230804101220.247515-1-jonas.gorski@bisdn.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: core: remove unnecessary frame_sz check in bpf_xdp_adjust_tail() · d14eea09
      Andrew Kanner authored
      Syzkaller reported the following issue:
      =======================================
      Too BIG xdp->frame_sz = 131072
      WARNING: CPU: 0 PID: 5020 at net/core/filter.c:4121
        ____bpf_xdp_adjust_tail net/core/filter.c:4121 [inline]
      WARNING: CPU: 0 PID: 5020 at net/core/filter.c:4121
        bpf_xdp_adjust_tail+0x466/0xa10 net/core/filter.c:4103
      ...
      Call Trace:
       <TASK>
       bpf_prog_4add87e5301a4105+0x1a/0x1c
       __bpf_prog_run include/linux/filter.h:600 [inline]
       bpf_prog_run_xdp include/linux/filter.h:775 [inline]
       bpf_prog_run_generic_xdp+0x57e/0x11e0 net/core/dev.c:4721
       netif_receive_generic_xdp net/core/dev.c:4807 [inline]
       do_xdp_generic+0x35c/0x770 net/core/dev.c:4866
       tun_get_user+0x2340/0x3ca0 drivers/net/tun.c:1919
       tun_chr_write_iter+0xe8/0x210 drivers/net/tun.c:2043
       call_write_iter include/linux/fs.h:1871 [inline]
       new_sync_write fs/read_write.c:491 [inline]
       vfs_write+0x650/0xe40 fs/read_write.c:584
       ksys_write+0x12f/0x250 fs/read_write.c:637
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      The xdp->frame_sz > PAGE_SIZE check was introduced in commit c8741e2b
      ("xdp: Allow bpf_xdp_adjust_tail() to grow packet size"). But Jesper
      Dangaard Brouer <jbrouer@redhat.com> noted that after the introduction
      of xdp_init_buff(), which all XDP drivers use, it is safe to remove this
      check. The original intent was to catch cases where XDP drivers had
      not been updated to use xdp.frame_sz, but that is no longer a concern
      (since xdp_init_buff).
      
      Running the initial syzkaller repro showed that contiguous physical
      memory allocation is used for both xdp paths in tun_get_user(), i.e.
      tun_build_skb() and tun_alloc_skb(). Jesper Dangaard Brouer
      <jbrouer@redhat.com> also stated that XDP can work on higher-order
      pages, as long as this is contiguous physical memory (e.g. a page).
      
      Reported-and-tested-by: <syzbot+f817490f5bd20541b90a@syzkaller.appspotmail.com>
      Closes: https://lore.kernel.org/all/000000000000774b9205f1d8a80d@google.com/T/
      Link: https://syzkaller.appspot.com/bug?extid=f817490f5bd20541b90a
      Link: https://lore.kernel.org/all/20230725155403.796-1-andrew.kanner@gmail.com/T/
      Fixes: 43b5169d ("net, xdp: Introduce xdp_init_buff utility routine")
      Signed-off-by: Andrew Kanner <andrew.kanner@gmail.com>
      Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Link: https://lore.kernel.org/r/20230803190316.2380231-1-andrew.kanner@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • drivers: net: prevent tun_build_skb() to exceed the packet size limit · 59eeb232
      Andrew Kanner authored
      Using the syzkaller repro with a reduced packet size, it was discovered
      that XDP_PACKET_HEADROOM is not checked in tun_can_build_skb(),
      although pad may be incremented in tun_build_skb(). As a result,
      tun_build_skb() may exceed the PAGE_SIZE limit.
      
      Jason Wang <jasowang@redhat.com> proposed to always count
      XDP_PACKET_HEADROOM (i.e. without checking
      rcu_access_pointer(tun->xdp_prog)) in tun_can_build_skb(), since there
      is a window during which an XDP program might be attached between
      tun_can_build_skb() and tun_build_skb().
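The effect of the missing headroom can be illustrated with a small user-space model of the size check. This is only a sketch under stated assumptions: the `model_*` names are hypothetical, and apart from the 256-byte XDP headroom the constants are illustrative stand-ins for values that depend on architecture and kernel configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative values; only the XDP headroom matches the kernel's definition. */
#define MODEL_PAGE_SIZE           4096u
#define MODEL_XDP_PACKET_HEADROOM  256u
#define MODEL_RX_PAD                64u /* stand-in for TUN_RX_PAD */
#define MODEL_SHINFO_SIZE          320u /* stand-in for sizeof(struct skb_shared_info) */
#define MODEL_CACHE_LINE            64u /* stand-in for the skb data alignment */

/* Round len up to the next multiple of the cache line size. */
static size_t model_data_align(size_t len)
{
	assert((MODEL_CACHE_LINE & (MODEL_CACHE_LINE - 1)) == 0);
	return (len + MODEL_CACHE_LINE - 1) & ~((size_t)MODEL_CACHE_LINE - 1);
}

/* Total buffer size needed for a packet of len bytes with the given padding. */
static size_t model_buflen(size_t len, size_t pad)
{
	return model_data_align(len + pad) + model_data_align(MODEL_SHINFO_SIZE);
}

/* Check as described before the fix: XDP headroom is not counted. */
static bool model_can_build_old(size_t len)
{
	return model_buflen(len, MODEL_RX_PAD) <= MODEL_PAGE_SIZE;
}

/* Check as proposed: always count the XDP headroom. */
static bool model_can_build_fixed(size_t len)
{
	return model_buflen(len, MODEL_RX_PAD + MODEL_XDP_PACKET_HEADROOM) <=
	       MODEL_PAGE_SIZE;
}

/* Size the build step really needs, depending on whether a program is attached. */
static size_t model_build_buflen(size_t len, bool xdp_attached)
{
	size_t pad = MODEL_RX_PAD;

	if (xdp_attached)
		pad += MODEL_XDP_PACKET_HEADROOM;
	return model_buflen(len, pad);
}
```

With these numbers a 3600-byte packet passes the old check but needs more than one page once a program is attached, while the fixed check rejects it up front regardless of when the program appears.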
      
      Fixes: 7df13219 ("tun: reserve extra headroom only when XDP is set")
      Link: https://syzkaller.appspot.com/bug?extid=f817490f5bd20541b90a
      Signed-off-by: Andrew Kanner <andrew.kanner@gmail.com>
      Link: https://lore.kernel.org/r/20230803185947.2379988-1-andrew.kanner@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'wireguard-fixes-for-6-5-rc6' · fa41884c
      Jakub Kicinski authored
      Jason A. Donenfeld says:
      
      ====================
      wireguard fixes for 6.5-rc6
      
      Just one patch this time, somewhat late in the cycle:
      
      1) Fix an off-by-one calculation for the maximum node depth size in the
         allowedips trie data structure, and also adjust the self-tests to hit
         this case so it doesn't regress again in the future.
      ====================
      
      Link: https://lore.kernel.org/r/20230807132146.2191597-1-Jason@zx2c4.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • wireguard: allowedips: expand maximum node depth · 46622219
      Jason A. Donenfeld authored
      In the allowedips self-test, nodes are inserted into the tree, but the
      test generated an even number of nodes. For checking the maximum node
      depth there is of course also the root node, which makes the total
      number necessarily odd. With too few nodes added, it never triggered the
      maximum depth check like it should have. So, add 129 nodes instead of
      128 nodes, and do so with a more straightforward scheme, starting with
      all the bits set, and shifting over one each time. Then increase the
      maximum depth to 129, and choose a better name for that variable to
      make it clear that it represents depth as opposed to bits.
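The arithmetic behind 129 can be checked with a short user-space sketch. It is not the WireGuard self-test: the `model_*` names are hypothetical, and the old limit of 128 is inferred from the message above. The sketch starts with all 128 bits set and shifts over one bit per step, counting one node per prefix length from /128 down to /0.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_KEY_BITS      128u
#define MODEL_MAX_DEPTH_OLD 128u /* assumed previous limit */
#define MODEL_MAX_DEPTH_NEW 129u /* limit named in the commit message */

/* A 128-bit key held as two 64-bit halves. */
struct model_key128 {
	uint64_t hi, lo;
};

/* Shift a 128-bit value left by one bit. */
static struct model_key128 model_shl1(struct model_key128 k)
{
	struct model_key128 r;

	r.hi = (k.hi << 1) | (k.lo >> 63);
	r.lo = k.lo << 1;
	return r;
}

/* Count the nested prefixes /128, /127, ..., /0. Each one needs its own
 * node on a single root-to-leaf path, so the count is the path depth. */
static unsigned int model_chain_nodes(void)
{
	struct model_key128 key = { UINT64_MAX, UINT64_MAX };
	unsigned int cidr = MODEL_KEY_BITS;
	unsigned int nodes = 0;

	for (;;) {
		nodes++; /* one node for this prefix length */
		if (cidr == 0)
			break;
		key = model_shl1(key);
		cidr--;
	}
	assert(key.hi == 0 && key.lo == 0); /* the /0 prefix has no bits left */
	return nodes;
}
```

The chain has one more node than the key has bits, which is why a depth limit equal to the bit count is off by one.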
      
      Cc: stable@vger.kernel.org
      Fixes: e7096c13 ("net: WireGuard secure network tunnel")
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Link: https://lore.kernel.org/r/20230807132146.2191597-2-Jason@zx2c4.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bonding: Fix incorrect deletion of ETH_P_8021AD protocol vid from slaves · 01f4fd27
      Ziyang Xuan authored
      BUG_ON(!vlan_info) is triggered in unregister_vlan_dev() with the
      following test case:
      
        # ip netns add ns1
        # ip netns exec ns1 ip link add bond0 type bond mode 0
        # ip netns exec ns1 ip link add bond_slave_1 type veth peer veth2
        # ip netns exec ns1 ip link set bond_slave_1 master bond0
        # ip netns exec ns1 ip link add link bond_slave_1 name vlan10 type vlan id 10 protocol 802.1ad
        # ip netns exec ns1 ip link add link bond0 name bond0_vlan10 type vlan id 10 protocol 802.1ad
        # ip netns exec ns1 ip link set bond_slave_1 nomaster
        # ip netns del ns1
      
      The logical analysis of the problem is as follows:
      
      1. create ETH_P_8021AD protocol vlan10 for bond_slave_1:
      register_vlan_dev()
        vlan_vid_add()
          vlan_info_alloc()
          __vlan_vid_add() // add [ETH_P_8021AD, 10] vid to bond_slave_1
      
      2. create ETH_P_8021AD protocol bond0_vlan10 for bond0:
      register_vlan_dev()
        vlan_vid_add()
          __vlan_vid_add()
            vlan_add_rx_filter_info()
                if (!vlan_hw_filter_capable(dev, proto)) // condition established because bond0 without NETIF_F_HW_VLAN_STAG_FILTER
                    return 0;
      
                if (netif_device_present(dev))
                    return dev->netdev_ops->ndo_vlan_rx_add_vid(dev, proto, vid); // will be never called
                    // The slaves of bond0 will not refer to the [ETH_P_8021AD, 10] vid.
      
      3. detach bond_slave_1 from bond0:
      __bond_release_one()
        vlan_vids_del_by_dev()
          list_for_each_entry(vid_info, &vlan_info->vid_list, list)
              vlan_vid_del(dev, vid_info->proto, vid_info->vid);
              // bond_slave_1 [ETH_P_8021AD, 10] vid will be deleted.
              // bond_slave_1->vlan_info will be assigned NULL.
      
      4. delete vlan10 while deleting ns1:
      default_device_exit_batch()
        dev->rtnl_link_ops->dellink() // unregister_vlan_dev() for vlan10
          vlan_info = rtnl_dereference(real_dev->vlan_info); // real_dev of vlan10 is bond_slave_1
          BUG_ON(!vlan_info); // bond_slave_1->vlan_info is NULL now, bug is triggered!!!
      
      Add support for S-VLAN tag related features to the bond driver, so that
      the bond driver always propagates the VLAN info to its slaves.
      
      Fixes: 8ad227ff ("net: vlan: add 802.1ad support")
      Suggested-by: Ido Schimmel <idosch@idosch.org>
      Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Link: https://lore.kernel.org/r/20230802114320.4156068-1-william.xuanziyang@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/mlx5e: Add capability check for vnic counters · 548ee049
      Lama Kayal authored
      Add missing capability check for each of the vnic counters exposed by
      devlink health reporter, and thus avoid unexpected behavior due to
      invalid access to registers.
      
      While at it, read only the exact number of bits for each counter,
      whether it is 32 or 64 bits wide.
      
      Fixes: b0bc615d ("net/mlx5: Add vnic devlink health reporter to PFs/VFs")
      Fixes: a33682e4 ("net/mlx5e: Expose catastrophic steering error counters")
      Signed-off-by: Lama Kayal <lkayal@nvidia.com>
      Reviewed-by: Gal Pressman <gal@nvidia.com>
      Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
      Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Reload auxiliary devices in pci error handlers · aab8e1a2
      Moshe Shemesh authored
      Handling PCI errors should fully tear down and reload auxiliary
      devices, the same as is done in the mlx5 health recovery flow.
      
      Fixes: 72ed5d56 ("net/mlx5: Suspend auxiliary devices only in case of PCI device suspend")
      Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Skip clock update work when device is in error state · d0062076
      Moshe Shemesh authored
      When the device is in an error state, marked by the flag
      MLX5_DEVICE_STATE_INTERNAL_ERROR, the HW and PCI may not be accessible,
      so clock update work should be skipped. Furthermore, such access
      through PCI in an error state, after calling mlx5_pci_disable_device(),
      can result in failing to recover from PCI errors.
      
      Fixes: ef9814de ("net/mlx5e: Add HW timestamping (TS) support")
      Reported-and-tested-by: Ganesh G R <ganeshgr@linux.ibm.com>
      Closes: https://lore.kernel.org/netdev/9bdb9b9d-140a-7a28-f0de-2e64e873c068@nvidia.com
      Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
      Reviewed-by: Aya Levin <ayal@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: LAG, Check correct bucket when modifying LAG · 86ed7b77
      Shay Drory authored
      The cited patch introduced buckets in hash mode, but failed to update
      the ports/bucket check when modifying LAG.
      Fix the check.
      
      Fixes: 352899f3 ("net/mlx5: Lag, use buckets in hash mode")
      Signed-off-by: Shay Drory <shayd@nvidia.com>
      Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Unoffload post act rule when handling FIB events · 6b5926eb
      Chris Mi authored
      Consider the following tc rule on a stack device:
      
      filter parent ffff: protocol ip pref 3 flower chain 1
      filter parent ffff: protocol ip pref 3 flower chain 1 handle 0x1
        dst_mac 24:25:d0:e1:00:00
        src_mac 02:25:d0:25:01:02
        eth_type ipv4
        ct_state +trk+new
        in_hw in_hw_count 1
              action order 1: ct commit zone 0 pipe
               index 2 ref 1 bind 1 installed 3807 sec used 3779 sec firstused 3800 sec
              Action statistics:
              Sent 120 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
              backlog 0b 0p requeues 0
              used_hw_stats delayed
      
              action order 2: tunnel_key  set
              src_ip 192.168.1.25
              dst_ip 192.168.1.26
              key_id 4
              dst_port 4789
              csum pipe
               index 3 ref 1 bind 1 installed 3807 sec used 3779 sec firstused 3800 sec
              Action statistics:
              Sent 120 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
              backlog 0b 0p requeues 0
              used_hw_stats delayed
      
              action order 3: mirred (Egress Redirect to device vxlan1) stolen
              index 9 ref 1 bind 1 installed 3807 sec used 3779 sec firstused 3800 sec
              Action statistics:
              Sent 120 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
              backlog 0b 0p requeues 0
              used_hw_stats delayed
      
      When handling FIB events, the rule in post act will not be deleted.
      And because the post act rule has packet reformat and modify header
      actions, the following syndromes will also be hit:
      
      mlx5_core 0000:08:00.0: mlx5_cmd_out_err:829:(pid 11613): DEALLOC_MODIFY_HEADER_CONTEXT(0x941) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0x1ab444), err(-22)
      mlx5_core 0000:08:00.0: mlx5_cmd_out_err:829:(pid 11613): DEALLOC_PACKET_REFORMAT_CONTEXT(0x93e) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0x179e84), err(-22)
      
      Fix it by unoffloading post act rule when handling FIB events.
      
      Fixes: 314e1105 ("net/mlx5e: Add post act offload/unoffload API")
      Signed-off-by: Chris Mi <cmi@nvidia.com>
      Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Fix devlink controller number for ECVF · 2d691c90
      Daniel Jurgens authored
      The controller number for ECVFs is always 0, because the ECPF must be
      the eswitch owner for EC VFs to be enabled.
      
      Fixes: dc131808 ("net/mlx5: Enable devlink port for embedded cpu VF vports")
      Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Allow 0 for total host VFs · 2dc2b392
      Daniel Jurgens authored
      When querying eswitch functions, 0 is a valid number of host VFs. After
      introducing ARM SRIOV, falling through to getting the max value from PCI
      results in using the total VFs allowed on the ARM for the host.
      
      Fixes: 86eec50b ("net/mlx5: Support querying max VFs from device")
      Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Return correct EC_VF function ID · 06c868fd
      Daniel Jurgens authored
      The ECVF function ID range is 1..max_ec_vfs. Currently
      mlx5_vport_to_func_id returns 0..max_ec_vfs - 1, which
      results in a syndrome when querying the caps with more
      recent firmware, or reading incorrect caps with older
      firmware that supports EC VFs.
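The off-by-one can be shown with a minimal user-space sketch. It assumes, for illustration only, that EC VF vports are numbered consecutively starting at a base vport number; the `model_*` functions and the base value used when calling them are hypothetical and do not reproduce the driver code.

```c
#include <assert.h>

/* Mapping as described before the fix: yields 0..max_ec_vfs - 1. */
static int model_ec_vf_func_id_old(int vport, int ec_vf_base)
{
	assert(vport >= ec_vf_base);
	return vport - ec_vf_base;
}

/* Mapping after the fix: yields 1..max_ec_vfs, the range named above. */
static int model_ec_vf_func_id(int vport, int ec_vf_base)
{
	assert(vport >= ec_vf_base);
	return vport - ec_vf_base + 1;
}
```

For a hypothetical base of 100 and four EC VFs, the old mapping gives IDs 0 to 3, so the first VF lands on an ID outside the valid range and every other VF is addressed by its neighbour's ID.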
      
      Fixes: 9ac0b128 ("net/mlx5: Update vport caps query/set for EC VFs")
      Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: DR, Fix wrong allocation of modify hdr pattern · 8bfe1e19
      Yevgeny Kliteynik authored
      Fix the wrong calculation of the modify hdr pattern size,
      where the previously calculated number would not be enough
      to accommodate the required number of actions.
      
      Fixes: da5d0027 ("net/mlx5: DR, Add cache for modify header pattern")
      Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
      Reviewed-by: Erez Shitrit <erezsh@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: TC, Fix internal port memory leak · ac5da544
      Jianbo Liu authored
      The flow rule can be split, and the extra post_act rules are added
      to the post_act table. It's possible to trigger a memleak when the rule
      forwards packets from an internal port and over a tunnel, in the case
      that, for example, CT 'new' state offload is allowed. As the int_port
      object is assigned to the flow attribute of the post_act rule, and its
      refcnt is incremented by mlx5e_tc_int_port_get(), but
      mlx5e_tc_int_port_put() is not called, the refcnt is never decremented,
      and the int_port is never freed.
      
      The kmemleak reports the following error:
      unreferenced object 0xffff888128204b80 (size 64):
        comm "handler20", pid 50121, jiffies 4296973009 (age 642.932s)
        hex dump (first 32 bytes):
          01 00 00 00 19 00 00 00 03 f0 00 00 04 00 00 00  ................
          98 77 67 41 81 88 ff ff 98 77 67 41 81 88 ff ff  .wgA.....wgA....
        backtrace:
          [<00000000e992680d>] kmalloc_trace+0x27/0x120
          [<000000009e945a98>] mlx5e_tc_int_port_get+0x3f3/0xe20 [mlx5_core]
          [<0000000035a537f0>] mlx5e_tc_add_fdb_flow+0x473/0xcf0 [mlx5_core]
          [<0000000070c2cec6>] __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
          [<000000005cc84048>] mlx5e_configure_flower+0xd40/0x4c40 [mlx5_core]
          [<000000004f8a2031>] mlx5e_rep_indr_offload.isra.0+0x10e/0x1c0 [mlx5_core]
          [<000000007df797dc>] mlx5e_rep_indr_setup_tc_cb+0x90/0x130 [mlx5_core]
          [<0000000016c15cc3>] tc_setup_cb_add+0x1cf/0x410
          [<00000000a63305b4>] fl_hw_replace_filter+0x38f/0x670 [cls_flower]
          [<000000008bc9e77c>] fl_change+0x1fd5/0x4430 [cls_flower]
          [<00000000e7f766e4>] tc_new_tfilter+0x867/0x2010
          [<00000000e101c0ef>] rtnetlink_rcv_msg+0x6fc/0x9f0
          [<00000000e1111d44>] netlink_rcv_skb+0x12c/0x360
          [<0000000082dd6c8b>] netlink_unicast+0x438/0x710
          [<00000000fc568f70>] netlink_sendmsg+0x794/0xc50
          [<0000000016e92590>] sock_sendmsg+0xc5/0x190
      
      So fix this by moving int_port cleanup code to the flow attribute
      free helper, which is used by all the attribute free cases.
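The idea of dropping the reference in one shared free helper can be sketched with a toy reference count in user-space C. All `model_*` names are hypothetical; the sketch only mirrors the get/put pairing described above and is not the driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy reference-counted object standing in for the internal port. */
struct model_int_port {
	int refcnt;
};

/* Toy flow attribute that may hold a reference to an internal port. */
struct model_flow_attr {
	struct model_int_port *int_port;
};

/* Take a reference. */
static struct model_int_port *model_int_port_get(struct model_int_port *p)
{
	p->refcnt++;
	return p;
}

/* Drop a reference; returns true when the last one is gone. */
static bool model_int_port_put(struct model_int_port *p)
{
	assert(p->refcnt > 0);
	return --p->refcnt == 0;
}

/* Shared free helper: every attribute, including the extra post_act ones,
 * releases its reference here, so no free path can forget the put. */
static bool model_attr_free(struct model_flow_attr *attr)
{
	bool released = false;

	if (attr->int_port) {
		released = model_int_port_put(attr->int_port);
		attr->int_port = NULL;
	}
	return released;
}
```

Because the put lives in the helper that all free paths already call, adding another attribute type later does not require remembering a separate cleanup step.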
      
      Fixes: 8300f225 ("net/mlx5e: Create new flow attr for multi table actions")
      Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
      Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Take RTNL lock when needed before calling xdp_set_features() · 72cc6549
      Gal Pressman authored
      Hold RTNL lock when calling xdp_set_features() with a registered netdev,
      as the call triggers the netdev notifiers. This could happen when
      switching from uplink rep to nic profile for example.
      
      This resolves the following call trace:
      
      RTNL: assertion failed at net/core/dev.c (1953)
      WARNING: CPU: 6 PID: 112670 at net/core/dev.c:1953 call_netdevice_notifiers_info+0x7c/0x80
      Modules linked in: sch_mqprio sch_mqprio_lib act_tunnel_key act_mirred act_skbedit cls_matchall nfnetlink_cttimeout act_gact cls_flower sch_ingress bonding ib_umad ip_gre rdma_ucm mlx5_vfio_pci ipip tunnel4 ip6_gre gre mlx5_ib vfio_pci vfio_pci_core vfio_iommu_type1 ib_uverbs vfio mlx5_core ib_ipoib geneve nf_tables ip6_tunnel tunnel6 iptable_raw openvswitch nsh rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay zram zsmalloc fuse [last unloaded: ib_uverbs]
      CPU: 6 PID: 112670 Comm: devlink Not tainted 6.4.0-rc7_for_upstream_min_debug_2023_06_28_17_02 #1
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      RIP: 0010:call_netdevice_notifiers_info+0x7c/0x80
      Code: 90 ff 80 3d 2d 6b f7 00 00 75 c5 ba a1 07 00 00 48 c7 c6 e4 ce 0b 82 48 c7 c7 c8 f4 04 82 c6 05 11 6b f7 00 01 e8 a4 7c 8e ff <0f> 0b eb a2 0f 1f 44 00 00 55 48 89 e5 41 54 48 83 e4 f0 48 83 ec
      RSP: 0018:ffff8882a21c3948 EFLAGS: 00010282
      RAX: 0000000000000000 RBX: ffffffff82e6f880 RCX: 0000000000000027
      RDX: ffff88885f99b5c8 RSI: 0000000000000001 RDI: ffff88885f99b5c0
      RBP: 0000000000000028 R08: ffff88887ffabaa8 R09: 0000000000000003
      R10: ffff88887fecbac0 R11: ffff88887ff7bac0 R12: ffff8882a21c3968
      R13: ffff88811c018940 R14: 0000000000000000 R15: ffff8881274401a0
      FS:  00007fe141c81800(0000) GS:ffff88885f980000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f787c28b948 CR3: 000000014bcf3005 CR4: 0000000000370ea0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       ? __warn+0x79/0x120
       ? call_netdevice_notifiers_info+0x7c/0x80
       ? report_bug+0x17c/0x190
       ? handle_bug+0x3c/0x60
       ? exc_invalid_op+0x14/0x70
       ? asm_exc_invalid_op+0x16/0x20
       ? call_netdevice_notifiers_info+0x7c/0x80
       ? call_netdevice_notifiers_info+0x7c/0x80
       call_netdevice_notifiers+0x2e/0x50
       mlx5e_set_xdp_feature+0x21/0x50 [mlx5_core]
       mlx5e_nic_init+0xf1/0x1a0 [mlx5_core]
       mlx5e_netdev_init_profile+0x76/0x110 [mlx5_core]
       mlx5e_netdev_attach_profile+0x1f/0x90 [mlx5_core]
       mlx5e_netdev_change_profile+0x92/0x160 [mlx5_core]
       mlx5e_netdev_attach_nic_profile+0x1b/0x30 [mlx5_core]
       mlx5e_vport_rep_unload+0xaa/0xc0 [mlx5_core]
       __esw_offloads_unload_rep+0x52/0x60 [mlx5_core]
       mlx5_esw_offloads_rep_unload+0x52/0x70 [mlx5_core]
       esw_offloads_unload_rep+0x34/0x70 [mlx5_core]
       esw_offloads_disable+0x2b/0x90 [mlx5_core]
       mlx5_eswitch_disable_locked+0x1b9/0x210 [mlx5_core]
       mlx5_devlink_eswitch_mode_set+0xf5/0x630 [mlx5_core]
       ? devlink_get_from_attrs_lock+0x9e/0x110
       devlink_nl_cmd_eswitch_set_doit+0x60/0xe0
       genl_family_rcv_msg_doit.isra.0+0xc2/0x110
       genl_rcv_msg+0x17d/0x2b0
       ? devlink_get_from_attrs_lock+0x110/0x110
       ? devlink_nl_cmd_eswitch_get_doit+0x290/0x290
       ? devlink_pernet_pre_exit+0xf0/0xf0
       ? genl_family_rcv_msg_doit.isra.0+0x110/0x110
       netlink_rcv_skb+0x54/0x100
       genl_rcv+0x24/0x40
       netlink_unicast+0x1f6/0x2c0
       netlink_sendmsg+0x232/0x4a0
       sock_sendmsg+0x38/0x60
       ? _copy_from_user+0x2a/0x60
       __sys_sendto+0x110/0x160
       ? __count_memcg_events+0x48/0x90
       ? handle_mm_fault+0x161/0x260
       ? do_user_addr_fault+0x278/0x6e0
       __x64_sys_sendto+0x20/0x30
       do_syscall_64+0x3d/0x90
       entry_SYSCALL_64_after_hwframe+0x46/0xb0
      RIP: 0033:0x7fe141b1340a
      Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
      RSP: 002b:00007fff61d03de8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 0000000000afab00 RCX: 00007fe141b1340a
      RDX: 0000000000000038 RSI: 0000000000afab00 RDI: 0000000000000003
      RBP: 0000000000afa910 R08: 00007fe141d80200 R09: 000000000000000c
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
       </TASK>
      
      Fixes: 4d5ab0ad ("net/mlx5e: take into account device reconfiguration for xdp_features flag")
      Signed-off-by: Gal Pressman <gal@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
  3. Aug 06, 2023
    • ionic: Add missing err handling for queue reconfig · 52417a95
      Nitya Sunkad authored
      ionic_start_queues_reconfig returns an error code if txrx_init fails.
      Handle this error code in the relevant places.
      
      This fixes a corner case where the device could get left in a detached
      state if the CMB reconfig fails and the attempt to clean up the mess
      also fails. Note that calling netif_device_attach when the netdev is
      already attached does not lead to unexpected behavior.
      
      Change goto name "errout" to "err_out" to maintain consistency across
      goto statements.
      
      Fixes: 40bc471d ("ionic: add tx/rx-push support with device Component Memory Buffers")
      Fixes: 6f7d6f0f ("ionic: pull reset_queues into tx_timeout handler")
      Signed-off-by: Nitya Sunkad <nitya.sunkad@amd.com>
      Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drivers: vxlan: vnifilter: free percpu vni stats on error path · b1c936e9
      Fedor Pchelkin authored
      In case rhashtable_lookup_insert_fast() fails inside vxlan_vni_add(), the
      allocated percpu vni stats are not freed on the error path.
      
      Introduce vxlan_vni_free(), a wrapper that frees vxlan_vni_node
      resources properly.
      
      Found by Linux Verification Center (linuxtesting.org).
      
      Fixes: 4095e0e1 ("drivers: vxlan: vnifilter: per vni stats")
      Suggested-by: Ido Schimmel <idosch@idosch.org>
      Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
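The error-path pattern described in the vxlan commit above can be sketched in user-space C. This is only an illustration under stated assumptions, not the kernel patch: `struct vni_node`, `vni_node_alloc()`, `vni_node_free()`, `table_insert()` and `vni_add()` are hypothetical names, heap memory stands in for the percpu stats, and a fixed-size array stands in for the rhashtable.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for vxlan_vni_node: a node that owns a separately
 * allocated stats block (percpu in the kernel, plain heap memory here). */
struct vni_node {
    unsigned int vni;
    unsigned long *stats;
};

/* Single helper that releases everything the node owns. */
static void vni_node_free(struct vni_node *node)
{
    if (!node)
        return;
    free(node->stats);
    free(node);
}

static struct vni_node *vni_node_alloc(unsigned int vni)
{
    struct vni_node *node = calloc(1, sizeof(*node));

    if (!node)
        return NULL;
    node->stats = calloc(4, sizeof(*node->stats));
    if (!node->stats) {
        free(node);
        return NULL;
    }
    node->vni = vni;
    return node;
}

/* Hypothetical insert step that can fail; it stands in for the hash table
 * insert. A slot that is already taken yields -EBUSY. */
#define TABLE_SIZE 8
static struct vni_node *table[TABLE_SIZE];

static int table_insert(struct vni_node *node)
{
    unsigned int slot = node->vni % TABLE_SIZE;

    if (table[slot])
        return -EBUSY;
    table[slot] = node;
    return 0;
}

static int vni_add(unsigned int vni)
{
    struct vni_node *node = vni_node_alloc(vni);
    int err;

    if (!node)
        return -ENOMEM;
    err = table_insert(node);
    if (err) {
        /* Error path: free the node and its stats together. */
        vni_node_free(node);
        return err;
    }
    return 0;
}
```

Routing every release through one helper means a later addition to the node (another owned buffer, for example) only needs to be freed in one place.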
    • macsec: use DEV_STATS_INC() · 32d0a49d
      Eric Dumazet authored
      syzbot/KCSAN reported data-races in macsec whenever dev->stats fields
      are updated.
      
      It appears all of these updates can happen from multiple CPUs.
      
      Adopt the SMP-safe DEV_STATS_INC() to update dev->stats fields.
      
      Fixes: c09440f7 ("macsec: introduce IEEE 802.1AE driver")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
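The idea behind the macsec change, replacing a plain counter increment with an atomic one so that concurrent updaters do not race, can be sketched in user-space C11. The kernel macro itself is not reproduced here; `struct dev_stats`, `STATS_INC`, `STATS_READ` and `drop_rx_packets()` are hypothetical names chosen for illustration, and C11 atomics stand in for the kernel's atomic helpers.

```c
#include <stdatomic.h>

/* Hypothetical stats block; the field names are illustrative only. */
struct dev_stats {
    atomic_ulong rx_dropped;
    atomic_ulong tx_dropped;
};

/* Atomic increment. Relaxed ordering is enough for a pure event counter,
 * since no other memory accesses are ordered against it. */
#define STATS_INC(stats, field) \
    atomic_fetch_add_explicit(&(stats)->field, 1, memory_order_relaxed)

/* Atomic read of a counter. */
#define STATS_READ(stats, field) \
    atomic_load_explicit(&(stats)->field, memory_order_relaxed)

/* Safe to call from several threads at once: each increment is a single
 * atomic read-modify-write, so no update is lost. */
static void drop_rx_packets(struct dev_stats *stats, unsigned long n)
{
    for (unsigned long i = 0; i < n; i++)
        STATS_INC(stats, rx_dropped);
}
```

With a plain `stats->rx_dropped++`, two CPUs could read the same old value and both write back old + 1, losing one count; the atomic form rules that out.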
    • net: tls: avoid discarding data on record close · 6b47808f
      Jakub Kicinski authored
      TLS records end with a 16B tag. For TLS device offload we only
      need to make space for this tag in the stream, the device will
      generate and replace it with the actual calculated tag.
      
      A long time ago the code would just re-reference the head frag,
      which mostly worked but was suboptimal because it prevented TCP
      from combining the record into a single skb frag. I'm not sure
      it was correct either, as the first frag may be shorter than the tag.
      
      The commit under Fixes tried to replace that by using the page
      frag and, if the allocation failed, rolling back the data, provided
      the record was long enough. It achieves better fragment coalescing
      but is also buggy.
      
      We don't roll back the iterator, so unless we're at the end of
      send we'll skip the data we designated as tag and start the
      next record as if the rollback never happened.
      There's also the possibility that the record was constructed
      with MSG_MORE and the data came from a different syscall, in which
      case we have already told user space that we "got it".
      
      Allocate a single dummy page and use it as fallback.
      
      Found by code inspection, and proven by forcing allocation
      failures.
      
      Fixes: e7b159a4 ("net/tls: remove the record tail optimization")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
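The fallback idea from the TLS commit above (reserve space for the tag, and fall back to a shared dummy buffer instead of rolling back data when allocation fails) can be sketched in user-space C. This is an assumption-laden illustration, not the kernel code: `dummy_tag`, `tag_alloc()`, `force_alloc_failure`, `struct tag_space`, `reserve_tag_space()` and `release_tag_space()` are all hypothetical, and a static array stands in for the single dummy page.

```c
#include <stddef.h>
#include <stdlib.h>

#define TAG_SIZE 16 /* record tag length stated in the commit message */

/* One shared placeholder buffer, used only when allocation fails. Its
 * contents never matter, because the device overwrites the tag bytes. */
static unsigned char dummy_tag[TAG_SIZE];

/* Hypothetical switch so that allocation failures can be forced. */
static int force_alloc_failure;

static void *tag_alloc(size_t len)
{
    if (force_alloc_failure)
        return NULL;
    return malloc(len);
}

struct tag_space {
    unsigned char *buf;
    int owned; /* non-zero if buf came from tag_alloc() and must be freed */
};

/* Always succeeds: either a fresh buffer or the shared dummy one. */
static struct tag_space reserve_tag_space(void)
{
    struct tag_space ts;

    ts.buf = tag_alloc(TAG_SIZE);
    ts.owned = ts.buf != NULL;
    if (!ts.buf)
        ts.buf = dummy_tag; /* fallback: no data is rolled back */
    return ts;
}

static void release_tag_space(struct tag_space *ts)
{
    if (ts->owned)
        free(ts->buf);
    ts->buf = NULL;
    ts->owned = 0;
}
```

Because the reservation can no longer fail, the caller never has to give back data it has already accepted, which is the failure mode the commit message describes.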
  4. Aug 05, 2023