  1. Jan 12, 2023
    • Jeff Layton's avatar
      filelock: new helper: vfs_inode_has_locks · 54e72ce5
      Jeff Layton authored
      [ Upstream commit ab1ddef9 ]
      
      Ceph has a need to know whether a particular inode has any locks set on
      it. It's currently tracking that by a num_locks field in its
      filp->private_data, but that's problematic as it tries to decrement this
      field when releasing locks and that can race with the file being torn
      down.
      
      Add a new vfs_inode_has_locks helper that just returns whether any locks
      are currently held on the inode.
      
      Reviewed-by: Xiubo Li <xiubli@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Stable-dep-of: 461ab10e ("ceph: switch to vfs_inode_has_locks() to fix file lock bug")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      54e72ce5
    • Carlo Caione's avatar
      drm/meson: Reduce the FIFO lines held when AFBC is not used · f34b03ce
      Carlo Caione authored
      [ Upstream commit 3b754ed6 ]
      
      Having a bigger number of FIFO lines held after vsync is only useful to
      SoCs using AFBC to give time to the AFBC decoder to be reset, configured
      and enabled again.
      
      On SoCs not using AFBC, however, this causes issues on some displays,
      including a vertical offset of a few pixels in the displayed image.
      
      Conditionally increase the number of lines held after vsync only for
      SoCs using AFBC, leaving the default value for all the others.
      
      Fixes: 24e0d405 ("drm/meson: hold 32 lines after vsync to give time for AFBC start")
      Signed-off-by: Carlo Caione <ccaione@baylibre.com>
      Acked-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
      Acked-by: Neil Armstrong <neil.armstrong@linaro.org>
      [narmstrong: added fixes tag]
      Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
      Link: https://patchwork.freedesktop.org/patch/msgid/20221216-afbc_s905x-v1-0-033bebf780d9@baylibre.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      f34b03ce
    • Maor Gottlieb's avatar
      RDMA/mlx5: Fix validation of max_rd_atomic caps for DC · 05a8410b
      Maor Gottlieb authored
      [ Upstream commit 8de8482f ]
      
      Currently, when modifying DC, we validate the max_rd_atomic user
      attribute against the RC cap. Validate it against the DC cap instead,
      since RC and DC QP types have different device limitations.
      
      This can cause userspace created DC QPs to malfunction.
      
      Fixes: c32a4f29 ("IB/mlx5: Add support for DC Initiator QP")
      Link: https://lore.kernel.org/r/0c5aee72cea188c3bb770f4207cce7abc9b6fc74.1672231736.git.leonro@nvidia.com
      Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      05a8410b
    • Shay Drory's avatar
      RDMA/mlx5: Fix mlx5_ib_get_hw_stats when used for device · 8d89870d
      Shay Drory authored
      [ Upstream commit 38b50aa4 ]
      
      Currently, when mlx5_ib_get_hw_stats() is used for device (port_num = 0),
      there is a special handling in order to use the correct counters, but,
      port_num is being passed down the stack without any change.  Also, some
      functions assume that port_num >=1. As a result, the following oops can
      occur.
      
       BUG: unable to handle page fault for address: ffff89510294f1a8
       #PF: supervisor write access in kernel mode
       #PF: error_code(0x0002) - not-present page
       PGD 0 P4D 0
       Oops: 0002 [#1] SMP
       CPU: 8 PID: 1382 Comm: devlink Tainted: G W          6.1.0-rc4_for_upstream_base_2022_11_10_16_12 #1
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
       RIP: 0010:_raw_spin_lock+0xc/0x20
       Call Trace:
        <TASK>
        mlx5_ib_get_native_port_mdev+0x73/0xe0 [mlx5_ib]
        do_get_hw_stats.constprop.0+0x109/0x160 [mlx5_ib]
        mlx5_ib_get_hw_stats+0xad/0x180 [mlx5_ib]
        ib_setup_device_attrs+0xf0/0x290 [ib_core]
        ib_register_device+0x3bb/0x510 [ib_core]
        ? atomic_notifier_chain_register+0x67/0x80
        __mlx5_ib_add+0x2b/0x80 [mlx5_ib]
        mlx5r_probe+0xb8/0x150 [mlx5_ib]
        ? auxiliary_match_id+0x6a/0x90
        auxiliary_bus_probe+0x3c/0x70
        ? driver_sysfs_add+0x6b/0x90
        really_probe+0xcd/0x380
        __driver_probe_device+0x80/0x170
        driver_probe_device+0x1e/0x90
        __device_attach_driver+0x7d/0x100
        ? driver_allows_async_probing+0x60/0x60
        ? driver_allows_async_probing+0x60/0x60
        bus_for_each_drv+0x7b/0xc0
        __device_attach+0xbc/0x200
        bus_probe_device+0x87/0xa0
        device_add+0x404/0x940
        ? dev_set_name+0x53/0x70
        __auxiliary_device_add+0x43/0x60
        add_adev+0x99/0xe0 [mlx5_core]
        mlx5_attach_device+0xc8/0x120 [mlx5_core]
        mlx5_load_one_devl_locked+0xb2/0xe0 [mlx5_core]
        devlink_reload+0x133/0x250
        devlink_nl_cmd_reload+0x480/0x570
        ? devlink_nl_pre_doit+0x44/0x2b0
        genl_family_rcv_msg_doit.isra.0+0xc2/0x110
        genl_rcv_msg+0x180/0x2b0
        ? devlink_nl_cmd_region_read_dumpit+0x540/0x540
        ? devlink_reload+0x250/0x250
        ? devlink_put+0x50/0x50
        ? genl_family_rcv_msg_doit.isra.0+0x110/0x110
        netlink_rcv_skb+0x54/0x100
        genl_rcv+0x24/0x40
        netlink_unicast+0x1f6/0x2c0
        netlink_sendmsg+0x237/0x490
        sock_sendmsg+0x33/0x40
        __sys_sendto+0x103/0x160
        ? handle_mm_fault+0x10e/0x290
        ? do_user_addr_fault+0x1c0/0x5f0
        __x64_sys_sendto+0x25/0x30
        do_syscall_64+0x3d/0x90
        entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      Fix it by setting port_num to 1 in order to get device status and remove
      unused variable.
      
      Fixes: aac4492e ("IB/mlx5: Update counter implementation for dual port RoCE")
      Link: https://lore.kernel.org/r/98b82994c3cd3fa593b8a75ed3f3901e208beb0f.1672231736.git.leonro@nvidia.com
      Signed-off-by: Shay Drory <shayd@nvidia.com>
      Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      8d89870d
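The port_num handling described in the commit above can be illustrated with a small userspace sketch. It assumes per-port state is stored zero-based, so a device-wide query (port_num = 0) must be mapped to port 1 before indexing. All `sketch_*` names are hypothetical and are not the driver's identifiers.

```c
#include <assert.h>

#define SKETCH_NUM_PORTS 2u   /* assumed port count for the sketch */

/* Per-port state is stored zero-based, so callers must pass port_num >= 1. */
static unsigned int sketch_port_slot(unsigned int port_num)
{
	return port_num - 1;   /* wraps to a huge value when port_num == 0 */
}

/* Device-wide queries arrive with port_num == 0; map them to port 1 first. */
static unsigned int sketch_stats_slot(unsigned int port_num)
{
	if (port_num == 0)
		port_num = 1;
	return sketch_port_slot(port_num);
}
```

Passing 0 straight to `sketch_port_slot()` yields an out-of-range slot, which is the kind of access that could plausibly lead to a page fault like the one in the trace.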
    • Miaoqian Lin's avatar
      net: phy: xgmiitorgmii: Fix refcount leak in xgmiitorgmii_probe · 4d112f00
      Miaoqian Lin authored
      [ Upstream commit d0395358 ]
      
      of_phy_find_device() returns a device with its refcount incremented.
      Call put_device() to release it when it is no longer needed.
      
      Fixes: ab4e6ee5 ("net: phy: xgmiitorgmii: Check phy_driver ready before accessing")
      Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      4d112f00
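The refcount rule in the commit above (a lookup that takes a reference must be paired with a put on every exit path that does not keep the device) can be modelled in userspace. This is a minimal sketch, not the driver code: the `sketch_*` types and helpers are hypothetical stand-ins for of_phy_find_device() and put_device(), and the error constant is a local stand-in.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_EPROBE_DEFER 517   /* local stand-in for a "retry later" error */

struct sketch_dev {
	int refcount;
	bool driver_ready;
};

/* Like a "find" helper: returns the device with its refcount incremented. */
static struct sketch_dev *sketch_find_device(struct sketch_dev *dev)
{
	dev->refcount++;
	return dev;
}

static void sketch_put_device(struct sketch_dev *dev)
{
	dev->refcount--;
}

/* On the early-exit path, drop the reference taken by the lookup. */
static int sketch_probe(struct sketch_dev *target)
{
	struct sketch_dev *dev = sketch_find_device(target);

	if (!dev->driver_ready) {
		sketch_put_device(dev);
		return -SKETCH_EPROBE_DEFER;
	}
	return 0;   /* success: the reference is kept for later use */
}
```

Without the put on the early-exit path, each deferred probe would leave the count one higher than before.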
    • David Arinzon's avatar
      net: ena: Update NUMA TPH hint register upon NUMA node update · e5fbeb3d
      David Arinzon authored
      [ Upstream commit a8ee104f ]
      
      The device supports a PCIe optimization hint, which indicates on
      which NUMA the queue is currently processed. This hint is utilized
      by PCIe in order to reduce its access time by accessing the
      correct NUMA resources and maintaining cache coherence.
      
      The driver calls the register update for the hint (called TPH -
      TLP Processing Hint) during the NAPI loop.
      
      Though the update is expected upon a NUMA change (when a queue
      is moved from one NUMA to the other), the current logic performs
      a register update when the queue is moved to a different CPU,
      but the CPU is not necessarily in a different NUMA.
      
      The changes include:
      1. Performing the TPH update only when the queue has switched
      a NUMA node.
      2. Moving the TPH update call to be triggered only when NAPI was
      scheduled from interrupt context, as opposed to a busy-polling loop.
      This is due to the fact that during busy-polling, the frequency
      of CPU switches for a particular queue is significantly higher,
      thus, the likelihood of switching NUMA is much higher. Therefore,
      providing frequent updates to the device upon each NUMA change
      is unlikely to be beneficial.
      
      Fixes: 1738cd3e ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
      Signed-off-by: David Arinzon <darinzon@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      e5fbeb3d
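The two conditions listed in the commit above (update only on a NUMA node change, and only when NAPI was scheduled from interrupt context) can be sketched as below. The struct, the function name and the counter that stands in for the register write are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

struct sketch_queue {
	int numa_node;             /* node last reported to the device */
	unsigned int tph_updates;  /* stands in for hint register writes */
};

static void sketch_update_numa_hint(struct sketch_queue *q, int cpu_numa_node,
				    bool from_interrupt)
{
	/* Busy-polling moves between CPUs too often for the hint to help. */
	if (!from_interrupt)
		return;

	/* A CPU change within the same node needs no register update. */
	if (q->numa_node == cpu_numa_node)
		return;

	q->numa_node = cpu_numa_node;
	q->tph_updates++;
}
```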
    • David Arinzon's avatar
      net: ena: Set default value for RX interrupt moderation · 7840b93c
      David Arinzon authored
      [ Upstream commit e712f3e4 ]
      
      RX ring can be NULL in XDP use cases where only TX queues
      are configured. In this scenario, the RX interrupt moderation
      value sent to the device remains in its default value of 0.
      
      This change sets the default RX interrupt moderation value
      to be the same as the TX value.
      
      Fixes: 548c4940 ("net: ena: Implement XDP_TX action")
      Signed-off-by: David Arinzon <darinzon@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      7840b93c
    • David Arinzon's avatar
      net: ena: Fix rx_copybreak value update · d09b7a9d
      David Arinzon authored
      [ Upstream commit c7062aae ]
      
      Make the upper bound on rx_copybreak tighter, by
      making sure it is smaller than the minimum of mtu and
      ENA_PAGE_SIZE. With the current upper bound of mtu,
      rx_copybreak can be larger than a page. Such large
      rx_copybreak will not bring any performance benefit to
      the user and therefore makes no sense.
      
      In addition, the value update was only reflected in
      the adapter structure, but not applied for each ring,
      causing it to not take effect.
      
      Fixes: 1738cd3e ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
      Signed-off-by: Osama Abboud <osamaabb@amazon.com>
      Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
      Signed-off-by: David Arinzon <darinzon@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      d09b7a9d
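Both parts of the rx_copybreak fix above, the tighter bound and the per-ring propagation, can be shown in a short sketch. The page size, the ring count, the inclusive comparison and all `sketch_*` names are assumptions for illustration; the driver's own structures differ.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size */
#define SKETCH_NUM_RINGS 4       /* assumed number of RX rings */

struct sketch_ring {
	uint32_t rx_copybreak;
};

struct sketch_adapter {
	uint32_t mtu;
	uint32_t rx_copybreak;
	struct sketch_ring rx_ring[SKETCH_NUM_RINGS];
};

static int sketch_set_rx_copybreak(struct sketch_adapter *ad, uint32_t val)
{
	/* Bound by min(mtu, page size) rather than by mtu alone. */
	uint32_t limit = ad->mtu < SKETCH_PAGE_SIZE ? ad->mtu : SKETCH_PAGE_SIZE;

	if (val > limit)
		return -EINVAL;

	ad->rx_copybreak = val;

	/* Apply the value to every ring so that it actually takes effect. */
	for (size_t i = 0; i < SKETCH_NUM_RINGS; i++)
		ad->rx_ring[i].rx_copybreak = val;

	return 0;
}
```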
    • David Arinzon's avatar
      net: ena: Use bitmask to indicate packet redirection · 0e7ad9b0
      David Arinzon authored
      [ Upstream commit 59811faa ]
      
      Redirecting packets with XDP Redirect is done in two phases:
      1. A packet is passed by the driver to the kernel using
         xdp_do_redirect().
      2. After finishing polling for new packets the driver lets the kernel
         know that it can now process the redirected packet using
         xdp_do_flush_map().
         The packets' redirection is handled in the napi context of the
         queue that called xdp_do_redirect()
      
      To avoid calling xdp_do_flush_map() each time the driver first checks
      whether any packets were redirected, using
      	xdp_flags |= xdp_verdict;
      and
      	if (xdp_flags & XDP_REDIRECT)
      	    xdp_do_flush_map()
      
      essentially treating XDP instructions as a bitmask, which isn't the case:
          enum xdp_action {
      	    XDP_ABORTED = 0,
      	    XDP_DROP,
      	    XDP_PASS,
      	    XDP_TX,
      	    XDP_REDIRECT,
          };
      
      Given the current possible values of xdp_action, the current design
      doesn't have a bug (since XDP_REDIRECT = 100b), but it is still
      flawed.
      
      This patch makes the driver use a bitmask instead, to avoid future
      issues.
      
      Fixes: a318c70a ("net: ena: introduce XDP redirect implementation")
      Signed-off-by: Shay Agroskin <shayagr@amazon.com>
      Signed-off-by: David Arinzon <darinzon@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      0e7ad9b0
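The difference between OR-ing enum values and OR-ing real bit flags, as discussed in the commit above, can be sketched as follows. The enum mirrors the values quoted in the commit message; the `SKETCH_XDP_*` flags and both helpers are hypothetical names chosen for this example, not the driver's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Verdicts as enumerated in the commit: consecutive integers, not bit flags. */
enum xdp_action {
	XDP_ABORTED = 0,
	XDP_DROP,
	XDP_PASS,
	XDP_TX,
	XDP_REDIRECT,
};

/* Hypothetical driver-private flags: one bit per outcome. */
#define SKETCH_XDP_PASS     0u
#define SKETCH_XDP_TX       (1u << 0)
#define SKETCH_XDP_REDIRECT (1u << 1)
#define SKETCH_XDP_DROP     (1u << 2)

/* Map an enum verdict to one flag so OR-accumulation is well defined. */
static unsigned int sketch_verdict_to_flag(enum xdp_action act)
{
	switch (act) {
	case XDP_PASS:
		return SKETCH_XDP_PASS;
	case XDP_TX:
		return SKETCH_XDP_TX;
	case XDP_REDIRECT:
		return SKETCH_XDP_REDIRECT;
	default:
		return SKETCH_XDP_DROP;   /* aborted, dropped or unknown */
	}
}

/* Accumulate flags over one poll and report whether a flush is needed. */
static bool sketch_needs_flush(const enum xdp_action *verdicts, int n)
{
	unsigned int flags = 0;

	for (int i = 0; i < n; i++)
		flags |= sketch_verdict_to_flag(verdicts[i]);

	return (flags & SKETCH_XDP_REDIRECT) != 0;
}
```

With dedicated bits, adding a future verdict cannot accidentally set the redirect bit, which is the hazard the raw enum approach carries.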
    • David Arinzon's avatar
      net: ena: Account for the number of processed bytes in XDP · 5d496498
      David Arinzon authored
      [ Upstream commit c7f5e34d ]
      
      The size of packets that were forwarded or dropped by XDP wasn't added
      to the total processed bytes statistic.
      
      Fixes: 548c4940 ("net: ena: Implement XDP_TX action")
      Signed-off-by: Shay Agroskin <shayagr@amazon.com>
      Signed-off-by: David Arinzon <darinzon@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      5d496498
    • David Arinzon's avatar
      net: ena: Don't register memory info on XDP exchange · f17d9aec
      David Arinzon authored
      [ Upstream commit 9c9e5399 ]
      
      Since the queues aren't destroyed when we only exchange XDP programs,
      there's no need to re-register them again.
      
      Fixes: 548c4940 ("net: ena: Implement XDP_TX action")
      Signed-off-by: Shay Agroskin <shayagr@amazon.com>
      Signed-off-by: David Arinzon <darinzon@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      f17d9aec
    • David Arinzon's avatar
      net: ena: Fix toeplitz initial hash value · a4aa727a
      David Arinzon authored
      [ Upstream commit 332b49ff ]
      
      On driver initialization, RSS hash initial value is set to zero,
      instead of the default value. This happens because we pass NULL as
      the RSS key parameter, which caused us to never initialize
      the RSS hash value.
      
      This patch fixes it by making sure the initial value is set, no matter
      what the value of the RSS key is.
      
      Fixes: 91a65b7d ("net: ena: fix potential crash when rxfh key is NULL")
      Signed-off-by: Nati Koler <nkoler@amazon.com>
      Signed-off-by: David Arinzon <darinzon@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a4aa727a
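The fix in the commit above amounts to applying the initial hash value whether or not a key is supplied. A minimal sketch of that control flow follows; the key size, the struct layout and the function name are assumptions and do not reflect the driver's actual interface.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_KEY_SIZE 40   /* assumed key length for the sketch */

struct sketch_rss {
	uint8_t key[SKETCH_KEY_SIZE];
	uint32_t init_val;
};

static int sketch_fill_hash(struct sketch_rss *rss, const uint8_t *key,
			    size_t key_len, uint32_t init_val)
{
	if (key) {
		if (key_len != SKETCH_KEY_SIZE)
			return -EINVAL;
		memcpy(rss->key, key, key_len);
	}

	/* Applied unconditionally, so a NULL key no longer skips it. */
	rss->init_val = init_val;
	return 0;
}
```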
    • Jiguang Xiao's avatar
      net: amd-xgbe: add missed tasklet_kill · 0bec17f1
      Jiguang Xiao authored
      [ Upstream commit d530ece7 ]
      
      The driver does not call tasklet_kill in several places.
      Add the calls to fix it.
      
      Fixes: 85b85c85 ("amd-xgbe: Re-issue interrupt if interrupt status not cleared")
      Signed-off-by: Jiguang Xiao <jiguang.xiao@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      0bec17f1
    • Adham Faris's avatar
      net/mlx5e: Fix hw mtu initializing at XDP SQ allocation · cb2f7468
      Adham Faris authored
      [ Upstream commit 1e267ab8 ]
      
      Current xdp xmit functions logic (mlx5e_xmit_xdp_frame_mpwqe or
      mlx5e_xmit_xdp_frame), validates xdp packet length by comparing it to
      hw mtu (configured at xdp sq allocation) before xmiting it. This check
      does not account for ethernet fcs length (calculated and filled by the
      nic). Hence, when we try sending packets with length > (hw-mtu -
      ethernet-fcs-size), the device port drops it and tx_errors_phy is
      incremented. Desired behavior is to catch these packets and drop them
      by the driver.
      
      Fix this behavior in XDP SQ allocation function (mlx5e_alloc_xdpsq) by
      subtracting ethernet FCS header size (4 Bytes) from current hw mtu
      value, since ethernet FCS is calculated and written to ethernet frames
      by the nic.
      
      Fixes: d8bec2b2 ("net/mlx5e: Support bpf_xdp_adjust_head()")
      Signed-off-by: Adham Faris <afaris@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      cb2f7468
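The arithmetic behind the commit above is small enough to show directly: the SQ stores the hardware MTU minus the 4-byte FCS, and the transmit path compares the frame length against that stored value. The helper names are hypothetical, and the sketch assumes hw_mtu is always at least 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_ETH_FCS_LEN 4u   /* Ethernet frame check sequence, in bytes */

/* Value stored at SQ allocation: leave room for the FCS the NIC appends. */
static uint32_t sketch_xdpsq_hw_mtu(uint32_t hw_mtu)
{
	return hw_mtu - SKETCH_ETH_FCS_LEN;   /* assumes hw_mtu >= 4 */
}

/* Transmit-time check: frames longer than the stored limit are dropped. */
static bool sketch_xdp_frame_fits(uint32_t frame_len, uint32_t sq_hw_mtu)
{
	return frame_len <= sq_hw_mtu;
}
```

For example, with a hardware MTU of 1522 the stored limit becomes 1518, so a 1520-byte frame is rejected by the driver instead of being dropped at the port.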
    • Chris Mi's avatar
      net/mlx5e: Always clear dest encap in neigh-update-del · 6c72abb7
      Chris Mi authored
      [ Upstream commit 2951b2e1 ]
      
      The cited commit introduced a bug for flows with multiple
      encapsulations. If one dest encap becomes invalid, the slow path flag
      is set on the flow. But when other dest encaps become invalid, they are
      not cleared because the flow already has the slow path flag set. When
      neigh-update-add runs, it will use an invalid encap.
      
      Fix it by checking slow path flag after clearing dest encap.
      
      Fixes: 9a5f9cc7 ("net/mlx5e: Fix possible use-after-free deleting fdb rule")
      Signed-off-by: Chris Mi <cmi@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      6c72abb7
    • Roi Dayan's avatar
      net/mlx5e: TC, Refactor mlx5e_tc_add_flow_mod_hdr() to get flow attr · b36783bc
      Roi Dayan authored
      [ Upstream commit ff993167 ]
      
      In a later commit we are going to instantiate multiple attr instances
      for a flow instead of a single attr.
      Make sure mlx5e_tc_add_flow_mod_hdr() uses the correct attr and not flow->attr.
      
      Signed-off-by: Roi Dayan <roid@nvidia.com>
      Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Stable-dep-of: 2951b2e1 ("net/mlx5e: Always clear dest encap in neigh-update-del")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      b36783bc
    • Dragos Tatulea's avatar
      net/mlx5e: IPoIB, Don't allow CQE compression to be turned on by default · f8c10eeb
      Dragos Tatulea authored
      [ Upstream commit b12d581e ]
      
      mlx5e_build_nic_params will turn CQE compression on if the hardware
      capability is enabled and the slow_pci_heuristic condition is detected.
      As IPoIB doesn't support CQE compression, make sure to disable the
      feature in the IPoIB profile init.
      
      Please note that the feature is not exposed to the user for IPoIB
      interfaces, so it can't be subsequently turned on.
      
      Fixes: b797a684 ("net/mlx5e: Enable CQE compression when PCI is slower than link")
      Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
      Reviewed-by: Gal Pressman <gal@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      f8c10eeb
    • Shay Drory's avatar
      net/mlx5: Avoid recovery in probe flows · 7227bbb7
      Shay Drory authored
      [ Upstream commit 9078e843 ]
      
      Currently, recovery is done without considering whether the device is
      still in probe flow.
      This may lead to recovery before the device has finished probing
      successfully, e.g. while mlx5_init_one() is running. Recovery flow is
      using functionality that is loaded only by mlx5_init_one(), and there
      is no point in running recovery without mlx5_init_one() finished
      successfully.
      
      Fix it by waiting for probe flow to finish and checking whether the
      device is probed before trying to perform recovery.
      
      Fixes: 51d138c2 ("net/mlx5: Fix health error state handling")
      Signed-off-by: Shay Drory <shayd@nvidia.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      7227bbb7
    • Jiri Pirko's avatar
      net/mlx5: Add forgotten cleanup calls into mlx5_init_once() error path · 9369b9af
      Jiri Pirko authored
      [ Upstream commit 2a35b2c2 ]
      
      There are two cleanup calls missing in mlx5_init_once() error path.
      Add them making the error path flow to be the same as
      mlx5_cleanup_once().
      
      Fixes: 52ec462e ("net/mlx5: Add reserved-gids support")
      Fixes: 7c39afb3 ("net/mlx5: PTP code migration to driver core section")
      Signed-off-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      9369b9af
    • Moshe Shemesh's avatar
      net/mlx5: E-Switch, properly handle ingress tagged packets on VST · d966f2ee
      Moshe Shemesh authored
      [ Upstream commit 1f0ae22a ]
      
      Fix SRIOV VST mode behavior to insert cvlan when a guest tag is already
      present in the frame. Previous VST mode behavior was to drop packets or
      override existing tag, depending on the device version.
      
      In this patch we fix this behavior by correctly building the HW steering
      rule with a push vlan action, or for older devices we ask the FW to stack
      the vlan when a vlan is already present.
      
      Fixes: 07bab950 ("net/mlx5: E-Switch, Refactor eswitch ingress acl codes")
      Fixes: dfcb1ed3 ("net/mlx5: E-Switch, Vport ingress/egress ACLs rules for VST mode")
      Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
      Reviewed-by: Mark Bloch <mbloch@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      d966f2ee
    • Stefano Garzarella's avatar
      vdpa_sim: fix vringh initialization in vdpasim_queue_ready() · 6a37a01a
      Stefano Garzarella authored
      [ Upstream commit 794ec498 ]
      
      When we initialize vringh, we should pass the features and the
      number of elements in the virtqueue negotiated with the driver,
      otherwise operations with vringh may fail.
      
      This was discovered in a case where the driver sets a number of
      elements in the virtqueue different from the value returned by
      .get_vq_num_max().
      
      In vdpasim_vq_reset() it is safe to initialize the vringh with
      default values, since the virtqueue will not be used until
      vdpasim_queue_ready() is called again.
      
      Fixes: 2c53d0f6 ("vdpasim: vDPA device simulator")
      Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
      Message-Id: <20221110141335.62171-1-sgarzare@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Eugenio Pérez <eperezma@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      6a37a01a
    • Stefano Garzarella's avatar
      vhost: fix range used in translate_desc() · e3462410
      Stefano Garzarella authored
      [ Upstream commit 98047313 ]
      
      vhost_iotlb_itree_first() requires `start` and `last` parameters
      to search for a mapping that overlaps the range.
      
      In translate_desc() we cyclically call vhost_iotlb_itree_first(),
      incrementing `addr` by the amount already translated, so rightly
      we move the `start` parameter passed to vhost_iotlb_itree_first(),
      but we should hold the `last` parameter constant.
      
      Let's fix it by saving the `last` parameter value before incrementing
      `addr` in the loop.
      
      Fixes: a9709d68 ("vhost: convert pre sorted vhost memory array to interval tree")
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
      Message-Id: <20221109102503.18816-3-sgarzare@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      e3462410
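The loop shape described in the commit above, where `start` advances while `last` stays fixed, can be modelled with a linear search standing in for the interval tree. Everything named `sketch_*` is hypothetical; the real code uses vhost_iotlb_itree_first() on an interval tree, which this sketch only imitates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_map {
	uint64_t start, last;   /* inclusive address range */
};

/* Return the first map overlapping [start, last], or NULL if none does. */
static const struct sketch_map *
sketch_first(const struct sketch_map *maps, size_t n,
	     uint64_t start, uint64_t last)
{
	for (size_t i = 0; i < n; i++)
		if (maps[i].start <= last && maps[i].last >= start)
			return &maps[i];
	return NULL;
}

/* Count the chunks covering [addr, addr + len - 1]; -1 if a hole is hit. */
static int sketch_translate(const struct sketch_map *maps, size_t n,
			    uint64_t addr, uint64_t len)
{
	uint64_t last = addr + len - 1;   /* saved once, before addr advances */
	uint64_t done = 0;
	int chunks = 0;

	while (done < len) {
		const struct sketch_map *m = sketch_first(maps, n, addr, last);
		uint64_t size;

		if (!m || m->start > addr)
			return -1;

		size = m->last - addr + 1;
		if (size > len - done)
			size = len - done;

		addr += size;   /* only the start of the search range moves */
		done += size;
		chunks++;
	}
	return chunks;
}
```

Recomputing the upper bound from the advanced `addr` and the original `len` inside the loop would push the search range past the end of the requested region, which is the error the commit describes.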
    • Stefano Garzarella's avatar
      vringh: fix range used in iotlb_translate() · 13871f60
      Stefano Garzarella authored
      [ Upstream commit f85efa9b ]
      
      vhost_iotlb_itree_first() requires `start` and `last` parameters
      to search for a mapping that overlaps the range.
      
      In iotlb_translate() we cyclically call vhost_iotlb_itree_first(),
      incrementing `addr` by the amount already translated, so rightly
      we move the `start` parameter passed to vhost_iotlb_itree_first(),
      but we should hold the `last` parameter constant.
      
      Let's fix it by saving the `last` parameter value before incrementing
      `addr` in the loop.
      
      Fixes: 9ad9c49c ("vringh: IOTLB support")
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
      Message-Id: <20221109102503.18816-2-sgarzare@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      13871f60
    • Yuan Can's avatar
      vhost/vsock: Fix error handling in vhost_vsock_init() · e05d4c8c
      Yuan Can authored
      [ Upstream commit 7a4efe18 ]
      
      Loading vhost_vsock with modprobe can fail with the following
      message:
      
      modprobe: ERROR: could not insert 'vhost_vsock': Device or resource busy
      
      The reason is that vhost_vsock_init() returns the result of
      misc_register() directly without checking it. If misc_register() fails,
      vhost_vsock_init() returns without calling vsock_core_unregister() on
      vhost_transport, so vhost_vsock can never be loaded afterwards.
      A simple call graph is shown below:
      
       vhost_vsock_init()
         vsock_core_register() # register vhost_transport
         misc_register()
           device_create_with_groups()
             device_create_groups_vargs()
               dev = kzalloc(...) # OOM happened
         # return without unregister vhost_transport
      
      Fix by calling vsock_core_unregister() when misc_register() returns error.
      
      Fixes: 433fc58e ("VSOCK: Introduce vhost_vsock.ko")
      Signed-off-by: Yuan Can <yuancan@huawei.com>
      Message-Id: <20221108101705.45981-1-yuancan@huawei.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      e05d4c8c
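The unwind pattern from the commit above (undo the first registration when the second one fails) can be sketched in userspace. The two registrations are simulated by hypothetical `sketch_*` stubs; the failure of the second step is injected through a variable rather than by a real allocation failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stand-ins for the two registrations; all names here are hypothetical. */
static bool sketch_transport_registered;
static int sketch_misc_register_result;   /* set by the caller to inject failure */

static int sketch_core_register(void)
{
	if (sketch_transport_registered)
		return -EBUSY;   /* a stale registration blocks later attempts */
	sketch_transport_registered = true;
	return 0;
}

static void sketch_core_unregister(void)
{
	sketch_transport_registered = false;
}

static int sketch_misc_register(void)
{
	return sketch_misc_register_result;
}

static int sketch_init(void)
{
	int ret = sketch_core_register();

	if (ret < 0)
		return ret;

	ret = sketch_misc_register();
	if (ret) {
		/* Undo the first step so a later retry does not see -EBUSY. */
		sketch_core_unregister();
		return ret;
	}
	return 0;
}
```

If the unregister call is left out, the second `sketch_init()` attempt returns -EBUSY, matching the "Device or resource busy" message quoted in the commit.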
    • ruanjinjie's avatar
      vdpa_sim: fix possible memory leak in vdpasim_net_init() and vdpasim_blk_init() · 586e6fd7
      ruanjinjie authored
      [ Upstream commit aeca7ff2 ]
      
      With fault injection enabled while probing the module, if
      device_register() fails in vdpasim_net_init() or vdpasim_blk_init(),
      the refcount of the kobject is not decreased to 0 and the name
      allocated in dev_set_name() is leaked.
      Fix this by calling put_device(), so that the name can be freed in
      the kobject_cleanup() callback.
      
      (vdpa_sim_net)
      unreferenced object 0xffff88807eebc370 (size 16):
        comm "modprobe", pid 3848, jiffies 4362982860 (age 18.153s)
        hex dump (first 16 bytes):
          76 64 70 61 73 69 6d 5f 6e 65 74 00 6b 6b 6b a5  vdpasim_net.kkk.
        backtrace:
          [<ffffffff8174f19e>] __kmalloc_node_track_caller+0x4e/0x150
          [<ffffffff81731d53>] kstrdup+0x33/0x60
          [<ffffffff83a5d421>] kobject_set_name_vargs+0x41/0x110
          [<ffffffff82d87aab>] dev_set_name+0xab/0xe0
          [<ffffffff82d91a23>] device_add+0xe3/0x1a80
          [<ffffffffa0270013>] 0xffffffffa0270013
          [<ffffffff81001c27>] do_one_initcall+0x87/0x2e0
          [<ffffffff813739cb>] do_init_module+0x1ab/0x640
          [<ffffffff81379d20>] load_module+0x5d00/0x77f0
          [<ffffffff8137bc40>] __do_sys_finit_module+0x110/0x1b0
          [<ffffffff83c4d505>] do_syscall_64+0x35/0x80
          [<ffffffff83e0006a>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      (vdpa_sim_blk)
      unreferenced object 0xffff8881070c1250 (size 16):
        comm "modprobe", pid 6844, jiffies 4364069319 (age 17.572s)
        hex dump (first 16 bytes):
          76 64 70 61 73 69 6d 5f 62 6c 6b 00 6b 6b 6b a5  vdpasim_blk.kkk.
        backtrace:
          [<ffffffff8174f19e>] __kmalloc_node_track_caller+0x4e/0x150
          [<ffffffff81731d53>] kstrdup+0x33/0x60
          [<ffffffff83a5d421>] kobject_set_name_vargs+0x41/0x110
          [<ffffffff82d87aab>] dev_set_name+0xab/0xe0
          [<ffffffff82d91a23>] device_add+0xe3/0x1a80
          [<ffffffffa0220013>] 0xffffffffa0220013
          [<ffffffff81001c27>] do_one_initcall+0x87/0x2e0
          [<ffffffff813739cb>] do_init_module+0x1ab/0x640
          [<ffffffff81379d20>] load_module+0x5d00/0x77f0
          [<ffffffff8137bc40>] __do_sys_finit_module+0x110/0x1b0
          [<ffffffff83c4d505>] do_syscall_64+0x35/0x80
          [<ffffffff83e0006a>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      Fixes: 899c4d18 ("vdpa_sim_blk: add support for vdpa management tool")
      Fixes: a3c06ae1 ("vdpa_sim_net: Add support for user supported devices")

      Signed-off-by: ruanjinjie <ruanjinjie@huawei.com>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Message-Id: <20221110082348.4105476-1-ruanjinjie@huawei.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      586e6fd7
    • Miaoqian Lin's avatar
      nfc: Fix potential resource leaks · b63bc2db
      Miaoqian Lin authored
      [ Upstream commit df49908f ]
      
      nfc_get_device() takes a reference to the device. Add the missing
      nfc_put_device() calls to release it when it is no longer needed.
      Also fix the style warning by using the error code EOPNOTSUPP instead
      of ENOTSUPP.
      
      Fixes: 5ce3f32b ("NFC: netlink: SE API implementation")
      Fixes: 29e76924 ("nfc: netlink: Add capability to reply to vendor_cmd with data")
      Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      b63bc2db
    • Johnny S. Lee's avatar
      net: dsa: mv88e6xxx: depend on PTP conditionally · 945e58bd
      Johnny S. Lee authored
      [ Upstream commit 30e72553 ]
      
      PTP hardware timestamping related objects are not linked when PTP
      support for MV88E6xxx (NET_DSA_MV88E6XXX_PTP) is disabled, therefore
      NET_DSA_MV88E6XXX should not depend on PTP_1588_CLOCK_OPTIONAL
      regardless of NET_DSA_MV88E6XXX_PTP.
      
      Instead, condition more strictly on how NET_DSA_MV88E6XXX_PTP's
      dependencies are met, making sure that it cannot be enabled when
      NET_DSA_MV88E6XXX=y and PTP_1588_CLOCK=m.
      
      In other words, this commit allows NET_DSA_MV88E6XXX to be built-in
      while PTP_1588_CLOCK is a module, as long as NET_DSA_MV88E6XXX_PTP is
      prevented from being enabled.
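
The constraint described above can be modelled as a small truth function. This is only a sketch of the rule as stated in the message, written in C rather than Kconfig syntax; `enum tristate` and `fake_config_valid()` are hypothetical names and do not reproduce the actual Kconfig expression in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Kconfig-style tristate values, ordered so that n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/*
 * Model of the rule in the commit message (an assumption, not the
 * literal Kconfig text): the switch driver no longer depends on the
 * PTP clock by itself, but the PTP sub-option may only be enabled
 * when the PTP clock is at least as "built-in" as the driver.
 */
static bool fake_config_valid(enum tristate dsa, enum tristate ptp_clock,
                              bool ptp_option)
{
    assert(dsa >= TRI_N && dsa <= TRI_Y);
    assert(ptp_clock >= TRI_N && ptp_clock <= TRI_Y);

    if (!ptp_option)
        return true; /* driver alone: any PTP clock setting is fine */
    if (dsa == TRI_N || ptp_clock == TRI_N)
        return false; /* the sub-option needs both pieces present */
    return ptp_clock >= dsa; /* rejects driver=y with clock=m */
}
```

Under this model, a built-in driver with a modular PTP clock is a valid configuration as long as the PTP sub-option stays off.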
      
      Fixes: e5f31552 ("ethernet: fix PTP_1588_CLOCK dependencies")
      Signed-off-by: Johnny S. Lee <foss@jsl.io>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      945e58bd
    • Daniil Tatianin's avatar
      qlcnic: prevent ->dcb use-after-free on qlcnic_dcb_enable() failure · 95df720e
      Daniil Tatianin authored
      [ Upstream commit 13a7c896 ]
      
      adapter->dcb would get silently freed inside qlcnic_dcb_enable() in
      case qlcnic_dcb_attach() would return an error, which always happens
      under OOM conditions. This would lead to use-after-free because both
      of the existing callers invoke qlcnic_dcb_get_info() on the obtained
      pointer, which is potentially freed at that point.
      
      Propagate errors from qlcnic_dcb_enable(), and instead free the dcb
      pointer at callsite using qlcnic_dcb_free(). This also removes the now
      unused qlcnic_clear_dcb_ops() helper, which was a simple wrapper around
      kfree() also causing memory leaks for partially initialized dcb.
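
The ownership change described above can be sketched in user-space C: the enable step only reports the error, and the caller frees the object and clears its pointer. All names here (`struct fake_dcb`, `fake_dcb_enable()`, `fake_probe()`, the `simulate_oom` flag) are hypothetical and only mirror the shape of the fix, not the qlcnic sources.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct fake_dcb { int attached; };
struct fake_adapter { struct fake_dcb *dcb; };

/* Hypothetical attach step; fails when simulate_oom is nonzero. */
static int fake_dcb_attach(struct fake_dcb *dcb, int simulate_oom)
{
    assert(dcb != NULL);
    if (simulate_oom)
        return -ENOMEM;
    dcb->attached = 1;
    return 0;
}

/* Reports failure to the caller and never frees dcb behind its back. */
static int fake_dcb_enable(struct fake_dcb *dcb, int simulate_oom)
{
    return dcb ? fake_dcb_attach(dcb, simulate_oom) : 0;
}

/* Single place that releases the object and clears the pointer. */
static void fake_dcb_free(struct fake_adapter *adapter)
{
    free(adapter->dcb);
    adapter->dcb = NULL;
}

/* Call site: on error, free here and do not touch adapter->dcb again. */
static int fake_probe(struct fake_adapter *adapter, int simulate_oom)
{
    int err;

    adapter->dcb = calloc(1, sizeof(*adapter->dcb));
    if (!adapter->dcb)
        return -ENOMEM;

    err = fake_dcb_enable(adapter->dcb, simulate_oom);
    if (err) {
        fake_dcb_free(adapter);
        return err;
    }

    /* Only reached on success, so later use of adapter->dcb is safe. */
    return 0;
}
```

Because the pointer is cleared at the same place it is freed, a later access cannot see a stale value.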
      
      Found by Linux Verification Center (linuxtesting.org) with the SVACE
      static analysis tool.
      
      Fixes: 3c44bba1 ("qlcnic: Disable DCB operations from SR-IOV VFs")
      Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      95df720e
    • Hawkins Jiawei's avatar
      net: sched: fix memory leak in tcindex_set_parms · 6c55953e
      Hawkins Jiawei authored
      [ Upstream commit 399ab7fe ]
      
      Syzkaller reports a memory leak as follows:
      ====================================
      BUG: memory leak
      unreferenced object 0xffff88810c287f00 (size 256):
        comm "syz-executor105", pid 3600, jiffies 4294943292 (age 12.990s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff814cf9f0>] kmalloc_trace+0x20/0x90 mm/slab_common.c:1046
          [<ffffffff839c9e07>] kmalloc include/linux/slab.h:576 [inline]
          [<ffffffff839c9e07>] kmalloc_array include/linux/slab.h:627 [inline]
          [<ffffffff839c9e07>] kcalloc include/linux/slab.h:659 [inline]
          [<ffffffff839c9e07>] tcf_exts_init include/net/pkt_cls.h:250 [inline]
          [<ffffffff839c9e07>] tcindex_set_parms+0xa7/0xbe0 net/sched/cls_tcindex.c:342
          [<ffffffff839caa1f>] tcindex_change+0xdf/0x120 net/sched/cls_tcindex.c:553
          [<ffffffff8394db62>] tc_new_tfilter+0x4f2/0x1100 net/sched/cls_api.c:2147
          [<ffffffff8389e91c>] rtnetlink_rcv_msg+0x4dc/0x5d0 net/core/rtnetlink.c:6082
          [<ffffffff839eba67>] netlink_rcv_skb+0x87/0x1d0 net/netlink/af_netlink.c:2540
          [<ffffffff839eab87>] netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
          [<ffffffff839eab87>] netlink_unicast+0x397/0x4c0 net/netlink/af_netlink.c:1345
          [<ffffffff839eb046>] netlink_sendmsg+0x396/0x710 net/netlink/af_netlink.c:1921
          [<ffffffff8383e796>] sock_sendmsg_nosec net/socket.c:714 [inline]
          [<ffffffff8383e796>] sock_sendmsg+0x56/0x80 net/socket.c:734
          [<ffffffff8383eb08>] ____sys_sendmsg+0x178/0x410 net/socket.c:2482
          [<ffffffff83843678>] ___sys_sendmsg+0xa8/0x110 net/socket.c:2536
          [<ffffffff838439c5>] __sys_sendmmsg+0x105/0x330 net/socket.c:2622
          [<ffffffff83843c14>] __do_sys_sendmmsg net/socket.c:2651 [inline]
          [<ffffffff83843c14>] __se_sys_sendmmsg net/socket.c:2648 [inline]
          [<ffffffff83843c14>] __x64_sys_sendmmsg+0x24/0x30 net/socket.c:2648
          [<ffffffff84605fd5>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
          [<ffffffff84605fd5>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
          [<ffffffff84800087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
      ====================================
      
      The kernel uses tcindex_change() to change the properties of an
      existing filter.
      
      Yet the problem is that, during the process of changing,
      if `old_r` is retrieved from `p->perfect`, then
      kernel uses tcindex_alloc_perfect_hash() to newly
      allocate filter results, uses tcindex_filter_result_init()
      to clear the old filter result, without destroying
      its tcf_exts structure, which triggers the above memory leak.
      
      To be more specific, there are only two sources for `old_r`,
      according to tcindex_lookup(): `old_r` is retrieved either from
      `p->perfect` or from `p->h`.
      
        * If `old_r` is retrieved from `p->perfect`, kernel uses
      tcindex_alloc_perfect_hash() to newly allocate the
      filter results. Then `r` is assigned with `cp->perfect + handle`,
      which is newly allocated. So condition `old_r && old_r != r` is
      true in this situation, and kernel uses tcindex_filter_result_init()
      to clear the old filter result, without destroying
      its tcf_exts structure.
      
        * If `old_r` is retrieved from `p->h`, then `p->perfect` is NULL
      according to the tcindex_lookup(). Considering that `cp->h`
      is directly copied from `p->h` and `p->perfect` is NULL,
      `r` is assigned with `tcindex_lookup(cp, handle)`, whose value
      should be the same as `old_r`, so condition `old_r && old_r != r`
      is false in this situation, kernel ignores using
      tcindex_filter_result_init() to clear the old filter result.
      
      So only when `old_r` is retrieved from `p->perfect` does kernel use
      tcindex_filter_result_init() to clear the old filter result, which
      triggers the above memory leak.
      
      Considering that there already exists a tc_filter_wq workqueue
      to destroy the old tcindex_data by tcindex_partial_destroy_work()
      at the end of tcindex_set_parms(), this patch solves
      this memory leak bug by removing this old filter result
      clearing part and delegating it to the tc_filter_wq workqueue.
      
      Note that this patch doesn't introduce any other issues. If
      `old_r` is retrieved from `p->perfect`, this patch just
      delegates old filter result clearing part to the
      tc_filter_wq workqueue; if `old_r` is retrieved from `p->h`,
      kernel doesn't reach the old filter result clearing part, so
      removing this part has no effect.
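
The leak mechanism and the delegated cleanup can be sketched in user-space C. This is a simplified model under stated assumptions: `struct fake_result`, `fake_exts_init()`, `fake_result_reinit_leaky()` and `fake_old_table_destroy()` are hypothetical, and `live_exts` is a counter added purely so the leak is observable. It does not reproduce the tcindex data structures or the workqueue.

```c
#include <assert.h>
#include <stdlib.h>

static int live_exts; /* number of extension buffers currently allocated */

/* Hypothetical filter result that owns one extension buffer. */
struct fake_result { int *exts; };

static int fake_exts_init(struct fake_result *r)
{
    r->exts = calloc(4, sizeof(*r->exts));
    if (!r->exts)
        return -1;
    live_exts++;
    return 0;
}

static void fake_exts_destroy(struct fake_result *r)
{
    if (r->exts) {
        free(r->exts);
        r->exts = NULL;
        live_exts--;
    }
}

/*
 * Leaky pattern: re-initialising a live result overwrites r->exts
 * without destroying the buffer it already owned.
 */
static int fake_result_reinit_leaky(struct fake_result *r)
{
    return fake_exts_init(r);
}

/*
 * Pattern in the spirit of the fix: old results are left untouched
 * and released exactly once when the old table is torn down.
 */
static void fake_old_table_destroy(struct fake_result *table, size_t n)
{
    for (size_t i = 0; i < n; i++)
        fake_exts_destroy(&table[i]);
}
```

In the model, re-initialising leaves one buffer unreachable, while a single teardown pass over the old table brings the counter back to zero.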
      
      [Thanks to the suggestion from Jakub Kicinski, Cong Wang, Paolo Abeni
      and Dmitry Vyukov]
      
      Fixes: b9a24bb7 ("net_sched: properly handle failure case of tcf_exts_init()")
      Link: https://lore.kernel.org/all/0000000000001de5c505ebc9ec59@google.com/
      Reported-by: <syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com>
      Tested-by: <syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com>
      Cc: Cong Wang <cong.wang@bytedance.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      6c55953e
    • Jian Shen's avatar
      net: hns3: fix VF promisc mode not update when mac table full · d14a4b24
      Jian Shen authored
      [ Upstream commit 8ee57c7b ]
      
      Currently, the driver fails to set the HCLGE_VPORT_STATE_PROMISC_CHANGE
      flag for a VF when vport->overflow_promisc_flags changes, so the VF
      does not check whether to update promisc mode in this case. Set the
      flag to fix this.
      
      Fixes: 1e6e7610 ("net: hns3: configure promisc mode for VF asynchronously")
      Signed-off-by: Jian Shen <shenjian15@huawei.com>
      Signed-off-by: Hao Lan <lanhao@huawei.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      d14a4b24
    • Jian Shen's avatar
      net: hns3: fix miss L3E checking for rx packet · 7ed205b9
      Jian Shen authored
      [ Upstream commit 7d89b53c ]
      
      For devices that support the RXD advanced layout, the driver returns
      directly once the hardware has finished the checksum calculation.
      This skips the L3E check for IP packets. Fix it.
      
      Fixes: 1ddc028a ("net: hns3: refactor out RX completion checksum")
      Signed-off-by: Jian Shen <shenjian15@huawei.com>
      Signed-off-by: Hao Lan <lanhao@huawei.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      7ed205b9
    • Peng Li's avatar
      net: hns3: extract macro to simplify ring stats update code · 47868cb7
      Peng Li authored
      [ Upstream commit e6d72f6a ]
      
      As the code to update ring stats is alike for the different ring stats
      types, this patch extracts a macro to simplify the ring stats update
      code.
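
A macro of the kind described above might look like the following user-space sketch. The names are hypothetical (`struct fake_ring`, `fake_ring_stats_update()`), and the `fake_stats_begin()`/`fake_stats_end()` helpers are simple stand-ins for whatever synchronisation the driver wraps around a counter update. The field name is passed as a macro argument, so one definition covers every counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-ring counters; the real driver has many more. */
struct fake_ring_stats {
    uint64_t tx_busy;
    uint64_t sw_err_cnt;
};

struct fake_ring {
    struct fake_ring_stats stats;
    unsigned int sync_seq; /* stand-in for a stats sync sequence */
};

/* Stand-ins for the begin/end markers around a stats update. */
static void fake_stats_begin(struct fake_ring *ring) { ring->sync_seq++; }
static void fake_stats_end(struct fake_ring *ring)   { ring->sync_seq++; }

/*
 * One macro instead of a copy of the begin/increment/end sequence per
 * counter. The ring argument is evaluated once; field is pasted as the
 * member name.
 */
#define fake_ring_stats_update(ring, field) do {   \
    struct fake_ring *ring_ = (ring);              \
    assert(ring_ != NULL);                         \
    fake_stats_begin(ring_);                       \
    ring_->stats.field++;                          \
    fake_stats_end(ring_);                         \
} while (0)
```

A call site then shrinks to a single line such as `fake_ring_stats_update(ring, tx_busy);`.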
      
      Signed-off-by: Peng Li <lipeng321@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Stable-dep-of: 7d89b53c ("net: hns3: fix miss L3E checking for rx packet")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      47868cb7
    • Hao Chen's avatar
      net: hns3: refactor hns3_nic_reuse_page() · 7457c5a7
      Hao Chen authored
      [ Upstream commit e74a726d ]
      
      Split the rx copybreak handling out of hns3_nic_reuse_page() into a
      separate function to simplify the code.
      
      Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Stable-dep-of: 7d89b53c ("net: hns3: fix miss L3E checking for rx packet")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      7457c5a7
    • Jie Wang's avatar
      net: hns3: add interrupts re-initialization while doing VF FLR · 4a6e9fb5
      Jie Wang authored
      [ Upstream commit 09e6b30e ]
      
      Currently, keep-alive messages between the PF and VF may be lost, in
      which case the PF considers the VF not alive. The VF then does not
      reset during the PF FLR reset process. This leaves the VF's allocated
      interrupt resources invalid, and the VF can no longer receive from or
      respond to the PF.

      So this patch adds VF interrupt re-initialization during VF FLR for VF
      recovery in the above cases.
      
      Fixes: 862d969a ("net: hns3: do VF's pci re-initialization while PF doing FLR")
      Signed-off-by: Jie Wang <wangjie125@huawei.com>
      Signed-off-by: Hao Lan <lanhao@huawei.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      4a6e9fb5
    • Jeff Layton's avatar
      nfsd: shut down the NFSv4 state objects before the filecache · 5e48ed80
      Jeff Layton authored
      [ Upstream commit 789e1e10 ]
      
      Currently, we shut down the filecache before trying to clean up the
      stateids that depend on it. This leads to the kernel trying to free an
      nfsd_file twice, and a refcount overput on the nf_mark.
      
      Change the shutdown procedure to tear down all of the stateids prior
      to shutting down the filecache.
      
      Reported-and-tested-by: Wang Yugui <wangyugui@e16-tech.com>
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Fixes: 5e113224 ("nfsd: nfsd_file cache entries should be per net namespace")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      5e48ed80
    • Shawn Bohrer's avatar
      veth: Fix race with AF_XDP exposing old or uninitialized descriptors · 7e2825f5
      Shawn Bohrer authored
      [ Upstream commit fa349e39 ]
      
      When AF_XDP is used on a veth interface, the RX ring is updated in two
      steps.  veth_xdp_rcv() removes packet descriptors from the FILL ring,
      fills them, and places them in the RX ring, updating the cached_prod
      pointer.  Later xdp_do_flush() syncs the RX ring prod pointer with the
      cached_prod pointer allowing user-space to see the recently filled in
      descriptors.  The rings are intended to be SPSC, however the existing
      order in veth_poll allows the xdp_do_flush() to run concurrently with
      another CPU creating a race condition that allows user-space to see old
      or uninitialized descriptors in the RX ring.  This bug has been observed
      in production systems.
      
      To summarize, we are expecting this ordering:
      
      CPU 0 __xsk_rcv_zc()
      CPU 0 __xsk_map_flush()
      CPU 2 __xsk_rcv_zc()
      CPU 2 __xsk_map_flush()
      
      But we are seeing this order:
      
      CPU 0 __xsk_rcv_zc()
      CPU 2 __xsk_rcv_zc()
      CPU 0 __xsk_map_flush()
      CPU 2 __xsk_map_flush()
      
      This occurs because we rely on NAPI to ensure that only one napi_poll
      handler is running at a time for the given veth receive queue.
      napi_schedule_prep() will prevent multiple instances from getting
      scheduled. However calling napi_complete_done() signals that this
      napi_poll is complete and allows subsequent calls to
      napi_schedule_prep() and __napi_schedule() to succeed in scheduling a
      concurrent napi_poll before the xdp_do_flush() has been called.  For the
      veth driver a concurrent call to napi_schedule_prep() and
      __napi_schedule() can occur on a different CPU because the veth xmit
      path can additionally schedule a napi_poll creating the race.
      
      The fix, as suggested by Magnus Karlsson, is to simply move the
      xdp_do_flush() call before napi_complete_done().  This syncs the
      producer ring pointers before another instance of napi_poll can be
      scheduled on another CPU.  It will also slightly improve performance by
      moving the flush closer to when the descriptors were placed in the
      RX ring.
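
The reordering described above can be modelled in a few lines of user-space C. This is a single-threaded sketch with hypothetical names (`struct fake_rx_ring`, `fake_poll()` and friends); it does not model real concurrency or memory barriers. It only encodes the invariant the fix restores: by the time the poll is marked complete and another poll may be scheduled, nothing is left unpublished in `cached_prod`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical RX ring with a private and a published producer index. */
struct fake_rx_ring {
    unsigned int cached_prod; /* advanced while filling descriptors */
    unsigned int prod;        /* what the consumer is allowed to see */
};

struct fake_napi {
    bool scheduled;
    struct fake_rx_ring *ring;
};

/* Step 1: fill n descriptors; only the cached index moves. */
static void fake_rcv(struct fake_rx_ring *ring, unsigned int n)
{
    ring->cached_prod += n;
}

/* Step 2: publish everything filled so far. */
static void fake_flush(struct fake_rx_ring *ring)
{
    ring->prod = ring->cached_prod;
}

/*
 * After this point another poll may be scheduled elsewhere, so the
 * model asserts that nothing is left unpublished.
 */
static void fake_napi_complete(struct fake_napi *napi)
{
    assert(napi->ring->prod == napi->ring->cached_prod);
    napi->scheduled = false;
}

/* Ordering after the fix: receive, flush, then complete. */
static void fake_poll(struct fake_napi *napi, unsigned int n)
{
    fake_rcv(napi->ring, n);
    fake_flush(napi->ring);
    fake_napi_complete(napi);
}
```

Swapping the last two calls in `fake_poll()` would trip the assertion in the model, which corresponds to the window in which a second poll could run before the flush.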
      
      Fixes: d1396004 ("veth: Add XDP TX and REDIRECT")
      Suggested-by: Magnus Karlsson <magnus.karlsson@gmail.com>
      Signed-off-by: Shawn Bohrer <sbohrer@cloudflare.com>
      Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      7e2825f5
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: honor set timeout and garbage collection updates · ac95cdaf
      Pablo Neira Ayuso authored
      [ Upstream commit 123b9961 ]
      
      Changes to the set timeout and garbage collection interval are
      currently ignored on set updates. Add a transaction to update the
      global set element timeout and garbage collection interval.
      
      Fixes: 96518518 ("netfilter: add nftables")
      Suggested-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      ac95cdaf
    • Ronak Doshi's avatar
      vmxnet3: correctly report csum_level for encapsulated packet · 49677ea1
      Ronak Doshi authored
      [ Upstream commit 3d8f2c42 ]
      
      Commit dacce2be ("vmxnet3: add geneve and vxlan tunnel offload
      support") added support for encapsulation offload. However, the
      patch did not correctly report the csum_level for encapsulated packets.
      
      This patch fixes this issue by reporting correct csum level for the
      encapsulated packet.
      
      Fixes: dacce2be ("vmxnet3: add geneve and vxlan tunnel offload support")
      Signed-off-by: Ronak Doshi <doshir@vmware.com>
      Acked-by: Peng Li <lpeng@vmware.com>
      Link: https://lore.kernel.org/r/20221220202556.24421-1-doshir@vmware.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      49677ea1
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: perform type checking for existing sets · 9d30cb44
      Pablo Neira Ayuso authored
      [ Upstream commit f6594c37 ]
      
      If a ruleset declares a set name that matches an existing set in the
      kernel, then validate that this declaration really refers to the same
      set, otherwise bail out with EEXIST.
      
      Currently, the kernel reports success when adding a set that already
      exists in the kernel. This usually results in EINVAL errors at a later
      stage, when the user adds elements to the set, if the set declaration
      mismatches the existing set representation in the kernel.
      
      Add a new function to check that the set declaration really refers to
      the same existing set in the kernel.
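
A check of this kind can be sketched as a field-by-field comparison that returns -EEXIST on any mismatch. The struct and function below are hypothetical (`struct fake_set_desc`, `fake_set_check_existing()`), and the list of compared fields is an assumption for illustration, not the set of attributes the kernel actually compares.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of a set declaration. */
struct fake_set_desc {
    uint32_t key_type;
    uint32_t key_len;
    uint32_t data_type;
    uint32_t data_len;
    uint32_t flags;
};

/*
 * Returns 0 if the new declaration describes the same set as the one
 * already present, -EEXIST if the name is taken by a different set.
 */
static int fake_set_check_existing(const struct fake_set_desc *existing,
                                   const struct fake_set_desc *declared)
{
    assert(existing != NULL && declared != NULL);

    if (existing->key_type != declared->key_type ||
        existing->key_len != declared->key_len ||
        existing->data_type != declared->data_type ||
        existing->data_len != declared->data_len ||
        existing->flags != declared->flags)
        return -EEXIST;

    return 0;
}
```

Failing at declaration time means the mismatch is reported where it is introduced, rather than as an EINVAL when elements are added later.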
      
      Fixes: 96518518 ("netfilter: add nftables")
      Reported-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      9d30cb44
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: add function to create set stateful expressions · c3bfb778
      Pablo Neira Ayuso authored
      [ Upstream commit a8fe4154 ]
      
      Add a helper function to allocate and initialize the stateful expressions
      that are defined in a set.
      
      This patch allows this code to be reused from the set update path, to
      check that the type of the update matches the existing set in the kernel.
      
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Stable-dep-of: f6594c37 ("netfilter: nf_tables: perform type checking for existing sets")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      c3bfb778