  1. Mar 03, 2022
    • net: rtnetlink: Add UAPI for obtaining L3 offload xstats · 0e7788fd
      Petr Machata authored
      
      
      Add a new IFLA_STATS_LINK_OFFLOAD_XSTATS child attribute,
      IFLA_OFFLOAD_XSTATS_L3_STATS, to carry statistics for traffic that takes
      place in a HW router.
      
      The offloaded HW stats are designed to allow per-netdevice enablement and
      disablement. Additionally, as a netdevice is configured, it may become or
      cease being suitable for binding of a HW counter. Both of these aspects
      need to be communicated to the userspace. To that end, add another child
      attribute, IFLA_OFFLOAD_XSTATS_HW_S_INFO:
      
          - attr nest IFLA_OFFLOAD_XSTATS_HW_S_INFO
              - attr nest IFLA_OFFLOAD_XSTATS_L3_STATS
                  - attr IFLA_OFFLOAD_XSTATS_HW_S_INFO_REQUEST
                      - {0,1} as u8
                  - attr IFLA_OFFLOAD_XSTATS_HW_S_INFO_USED
                      - {0,1} as u8
      
      Thus this one attribute is a nest that can be used to carry information
      about various types of HW statistics, and indexing is very simply done by
      wrapping the information for a given statistics suite into the attribute
      that carries the suite in the RTM_GETSTATS query. At the same time, because
      _HW_S_INFO is nested directly below IFLA_STATS_LINK_OFFLOAD_XSTATS, it is
      possible through filtering to request only the metadata about individual
      statistics suites, without having to hit the HW to get the actual counters.
      
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dev: Add hardware stats support · 9309f97a
      Petr Machata authored
      
      
      Offloading switch device drivers may be able to collect statistics of the
      traffic taking place in the HW datapath that pertains to a certain soft
      netdevice, such as VLAN. Add the necessary infrastructure to allow exposing
      these statistics to the offloaded netdevice in question. The API was shaped
      by the following considerations:
      
      - Collection of HW statistics is not free: there may be a finite number of
        counters, and the act of counting may have a performance impact. It is
        therefore necessary to allow toggling whether HW counting should be done
        for any particular SW netdevice.
      
      - As the drivers are loaded and removed, a particular device may get
        offloaded and unoffloaded again. At the same time, the statistics values
        need to stay monotonic (modulo the eventual 64-bit wraparound),
        increasing only to reflect traffic measured in the device.
      
        To that end, the netdevice keeps around a lazily-allocated copy of struct
        rtnl_link_stats64. Device drivers then contribute to the values kept
        therein at various points. Even as the driver goes away, the struct stays
        around to maintain the statistics values.
      
      - Different HW devices may be able to count different things. The
        motivation behind this patch in particular is exposure of HW counters on
        Nvidia Spectrum switches, where the only practical approach to counting
        traffic on offloaded soft netdevices currently is to use router interface
        counters, and count L3 traffic. Correspondingly that is the statistics
        suite added in this patch.
      
        Other devices may be able to measure different kinds of traffic, and for
        that reason, the APIs are built to allow uniform access to different
        statistics suites.
      
      - Because soft netdevices and offloading drivers are only loosely bound, a
        netdevice uses a notifier chain to communicate with the drivers. Several
        new notifiers, NETDEV_OFFLOAD_XSTATS_*, have been added to carry messages
        to the offloading drivers.
      
      - Devices can have various conditions for when a particular counter is
        available. As the device is configured and reconfigured, the device
        offload may become or cease being suitable for counter binding. A
        netdevice can use a notifier type NETDEV_OFFLOAD_XSTATS_REPORT_USED to
        ping offloading drivers and determine whether anyone currently implements
        a given statistics suite. This information can then be propagated to user
        space.
      
        When the driver decides to unoffload a netdevice, it can use a
        newly-added function, netdev_offload_xstats_report_delta(), to record
        outstanding collected statistics, before destroying the HW counter.
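
The monotonicity requirement can be illustrated with a small userspace sketch. This is not the kernel code: `struct l3_stats`, `struct soft_netdev`, `stats_enable()` and `report_delta()` are hypothetical stand-ins for struct rtnl_link_stats64, the netdevice, the lazy allocation, and netdev_offload_xstats_report_delta() respectively.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, reduced stand-in for struct rtnl_link_stats64. */
struct l3_stats {
	uint64_t rx_packets;
	uint64_t tx_packets;
};

/* Hypothetical netdevice: the stats copy is allocated on first use. */
struct soft_netdev {
	struct l3_stats *hw_stats;	/* NULL until counting is enabled */
};

/* Lazily allocate the zeroed stats copy. Returns 0, or -1 on OOM. */
int stats_enable(struct soft_netdev *dev)
{
	if (dev->hw_stats)
		return 0;
	dev->hw_stats = calloc(1, sizeof(*dev->hw_stats));
	return dev->hw_stats ? 0 : -1;
}

/*
 * Fold a driver-reported delta into the persistent copy. Values are only
 * ever added, so the totals stay monotonic (modulo 64-bit wraparound)
 * even if the driver that produced them goes away afterwards.
 */
void report_delta(struct soft_netdev *dev, const struct l3_stats *delta)
{
	assert(dev->hw_stats && delta);
	dev->hw_stats->rx_packets += delta->rx_packets;
	dev->hw_stats->tx_packets += delta->tx_packets;
}
```

In this sketch a driver would call report_delta() one last time before tearing down its HW counter, so that nothing counted so far is lost.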
      
      This patch adds a helper, call_netdevice_notifiers_info_robust(), for
      dispatching a notifier with the possibility of unwind when one of the
      consumers bails. Given the wish to eventually get rid of the global
      notifier block altogether, this helper only invokes the per-netns notifier
      block.
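
The "dispatch with unwind" idea can be sketched in plain C as follows. All names here (`notifier_fn`, `call_chain_robust()`, the EVENT_* values and the two example consumers) are hypothetical, and the unwind order chosen below is an assumption of the sketch, not a statement about the kernel helper.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum { EVENT_ENABLE, EVENT_DISABLE };	/* hypothetical event pair */

/* Hypothetical consumer: returns 0 on success, negative errno on failure. */
typedef int (*notifier_fn)(int event, void *ctx);

/*
 * Deliver event_up to every consumer in order. If one of them bails,
 * deliver event_down to the consumers that already accepted event_up,
 * and return the error of the consumer that failed.
 */
int call_chain_robust(const notifier_fn *chain, size_t n,
		      int event_up, int event_down, void *ctx)
{
	size_t i;
	int err;

	assert(chain || n == 0);
	for (i = 0; i < n; i++) {
		err = chain[i](event_up, ctx);
		if (err) {
			while (i-- > 0)
				chain[i](event_down, ctx);
			return err;
		}
	}
	return 0;
}

/* Example consumer: ctx points at an int counting enabled consumers. */
int consumer_ok(int event, void *ctx)
{
	int *enabled = ctx;

	*enabled += (event == EVENT_ENABLE) ? 1 : -1;
	return 0;
}

/* Example consumer that always refuses to enable. */
int consumer_busy(int event, void *ctx)
{
	(void)ctx;
	return event == EVENT_ENABLE ? -EBUSY : 0;
}
```

With a chain of two consumer_ok entries followed by consumer_busy, the call returns -EBUSY and the counter behind ctx is back at its starting value.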
      
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: rtnetlink: rtnl_fill_statsinfo(): Permit non-EMSGSIZE error returns · 216e6906
      Petr Machata authored
      
      
      Obtaining stats for the IFLA_STATS_LINK_OFFLOAD_XSTATS nest involves a HW
      access, and can fail for more reasons than just netlink message size
      exhaustion. Therefore do not always return -EMSGSIZE on the failure path,
      but respect the error code provided by the callee. Set the error explicitly
      where it is reasonable to assume -EMSGSIZE as the failure reason.
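
A reduced sketch of the error-handling pattern, under the assumption of a hypothetical message buffer (`struct msg`, `put_attr()`) and a hypothetical `fetch` callback standing in for the HW access; it is not the rtnetlink code itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical message buffer with a fixed amount of room left. */
struct msg {
	size_t room;
};

/* Reserve len bytes; running out of room is the only way this fails. */
int put_attr(struct msg *m, size_t len)
{
	assert(m);
	if (len > m->room)
		return -EMSGSIZE;
	m->room -= len;
	return 0;
}

/* Hypothetical stand-ins for a HW stats read that succeeds or fails. */
int fetch_ok(void)
{
	return 0;
}

int fetch_io_error(void)
{
	return -EIO;
}

int fill_stats(struct msg *m, int (*fetch)(void))
{
	int err;

	err = put_attr(m, 4);	/* nest header: -EMSGSIZE is the only failure */
	if (err)
		return err;

	err = fetch();		/* HW access: may fail for other reasons */
	if (err)
		return err;	/* pass it on instead of masking it as -EMSGSIZE */

	return put_attr(m, 64);	/* payload */
}
```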
      
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: rtnetlink: Propagate extack to rtnl_offload_xstats_fill() · 05415bcc
      Petr Machata authored
      
      
      Later patches add handlers for more HW-backed statistics. An extack will be
      useful when communicating HW / driver errors to the client. Add the
      arguments as appropriate.
      
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: rtnetlink: RTM_GETSTATS: Allow filtering inside nests · 46efc97b
      Petr Machata authored
      
      
      The filter_mask field of RTM_GETSTATS header determines which top-level
      attributes should be included in the netlink response. This saves
      processing time by only including the bits that the user cares about
      instead of always dumping everything. This is doubly important for
      HW-backed statistics that would typically require a trip to the device to
      fetch the stats.
      
      So far there was only one HW-backed stat suite per attribute. However,
      IFLA_STATS_LINK_OFFLOAD_XSTATS is a nest, and will gain a new stat suite in
      the following patches. It would therefore be advantageous to be able to
      filter within that nest, and select just one or the other HW-backed
      statistics suite.
      
      Extend rtnetlink so that RTM_GETSTATS permits attributes in the payload.
      The scheme is as follows:
      
          - RTM_GETSTATS
              - struct if_stats_msg
              - attr nest IFLA_STATS_GET_FILTERS
                  - attr IFLA_STATS_LINK_OFFLOAD_XSTATS
                      - u32 filter_mask
      
      This scheme reuses the existing enumerators by nesting them in a dedicated
      context attribute. This is covered by policies as usual, therefore a
      gradual opt-in is possible. Currently only IFLA_STATS_LINK_OFFLOAD_XSTATS
      nest has filtering enabled, because for the SW counters the issue does not
      seem to be that important.
      
      rtnl_offload_xstats_get_size() and _fill() are extended to observe the
      requested filters.
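
How a per-nest filter mask selects child attributes can be sketched as below. The enumerator names and values are illustrative only (the real ones live in linux/if_link.h), and `xstats_wanted()` is a hypothetical helper; the bit mapping follows the "attribute N maps to bit N-1" convention used for RTM_GETSTATS filter masks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative child attributes of the offload xstats nest. */
enum {
	XSTATS_UNSPEC,
	XSTATS_CPU_HIT,
	XSTATS_HW_S_INFO,
	XSTATS_L3_STATS,
};

/* Attribute N is selected by bit N-1 of the mask. */
#define FILTER_BIT(attr) (UINT32_C(1) << ((attr) - 1))

/* Decide whether a child attribute of the nest should be emitted. */
bool xstats_wanted(uint32_t filter_mask, int attr)
{
	assert(attr > XSTATS_UNSPEC && attr <= 32);
	return (filter_mask & FILTER_BIT(attr)) != 0;
}
```

A client that only wants the metadata would set just FILTER_BIT(XSTATS_HW_S_INFO), so that the fill path can skip the counters and the trip to the HW that they require.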
      
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: rtnetlink: Stop assuming that IFLA_OFFLOAD_XSTATS_* are dev-backed · f6e0fb81
      Petr Machata authored
      
      
      The IFLA_STATS_LINK_OFFLOAD_XSTATS attribute is a nest whose child
      attributes carry various special hardware statistics. The code that handles
      this nest was written with the idea that all these statistics would be
      exposed by the device driver of a physical netdevice.
      
      In the following patches, a new attribute is added to the abovementioned
      nest, which however can be defined for some soft netdevices. The NDO-based
      approach to querying these does not work, because it is not the soft
      netdevice driver that exposes these statistics, but an offloading NIC
      driver that does so.
      
      The current code does not scale well to this usage. Simply rewrite it back
      to the pattern seen in other fill-like and get_size-like functions
      elsewhere.
      
      Extract to helpers the code that is concerned with handling specifically
      NDO-backed statistics so that it can be easily reused should more such
      statistics be added.
      
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: rtnetlink: Namespace functions related to IFLA_OFFLOAD_XSTATS_* · 6b524a1d
      Petr Machata authored
      
      
      The currently used names rtnl_get_offload_stats() and
      rtnl_get_offload_stats_size() do not clearly show the namespace. The former
      function additionally seems to have been named this way in accordance with
      the NDO name, as opposed to the naming used in the rtnetlink.c file (and
      indeed elsewhere in the netlink handling code). As more and
      differently-flavored attributes are introduced, a common clear prefix is
      needed for all related functions.
      
      Rename the functions to follow the rtnl_offload_xstats_* naming scheme.
      
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: validate and restrict untrusted VFs vlan promisc mode · cbcc44db
      Manish Chopra authored
      Today, when VFs are put in promiscuous mode, they can request the PF
      to configure the device so that they receive traffic for all VLANs,
      regardless of which VLAN the PF has configured (via ip link), and the
      PF allows this request whether or not the VF is trusted.
      
      From a security point of view, when a VLAN is configured for a VF
      through the PF (via ip link), honour such requests only from VFs that
      are configured as trusted; otherwise restrict the VF's VLAN promiscuous
      mode configuration.
      
      Cc: stable@vger.kernel.org
      Fixes: f990c82c ("qed*: Add support for ndo_set_vf_trust")
      Signed-off-by: Manish Chopra <manishc@marvell.com>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: display VF trust config · 4e6e6bec
      Manish Chopra authored
      The driver supports SR-IOV VF trust configuration, but
      does not display it when queried via the ip link utility.
      
      Cc: stable@vger.kernel.org
      Fixes: f990c82c ("qed*: Add support for ndo_set_vf_trust")
      Signed-off-by: Manish Chopra <manishc@marvell.com>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'stmmac-SA8155p-ADP' · d52b4536
      David S. Miller authored
      
      
      Bhupesh Sharma says:
      
      ====================
      net: stmmac: Enable support for Qualcomm SA8155p-ADP board
      
      Changes since v1:
      -----------------
      - v1 can be seen here: https://lore.kernel.org/netdev/20220126221725.710167-1-bhupesh.sharma@linaro.org/t/
      - Fixed review comments from Bjorn - broke the v1 series into two
        separate series - one each for 'net' tree and 'arm clock/dts' tree
        - so as to ease review of the same from the respective maintainers.
      - This series is intended for the 'net' tree.
      
      The SA8155p-ADP board supports on-board ethernet (Gigabit Interface),
      with support for both RGMII and RMII buses.
      
      This patchset adds the support for the same.
      
      Note that this patchset is based on an earlier sent patchset
      for adding PDC controller support on SM8150 (see [1]).
      
      [1]. https://lore.kernel.org/linux-arm-msm/20220226184028.111566-1-bhupesh.sharma@linaro.org/T/
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: dwmac-qcom-ethqos: Adjust rgmii loopback_en per platform · a7bf6d7c
      Bjorn Andersson authored
      
      
      Not all platforms should have RGMII_CONFIG_LOOPBACK_EN, and the result is
      about 50% packet loss on incoming messages. So make it possible to
      configure this per compatible and enable it for QCS404.
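
A minimal sketch of "configure this per compatible", assuming a hypothetical `struct platform_data` table and lookup helper. The compatible strings are placeholders chosen for illustration; only the fact that the flag is enabled for QCS404 comes from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-platform data; only the loopback flag is modelled. */
struct platform_data {
	const char *compatible;
	bool rgmii_config_loopback_en;
};

/* Placeholder table: the strings are illustrative, not authoritative. */
static const struct platform_data platforms[] = {
	{ "qcom,qcs404-ethqos", true },
	{ "example,other-soc-ethqos", false },
};

/* Return whether loopback should be enabled; unknown platforms get false. */
bool loopback_enabled(const char *compatible)
{
	size_t i;

	assert(compatible);
	for (i = 0; i < sizeof(platforms) / sizeof(platforms[0]); i++) {
		if (strcmp(platforms[i].compatible, compatible) == 0)
			return platforms[i].rgmii_config_loopback_en;
	}
	return false;
}
```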
      
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: Add support for SM8150 · d90b3120
      Vinod Koul authored
      
      
      This adds compatible, POR config & driver data for ethernet controller
      found in SM8150 SoC.
      
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
      [bhsharma: Massage the commit log and other cosmetic changes]
      Signed-off-by: Bhupesh Sharma <bhupesh.sharma@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'page_pool-stats' · a8ff736d
      David S. Miller authored
      
      
      Joe Damato says:
      
      ====================
      page_pool: Add stats counters
      
      Greetings:
      
      Welcome to v9.
      
      This revision adds a commit which updates the page_pool documentation to
      describe the stats API, structures, and fields.
      
      Additionally, this revision contains a minor cosmetic change suggested by
      Saeed in page_pool_recycle_in_ring in commit 2: "page_pool: Add recycle
      stats", which removes an unnecessary #ifdef.
      
      There are no functional changes in this revision.
      
      Benchmark output from the v7 cover [1] is pasted below, as it is still
      relevant since no functional changes have been made in this revision:
      
      Benchmarks have been re-run. As always, results between runs are highly
      variable; you'll find results showing that stats disabled are both faster
      and slower than stats enabled in back to back benchmark runs.
      
      Raw benchmark output with stats off [2] and stats on [3] are available for
      examination.
      
      Test system:
      	- 2x Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
      	- 2 NUMA zones, with 18 cores per zone and 2 threads per core
      
      bench_page_pool_simple results, loops=200000000
      test name			stats enabled		stats disabled
      				cycles	nanosec		cycles	nanosec
      
      for_loop			0	0.335		0	0.336
      atomic_inc 			14	6.106		13	6.022
      lock				30	13.365		32	13.968
      
      no-softirq-page_pool01		75	32.884		74	32.308
      no-softirq-page_pool02		79	34.696		74	32.302
      no-softirq-page_pool03		110	48.005		105	46.073
      
      tasklet_page_pool01_fast_path	14	6.156		14	6.211
      tasklet_page_pool02_ptr_ring	41	18.028		39	17.391
      tasklet_page_pool03_slow	107	46.646		105	46.123
      
      bench_page_pool_cross_cpu results, loops=20000000 returning_cpus=4:
      test name			stats enabled		stats disabled
      				cycles	nanosec		cycles	nanosec
      
      page_pool_cross_cpu CPU(0)	3973	1731.596	4015	1750.015
      page_pool_cross_cpu CPU(1)	3976	1733.217	4022	1752.864
      page_pool_cross_cpu CPU(2)	3973	1731.615	4016	1750.433
      page_pool_cross_cpu CPU(3)	3976	1733.218	4021	1752.806
      page_pool_cross_cpu CPU(4)	994	433.305		1005	438.217
      
      page_pool_cross_cpu average	3378	-		3415	-
      
      bench_page_pool_cross_cpu results, loops=20000000 returning_cpus=8:
      test name			stats enabled		stats disabled
      				cycles	nanosec		cycles	nanosec
      
      page_pool_cross_cpu CPU(0)	6969	3037.488	6909	3011.463
      page_pool_cross_cpu CPU(1)	6974	3039.469	6913	3012.961
      page_pool_cross_cpu CPU(2)	6969	3037.575	6910	3011.585
      page_pool_cross_cpu CPU(3)	6974	3039.415	6913	3012.961
      page_pool_cross_cpu CPU(4)	6969	3037.288	6909	3011.368
      page_pool_cross_cpu CPU(5)	6972	3038.732	6913	3012.920
      page_pool_cross_cpu CPU(6)	6969	3037.350	6909	3011.386
      page_pool_cross_cpu CPU(7)	6973	3039.356	6913	3012.921
      page_pool_cross_cpu CPU(8)	871	379.934		864	376.620
      
      page_pool_cross_cpu average	6293	-		6239	-
      
      Thanks.
      
      [1]: https://lore.kernel.org/all/1645810914-35485-1-git-send-email-jdamato@fastly.com/
      [2]: https://gist.githubusercontent.com/jdamato-fsly/d7c34b9fa7be1ce132a266b0f2b92aea/raw/327dcd71d11ece10238fbf19e0472afbcbf22fd4/v7_stats_disabled
      [3]: https://gist.githubusercontent.com/jdamato-fsly/d7c34b9fa7be1ce132a266b0f2b92aea/raw/327dcd71d11ece10238fbf19e0472afbcbf22fd4/v7_stats_enabled
      
      v8 -> v9:
      	- Add documentation about the page_pool_get_stats API, stats
      	  structures, and fields to Documentation/networking/page_pool.rst.
      	- Remove unnecessary #ifdef in page_pool_recycle_in_ring.
      
      v7 -> v8:
      	- Rename mlx5 ethtool stats so that users have a better idea of
      	  their meaning.
      
      v6 -> v7:
      	- stats split out into two structs one single per-page pool struct
      	  for allocation path stats and one per-cpu pointer for recycle
      	  path stats.
      	- page_pool_get_stats updated to use a wrapper struct to gather
      	  stats for allocation and recycle stats with a single argument.
      	- placement of structs adjusted
      	- mlx5 driver modified to use page_pool_get_stats API
      
      v5 -> v6:
      	- Per cpu page_pool_stats struct pointer is now marked as
      	  ____cacheline_aligned_in_smp. Placement of the field in the
      	  struct is unchanged; it is the last field.
      
      v4 -> v5:
      	- Fixed the description of the kernel option in Kconfig.
      	- Squashed commits 1-10 from v4 into a single commit for easier
      	  review.
      	- Changed the comment style of the comment for
      	  the this_cpu_inc_alloc_stat macro.
      	- Changed the return type of page_pool_get_stats from struct
      	  page_pool_stat * to bool.
      
      v3 -> v4:
      	- Restructured stats to be per-cpu per-pool.
      	- Global stats and proc file were removed.
      	- Exposed an API (page_pool_get_stats) for batching the pool stats.
      
      v2 -> v3:
      	- patch 8/10 ("Add stat tracking cache refill") fixed placement of
      	  counter increment.
      	- patch 10/10 ("net-procfs: Show page pool stats in proc") updated:
      		- fix unused label warning from kernel test robot,
      		- fixed page_pool_seq_show to only display the refill stat
      		  once,
      		- added a remove_proc_entry for page_pool_stat to
      		  dev_proc_net_exit.
      
      v1 -> v2:
      	- A new kernel config option has been added, which defaults to N,
      	   preventing this code from being compiled in by default
      	- The stats structure has been converted to a per-cpu structure
      	- The stats are now exported via proc (/proc/net/page_pool_stat)
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlx5: add support for page_pool_get_stats · cc10e84b
      Joe Damato authored
      
      
      This change adds support for the page_pool_get_stats API to mlx5. If the
      user has enabled CONFIG_PAGE_POOL_STATS in their kernel, ethtool will
      output page pool stats.
      
      Signed-off-by: Joe Damato <jdamato@fastly.com>
      Acked-by: Saeed Mahameed <saeed@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Documentation: update networking/page_pool.rst · a3dd9828
      Joe Damato authored
      
      
      Add the new stats API, kernel config parameter, and stats structure
      information to the page_pool documentation.
      
      Signed-off-by: Joe Damato <jdamato@fastly.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • page_pool: Add function to batch and return stats · 6b95e338
      Joe Damato authored
      
      
      Adds a function page_pool_get_stats which can be used by drivers to obtain
      stats for a specified page_pool.
      
      Signed-off-by: Joe Damato <jdamato@fastly.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • page_pool: Add recycle stats · ad6fa1e1
      Joe Damato authored
      
      
      Add per-cpu stats tracking page pool recycling events:
      	- cached: recycling placed page in the page pool cache
      	- cache_full: page pool cache was full
      	- ring: page placed into the ptr ring
      	- ring_full: page released from page pool because the ptr ring was full
      	- released_refcnt: page released (and not recycled) because refcnt > 1
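
Reading per-cpu counters means summing one copy per CPU. The sketch below shows that step in userspace C; the field names follow the list above, while `struct recycle_stats` as laid out here and `recycle_stats_sum()` are hypothetical and a plain array stands in for real per-cpu storage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field names follow the list above; the struct itself is a stand-in. */
struct recycle_stats {
	uint64_t cached;
	uint64_t cache_full;
	uint64_t ring;
	uint64_t ring_full;
	uint64_t released_refcnt;
};

/* Sum the per-CPU copies into one total for reporting. */
void recycle_stats_sum(const struct recycle_stats *percpu, size_t ncpus,
		       struct recycle_stats *total)
{
	size_t cpu;

	assert(percpu && total);
	*total = (struct recycle_stats){ 0 };
	for (cpu = 0; cpu < ncpus; cpu++) {
		total->cached += percpu[cpu].cached;
		total->cache_full += percpu[cpu].cache_full;
		total->ring += percpu[cpu].ring;
		total->ring_full += percpu[cpu].ring_full;
		total->released_refcnt += percpu[cpu].released_refcnt;
	}
}
```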
      
      Signed-off-by: Joe Damato <jdamato@fastly.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • page_pool: Add allocation stats · 8610037e
      Joe Damato authored
      
      
      Add per-pool statistics counters for the allocation path of a page pool.
      These stats are incremented in softirq context, so no locking or per-cpu
      variables are needed.
      
      This code is disabled by default and a kernel config option is provided for
      users who wish to enable them.
      
      The statistics added are:
      	- fast: successful fast path allocations
      	- slow: slow path order-0 allocations
      	- slow_high_order: slow path high order allocations
      	- empty: ptr ring is empty, so a slow path allocation was forced.
      	- refill: an allocation which triggered a refill of the cache
      	- waive: pages obtained from the ptr ring that cannot be added to
      	  the cache due to a NUMA mismatch.
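
A toy model of compile-time optional counters, for illustration only. `POOL_STATS`, `struct pool`, `alloc_stat_inc()` and `pool_alloc()` are assumptions of this sketch; the allocation logic is deliberately simplified and only exercises three of the counters listed above.

```c
#include <assert.h>
#include <stdint.h>

#define POOL_STATS 1	/* hypothetical stand-in for the kernel config option */

/* Counter names follow the list above. */
struct alloc_stats {
	uint64_t fast;
	uint64_t slow;
	uint64_t slow_high_order;
	uint64_t empty;
	uint64_t refill;
	uint64_t waive;
};

struct pool {
#if POOL_STATS
	struct alloc_stats alloc_stats;
#endif
	unsigned int order;	/* page order served by this pool */
	int cache_count;	/* pages available on the fast path */
};

/* Compiles to nothing when stats are disabled. */
#if POOL_STATS
#define alloc_stat_inc(pool, field) ((pool)->alloc_stats.field++)
#else
#define alloc_stat_inc(pool, field) ((void)0)
#endif

/* Toy allocation: serve from the cache if possible, else take the slow path. */
void pool_alloc(struct pool *p)
{
	assert(p);
	if (p->cache_count > 0) {
		p->cache_count--;
		alloc_stat_inc(p, fast);
		return;
	}
	if (p->order == 0)
		alloc_stat_inc(p, slow);
	else
		alloc_stat_inc(p, slow_high_order);
}
```

Plain increments suffice in this model because there is a single caller, which loosely mirrors the commit's point that the allocation path runs in softirq context and needs no locking.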
      
      Signed-off-by: Joe Damato <jdamato@fastly.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: Remove the unused api · 42f0c193
      Tao Chen authored
      Last tcp_write_queue_head() use was removed in commit
      114f39fe ("tcp: restore autocorking"), so remove it.
      
      Signed-off-by: Tao Chen <chentao3@hotmail.com>
      Link: https://lore.kernel.org/r/SYZP282MB33317DEE1253B37C0F57231E86029@SYZP282MB3331.AUSP282.PROD.OUTLOOK.COM
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • flow_dissector: Add support for HSR · bf08824a
      Kurt Kanzenbach authored
      
      
      Network drivers such as igb or igc call eth_get_headlen() to determine the
      header length for the skbs they construct in the receive path.
      
      When running HSR on top of these drivers, it results in triggering BUG_ON() in
      skb_pull(). The reason is the skb headlen is not sufficient for HSR to work
      correctly. skb_pull() notices that.
      
      For instance, eth_get_headlen() returns 14 bytes for TCP traffic over HSR which
      is not correct. The problem is, the flow dissection code does not take HSR into
      account. Therefore, add support for it.
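
Why 14 bytes is too short can be seen from the frame layout: an HSR frame carries a 6-byte tag (path and LSDU size, sequence number, encapsulated EtherType) after the Ethernet header. The sketch below is a userspace illustration, not the flow dissector; `l2_header_len()` and the constant names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN	14	/* dst MAC + src MAC + EtherType */
#define HSR_TAG_LEN	6	/* path/LSDU size, sequence nr, encapsulated EtherType */
#define ETHERTYPE_HSR	0x892F

/* Read a big-endian 16-bit value. */
static uint16_t be16_at(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Return the L2 header length of a frame, stepping over one HSR tag if
 * present, and store the EtherType of the payload in *inner_proto.
 * Returns 0 if the frame is too short to hold the headers.
 */
size_t l2_header_len(const uint8_t *frame, size_t len, uint16_t *inner_proto)
{
	size_t off = ETH_HDR_LEN;
	uint16_t proto;

	assert(frame && inner_proto);
	if (len < off)
		return 0;
	proto = be16_at(frame + 12);
	if (proto == ETHERTYPE_HSR) {
		if (len < off + HSR_TAG_LEN)
			return 0;
		/* The encapsulated EtherType is the last two bytes of the tag. */
		proto = be16_at(frame + off + 4);
		off += HSR_TAG_LEN;
	}
	*inner_proto = proto;
	return off;
}
```

A dissector that does not know the HSR EtherType stops at byte 14; one that does can continue into the encapsulated protocol starting at byte 20.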
      
      Reported-by: Anthony Harivel <anthony.harivel@linutronix.de>
      Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
      Link: https://lore.kernel.org/r/20220228195856.88187-1-kurt@linutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: mv88e6xxx: support RMII cmode · 00202885
      Baruch Siach authored
      
      
      Add support for direct RMII MAC mode. This allows hardware with CPU port
      connected in direct 100M fixed link to work properly.
      
      Signed-off-by: Baruch Siach <baruch.siach@siklu.com>
      Link: https://lore.kernel.org/r/a962d1ccbeec42daa10dd8aff0e66e31f0faf1eb.1646050203.git.baruch@tkos.co.il
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: mv88e6xxx: don't error out cmode set on missing lane · 13b0bd2e
      Baruch Siach authored
      
      
      When the given cmode has no serdes, mv88e6xxx_serdes_get_lane() returns
      -ENODEV. Earlier in the same function the code skips serdes handling in
      this case. Do the same after cmode set.
      
      Signed-off-by: Baruch Siach <baruch.siach@siklu.com>
      Link: https://lore.kernel.org/r/cd95cf3422ae8daf297a01fa9ec3931b203cdf45.1646050203.git.baruch@tkos.co.il
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: openvswitch: remove unneeded semicolon · cb1d8fba
      Yang Li authored
      
      
      Eliminate the following coccicheck warning:
      ./net/openvswitch/flow.c:379:2-3: Unneeded semicolon
      
      Reported-by: Abaci Robot <abaci@linux.alibaba.com>
      Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
      Link: https://lore.kernel.org/r/20220227132208.24658-1-yang.lee@linux.alibaba.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • flow_offload: improve extack msg for user when adding invalid filter · d922a99b
      Baowen Zheng authored
      
      
      Add an extack message that tells the user exactly what went wrong when
      adding an invalid filter with conflicting flags for a TC action.
      
      The previous implementation just returned EINVAL, which is confusing for
      the user.
      
      Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Link: https://lore.kernel.org/r/1646191769-17761-1-git-send-email-baowen.zheng@corigine.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue · 2102a27e
      Jakub Kicinski authored
      
      
      Tony Nguyen says:
      
      ====================
      40GbE Intel Wired LAN Driver Updates 2022-03-01
      
      This series contains updates to iavf driver only.
      
      Mateusz adds support for interrupt moderation for 50G and 100G speeds
      as well as support for the driver to specify a request as its primary
      MAC address. He also refactors VLAN V2 capability exchange into more
      generic extended capabilities to ease the addition of future
      capabilities. Finally, he corrects the incorrect return of iavf_status
      values and removes non-inclusive language.
      
      Minghao Chi removes unneeded variables, instead returning values
      directly.
      
      * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
        iavf: Remove non-inclusive language
        iavf: Fix incorrect use of assigning iavf_status to int
        iavf: stop leaking iavf_status as "errno" values
        iavf: remove redundant ret variable
        iavf: Add usage of new virtchnl format to set default MAC
        iavf: refactor processing of VLAN V2 capability message
        iavf: Add support for 50G/100G in AIM algorithm
      ====================
      
      Link: https://lore.kernel.org/r/20220301185939.3005116-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • nfp: flower: Remove usage of the deprecated ida_simple_xxx API · 43250901
      Christophe JAILLET authored
      
      
      Use ida_alloc_xxx()/ida_free() instead of
      ida_simple_get()/ida_simple_remove().
      The latter is deprecated and more verbose.
      
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20220301131212.26348-1-simon.horman@corigine.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: sfp: use %pe for printing errors · 9ae1ef4b
      Russell King (Oracle) authored
      
      
      Convert sfp to use %pe for printing error codes, which can print them
      as errno symbols rather than numbers.
      
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/E1nOyEN-00BuuE-OB@rmk-PC.armlinux.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: phylink: use %pe for printing errors · ab1198e5
      Russell King (Oracle) authored
      
      
      Convert phylink to use %pe for printing error codes, which can print
      them as errno symbols rather than numbers.
      
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/E1nOyEI-00Buu8-K9@rmk-PC.armlinux.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tuntap: add sanity checks about msg_controllen in sendmsg · 74a335a0
      Harold Huang authored
      In patch [1], tun_msg_ctl was added to allow passing batched XDP buffers to
      tun_sendmsg. Although we do not use msg_controllen in this path, we should
      check msg_controllen to make sure the caller passes a valid msg_ctl.
      
      [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fe8dd45b
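
A minimal sketch of the kind of check described above, written as standalone userspace C. The struct definitions, the SKETCH_TUN_MSG_PTR constant and ctl_is_batched_xdp() are hypothetical stand-ins, not the driver code; the point is only that the control length is compared against the expected structure size before the control data is interpreted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's control structure and msghdr. */
#define SKETCH_TUN_MSG_PTR 2

struct sketch_tun_msg_ctl {
	unsigned short type;
	unsigned short num;
	void *ptr;
};

struct sketch_msghdr {
	void *msg_control;
	size_t msg_controllen;
};

/* Treat the message as carrying batched buffers only if the control data
 * is present, has exactly the expected size, and has the expected type. */
static bool ctl_is_batched_xdp(const struct sketch_msghdr *m)
{
	const struct sketch_tun_msg_ctl *ctl;

	assert(m != NULL);
	if (m->msg_control == NULL)
		return false;
	if (m->msg_controllen != sizeof(struct sketch_tun_msg_ctl))
		return false;
	ctl = m->msg_control;
	return ctl->type == SKETCH_TUN_MSG_PTR;
}
```

Checking the length first means a caller that hands over a shorter or unrelated control buffer is rejected before any field of it is read.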
      
      
      
      Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
      Suggested-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: Harold Huang <baymaxhuang@gmail.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Link: https://lore.kernel.org/r/20220303022441.383865-1-baymaxhuang@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      74a335a0
    • Jakub Kicinski's avatar
      Merge tag 'batadv-next-pullrequest-20220302' of git://git.open-mesh.org/linux-merge · fa452e0a
      Jakub Kicinski authored
      
      
      Simon Wunderlich says:
      
      ====================
      This cleanup patchset includes the following patches:
      
       - bump version strings, by Simon Wunderlich
      
       - Remove redundant 'flush_workqueue()' calls, by Christophe JAILLET
      
       - Migrate to linux/container_of.h, by Sven Eckelmann
      
       - Demote batadv-on-batadv skip error message, by Sven Eckelmann
      
      * tag 'batadv-next-pullrequest-20220302' of git://git.open-mesh.org/linux-merge:
        batman-adv: Demote batadv-on-batadv skip error message
        batman-adv: Migrate to linux/container_of.h
        batman-adv: Remove redundant 'flush_workqueue()' calls
        batman-adv: Start new development cycle
      ====================
      
      Link: https://lore.kernel.org/r/20220302163522.102842-1-sw@simonwunderlich.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      fa452e0a
    • Wang Qing's avatar
      net: hamradio: fix compliation error · a577223a
      Wang Qing authored
      Add the missing ")" that the previous commit left out.
      
      Fixes: 61c4fb9c ("net: hamradio: use time_is_after_jiffies() instead of open coding it")
      Link: https://lore.kernel.org/all/1646018012-61129-1-git-send-email-wangqing@vivo.com/
      Signed-off-by: Wang Qing <wangqing@vivo.com>
      Link: https://lore.kernel.org/r/1646203277-83159-1-git-send-email-wangqing@vivo.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      a577223a
  2. Mar 02, 2022
    • Sven Eckelmann's avatar
      batman-adv: Demote batadv-on-batadv skip error message · 6ee3c393
      Sven Eckelmann authored
      
      
      The error message "Cannot find parent device" was shown for users of
      macvtap (on batadv devices) whenever the macvtap was moved to a different
      netns. This happens because macvtap doesn't provide an implementation for
      rtnl_link_ops->get_link_net.
      
      The situation for which this message is printed is actually not an error
      but just a warning that the optional sanity check was skipped. So demote
      the message from error to warning and adjust the text to better explain
      what happened.
      
      Reported-by: Leonardo Mörlein <freifunk@irrelefant.net>
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
      6ee3c393
    • Sven Eckelmann's avatar
      batman-adv: Migrate to linux/container_of.h · eb7da4f1
      Sven Eckelmann authored
      The commit d2a8ebbf ("kernel.h: split out container_of() and
      typeof_member() macros") introduced a new header for the
      container_of-related macros, which previously lived in linux/kernel.h.
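
For readers unfamiliar with the macro, here is a simplified userspace sketch of what container_of() does: given a pointer to an embedded member, it recovers a pointer to the enclosing structure. The names sketch_container_of, struct neighbour and neighbour_from_node() are invented for this example, and the kernel's macro adds compile-time type checking that this version omits.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified version: subtract the member's offset from the member pointer. */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct list_node {
	struct list_node *next;
};

/* Hypothetical structure that embeds a list node. */
struct neighbour {
	int id;
	struct list_node node;
};

/* Map a pointer to the embedded node back to its enclosing neighbour. */
static struct neighbour *neighbour_from_node(struct list_node *n)
{
	assert(n != NULL);
	return sketch_container_of(n, struct neighbour, node);
}
```

Code that walks a list of struct list_node entries can use neighbour_from_node() to reach the id field of each enclosing object.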
      
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
      eb7da4f1
    • Jakub Kicinski's avatar
      Merge branch 'if_ether-h-add-industrial-fieldbus-ethertypes' · 96946d89
      Jakub Kicinski authored
      
      
      Daniel Braunwarth says:
      
      ====================
      if_ether.h: add industrial fieldbus Ethertypes
      
      This set of patches adds the Ethertypes for PROFINET and EtherCAT.
      
      The defines should be used by iproute2 to extend the list of available link
      layer protocols.
      ====================
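
As an illustration of how such defines might be consumed, the sketch below classifies an untagged Ethernet frame by its EtherType. The values 0x8892 (PROFINET) and 0x88A4 (EtherCAT) are the registered EtherTypes for these protocols; the macros are guarded in case a system header already provides them. fieldbus_name() and SKETCH_ETH_HLEN are hypothetical names, and VLAN-tagged frames are deliberately not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifndef ETH_P_PROFINET
#define ETH_P_PROFINET 0x8892
#endif
#ifndef ETH_P_ETHERCAT
#define ETH_P_ETHERCAT 0x88A4
#endif

#define SKETCH_ETH_HLEN 14 /* dst MAC (6) + src MAC (6) + EtherType (2) */

/* Return a short protocol name for an untagged Ethernet frame. */
static const char *fieldbus_name(const uint8_t *frame, size_t len)
{
	uint16_t ethertype;

	assert(frame != NULL);
	if (len < SKETCH_ETH_HLEN)
		return "truncated";

	/* The EtherType is stored big-endian at byte offsets 12 and 13. */
	ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
	switch (ethertype) {
	case ETH_P_PROFINET:
		return "profinet";
	case ETH_P_ETHERCAT:
		return "ethercat";
	default:
		return "other";
	}
}
```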
      
      Link: https://lore.kernel.org/r/20220228133029.100913-1-daniel@braunwarth.dev
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      96946d89
    • Daniel Braunwarth's avatar
      if_ether.h: add EtherCAT Ethertype · cd73cda7
      Daniel Braunwarth authored
      
      
      Add the Ethertype for EtherCAT protocol.
      
      Signed-off-by: Daniel Braunwarth <daniel@braunwarth.dev>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      cd73cda7
    • Daniel Braunwarth's avatar
      if_ether.h: add PROFINET Ethertype · dd0ca255
      Daniel Braunwarth authored
      
      
      Add the Ethertype for PROFINET protocol.
      
      Signed-off-by: Daniel Braunwarth <daniel@braunwarth.dev>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      dd0ca255
    • Sven Eckelmann's avatar
      macvtap: advertise link netns via netlink · a0219215
      Sven Eckelmann authored
      
      
      Assign rtnl_link_ops->get_link_net() callback so that IFLA_LINK_NETNSID is
      added to rtnetlink messages. This fixes iproute2 which otherwise resolved
      the link interface to an interface in the wrong namespace.
      
      Test commands:
      
        ip netns add nst
        ip link add dummy0 type dummy
        ip link add link dummy0 name macvtap0 type macvtap
        ip link set macvtap0 netns nst
        ip -netns nst link show macvtap0
      
      Before:
      
        10: macvtap0@gre0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 500
            link/ether 5e:8f:ae:1d:60:50 brd ff:ff:ff:ff:ff:ff
      
      After:
      
        10: macvtap0@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 500
            link/ether 5e:8f:ae:1d:60:50 brd ff:ff:ff:ff:ff:ff link-netnsid 0
      
      Reported-by: Leonardo Mörlein <freifunk@irrelefant.net>
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Link: https://lore.kernel.org/r/20220228003240.1337426-1-sven@narfation.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      a0219215
    • Wan Jiabing's avatar
      nfp: avoid newline at end of message in NL_SET_ERR_MSG_MOD · 323d51ca
      Wan Jiabing authored
      
      
      Fix the following coccicheck warning:
      ./drivers/net/ethernet/netronome/nfp/flower/qos_conf.c:750:7-55: WARNING
      avoid newline at end of message in NL_SET_ERR_MSG_MOD
      
      Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20220301112356.1820985-1-wanjiabing@vivo.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      323d51ca
    • Harold Huang's avatar
      tun: support NAPI for packets received from batched XDP buffs · fb3f9037
      Harold Huang authored
      
      
      In tun, NAPI is supported, and we can also use NAPI in the path of
      batched XDP buffs to accelerate packet processing. Moreover, once
      NAPI is used, GRO is supported as well. iperf shows that single-stream
      throughput improves from 4.5 Gbps to 9.2 Gbps. Additionally, 9.2 Gbps
      nearly reaches the line speed of the phy nic, and the vhost thread still
      leaves about 15% of its CPU core idle.
      
      Test topology:
      [iperf server]<--->tap<--->dpdk testpmd<--->phy nic<--->[iperf client]
      
      Iperf stream:
      iperf3 -c 10.0.0.2  -i 1 -t 10
      
      Before:
      ...
      [  5]   5.00-6.00   sec   558 MBytes  4.68 Gbits/sec    0   1.50 MBytes
      [  5]   6.00-7.00   sec   556 MBytes  4.67 Gbits/sec    1   1.35 MBytes
      [  5]   7.00-8.00   sec   556 MBytes  4.67 Gbits/sec    2   1.18 MBytes
      [  5]   8.00-9.00   sec   559 MBytes  4.69 Gbits/sec    0   1.48 MBytes
      [  5]   9.00-10.00  sec   556 MBytes  4.67 Gbits/sec    1   1.33 MBytes
      - - - - - - - - - - - - - - - - - - - - - - - - -
      [ ID] Interval           Transfer     Bitrate         Retr
      [  5]   0.00-10.00  sec  5.39 GBytes  4.63 Gbits/sec   72          sender
      [  5]   0.00-10.04  sec  5.39 GBytes  4.61 Gbits/sec               receiver
      
      After:
      ...
      [  5]   5.00-6.00   sec  1.07 GBytes  9.19 Gbits/sec    0   1.55 MBytes
      [  5]   6.00-7.00   sec  1.08 GBytes  9.30 Gbits/sec    0   1.63 MBytes
      [  5]   7.00-8.00   sec  1.08 GBytes  9.25 Gbits/sec    0   1.72 MBytes
      [  5]   8.00-9.00   sec  1.08 GBytes  9.25 Gbits/sec   77   1.31 MBytes
      [  5]   9.00-10.00  sec  1.08 GBytes  9.24 Gbits/sec    0   1.48 MBytes
      - - - - - - - - - - - - - - - - - - - - - - - - -
      [ ID] Interval           Transfer     Bitrate         Retr
      [  5]   0.00-10.00  sec  10.8 GBytes  9.28 Gbits/sec  166          sender
      [  5]   0.00-10.04  sec  10.8 GBytes  9.24 Gbits/sec               receiver
      
      Reported-at: https://lore.kernel.org/all/CACGkMEvTLG0Ayg+TtbN4q4pPW-ycgCCs3sC3-TF8cuRTf7Pp1A@mail.gmail.com
      Signed-off-by: Harold Huang <baymaxhuang@gmail.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Link: https://lore.kernel.org/r/20220228033805.1579435-1-baymaxhuang@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      fb3f9037
    • Jakub Kicinski's avatar
      Merge branch 'sfc-optimize-rxqs-count-and-affinities' · 422ce836
      Jakub Kicinski authored
      
      
      Íñigo Huguet says:
      
      ====================
      sfc: optimize RXQs count and affinities
      
      In the sfc driver, one RX queue per physical core was allocated by default.
      Later on, IRQ affinities were set by spreading the IRQs across all
      NUMA-local CPUs.
      
      However, that default results in a suboptimal configuration on many
      modern systems. Specifically, on systems with hyper-threading and 2 NUMA
      nodes, affinities are set in a way that IRQs are handled by all logical
      cores of a single NUMA node. Handling IRQs from both hyper-threading
      siblings has no benefit, and setting affinities to one queue per physical
      core is not a very good idea either, because there is a performance
      penalty for moving data across nodes (I was able to check it with some
      XDP tests using pktgen).
      
      These patches reduce the default number of channels to one per physical
      core in the local NUMA node. Then, they set IRQ affinities to CPUs in
      the local NUMA node only. This way we save hardware resources since
      channels are limited resources. We also leave more room for XDP_TX
      channels without hitting driver's limit of 32 channels per interface.
      
      Running performance tests using iperf with an SFC9140 device showed no
      performance penalty for reducing the number of channels.
      
      RX XDP tests showed that performance can go down to less than half if
      the IRQ is handled by a CPU in a different NUMA node, which doesn't
      happen with the new defaults from these patches.
      ====================
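
The default described in the cover letter above (one channel per physical core in the local NUMA node) can be sketched as a small counting routine. This is an illustration only, not the driver's implementation: struct sketch_cpu and default_channels() are hypothetical, and the sketch assumes core ids are unique across the whole system so that hyper-threading siblings share a core_id.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-logical-CPU topology record. */
struct sketch_cpu {
	int core_id;   /* physical core; siblings share this value */
	int numa_node; /* NUMA node the CPU belongs to */
};

/* Count distinct physical cores that belong to local_node. */
static size_t default_channels(const struct sketch_cpu *cpus, size_t n,
			       int local_node)
{
	size_t count = 0;

	assert(cpus != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		bool seen = false;

		if (cpus[i].numa_node != local_node)
			continue; /* remote node: never gets a channel */

		/* Skip this CPU if an earlier sibling on the same core was counted. */
		for (size_t j = 0; j < i; j++) {
			if (cpus[j].numa_node == local_node &&
			    cpus[j].core_id == cpus[i].core_id) {
				seen = true;
				break;
			}
		}
		if (!seen)
			count++;
	}
	return count;
}
```

On a machine with 2 NUMA nodes, 2 physical cores per node and 2 threads per core, this gives 2 channels for a device attached to either node, instead of one per logical or per physical core system-wide.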
      
      Link: https://lore.kernel.org/r/20220228132254.25787-1-ihuguet@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      422ce836