  1. Mar 15, 2022
    • ice: remove circular header dependencies on ice.h · 649c87c6
      Jacob Keller authored
      
      
      Several headers in the ice driver include ice.h even though they are
      themselves included by that header. The most notable of these is
      ice_common.h, but several other headers also do this.
      
      Such a recursive inclusion is problematic as it forces headers to be
      included in a strict order, otherwise compilation errors can result. The
      circular inclusions do not trigger an endless loop due to standard
      header inclusion guards, however other errors can occur.
      
      For example, ice_flow.h defines ice_rss_hash_cfg, which is used by
      ice_sriov.h as part of the definition of ice_vf_hash_ip_ctx.
      
      ice_flow.h includes ice_acl.h, which includes ice_common.h, and which
      finally includes ice.h. Since ice.h itself includes ice_sriov.h, this
      creates a circular dependency.
      
      The definition in ice_sriov.h requires things from ice_flow.h, but
      ice_flow.h itself will lead to trying to load ice_sriov.h as part of its
      process for expanding ice.h. The current code avoids this issue by
      having an implicit dependency without the include of ice_flow.h.
      
      If we were to fix that so that ice_sriov.h explicitly depends on
      ice_flow.h the following pattern would occur:
      
        ice_flow.h -> ice_acl.h -> ice_common.h -> ice.h -> ice_sriov.h
      
      At this point in the expansion, the header guard for ice_flow.h is
      already set, so when ice_sriov.h attempts to load the ice_flow.h
      header it is skipped. Then, we go on to begin including the rest of
      ice_sriov.h, including structure definitions which depend on
      ice_rss_hash_cfg. This produces a compiler warning because
      ice_rss_hash_cfg hasn't yet been defined. Remember, we're just at the
      start of ice_flow.h!
      
      If the order of headers is incorrect (ice_flow.h is not implicitly
      loaded first in all files which include ice_sriov.h) then we get the
      same failure.
      
      Removing this recursive inclusion requires fixing a few cases where some
      headers depended on the header inclusions from ice.h. In addition, a few
      other changes are also required.
      
      Most notably, ice_hw_to_dev is implemented as a macro in ice_osdep.h,
      which is the likely reason that ice_common.h includes ice.h at all. This
      macro implementation requires the full definition of ice_pf in order to
      properly compile.
      
      Fix this by moving it to a function declared in ice_main.c, so that we
      do not require all files to depend on the layout of the ice_pf
      structure.
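
As a minimal sketch of the idea above (not the driver's actual code), the following single file shows how a plain function prototype lets callers compile without seeing the full structure layout, whereas a macro would expand in every caller and need it. All `demo_*` names are hypothetical stand-ins, and recovering the enclosing structure via `offsetof` is an assumption about how such a helper might work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from the ice driver. */
struct demo_device { int id; };
struct demo_hw { int revision; };

/* "Header" part: callers only need an incomplete type and a prototype. */
struct demo_pf;
struct demo_device *demo_hw_to_dev(struct demo_hw *hw);

/* A caller that compiles without knowing the layout of struct demo_pf. */
int demo_dev_id(struct demo_hw *hw)
{
	return demo_hw_to_dev(hw)->id;
}

/* "Source file" part: only this section needs the full structure. */
struct demo_pf {
	struct demo_device dev;
	struct demo_hw hw;
};

struct demo_device *demo_hw_to_dev(struct demo_hw *hw)
{
	struct demo_pf *pf;

	assert(hw != NULL);
	/* Recover the enclosing structure from its embedded member. */
	pf = (struct demo_pf *)((char *)hw - offsetof(struct demo_pf, hw));
	return &pf->dev;
}
```

With the macro form, `demo_dev_id()` could not be compiled before the definition of `struct demo_pf`; with the function form, only the defining file depends on the layout.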
      
      Note that this change only fixes circular dependencies, but it does not
      fully resolve all implicit dependencies where one header may depend on
      the inclusion of another. I tried to fix as many of the implicit
      dependencies as I noticed, but fixing them all requires a somewhat
      tedious analysis of each header and attempting to compile it separately.
      
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: rename ice_virtchnl_pf.c to ice_sriov.c · 0deb0bf7
      Jacob Keller authored
      
      
      The ice_virtchnl_pf.c and ice_virtchnl_pf.h files are where most of the
      code for implementing Single Root IOV virtualization resides. This code
      includes support for bringing up and tearing down VFs, hooks into the
      kernel SR-IOV netdev operations, and for handling virtchnl messages from
      VFs.
      
      In the future, we plan to support Scalable IOV in addition to Single
      Root IOV as an alternative virtualization scheme. This implementation
      will re-use some but not all of the code in ice_virtchnl_pf.c.
      
      To prepare for this future, we want to refactor and split up the code in
      ice_virtchnl_pf.c into the following scheme:
      
       * ice_vf_lib.[ch]
      
         Basic VF structures and accessors. This is where scheme-independent
         code will reside.
      
       * ice_virtchnl.[ch]
      
         Virtchnl message handling. This is where the bulk of the logic for
         processing messages from VFs using the virtchnl messaging scheme will
         reside. This is separated from ice_vf_lib.c because it is distinct
         and has a bulk of the processing code.
      
       * ice_sriov.[ch]
      
         Single Root IOV implementation, including initialization and the
         routines for interacting with SR-IOV based netdev operations.
      
       * (future) ice_siov.[ch]
      
         Scalable IOV implementation.
      
      As a first step, let's assume that all of the code in
      ice_virtchnl_pf.[ch] is for Single Root IOV. Rename this file to
      ice_sriov.c and its header to ice_sriov.h.
      
      Future changes will further split out the code in these files following
      the plan outlined here.
      
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: rename ice_sriov.c to ice_vf_mbx.c · d775155a
      Jacob Keller authored
      
      
      The ice_sriov.c file primarily contains code which handles the logic for
      mailbox overflow detection and some other utility functions related to
      the virtualization mailbox.
      
      The bulk of the SR-IOV implementation is actually found in
      ice_virtchnl_pf.c, and this file isn't strictly SR-IOV specific.
      
      In the future, the ice driver will support an additional virtualization
      scheme known as Scalable IOV, and the code in this file will be used
      for this alternative implementation.
      
      Rename this file (and its associated header) to ice_vf_mbx.c, so that we
      can later re-use the ice_sriov.c file as the SR-IOV specific file.
      
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • nfp: flower: avoid newline at the end of message in NL_SET_ERR_MSG_MOD · bdd6a89d
      Niklas Söderlund authored
      
      
      Fix the following coccicheck warning:
      
          drivers/net/ethernet/netronome/nfp/flower/action.c:959:7-69: WARNING avoid newline at end of message in NL_SET_ERR_MSG_MOD
      
      Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>
      Signed-off-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20220312095823.2425775-1-niklas.soderlund@corigine.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/mlx5e: Fix use-after-free in mlx5e_stats_grp_sw_update_stats · 8772cc49
      Saeed Mahameed authored
      We need to sync page pool stats only for active channels. Reading ethtool
      stats on a down netdev or a netdev with a modified number of channels will
      result in a use-after-free, trying to access page pools that are freed
      already.
      
      BUG: KASAN: use-after-free in mlx5e_stats_grp_sw_update_stats+0x465/0xf80
      Read of size 8 at addr ffff888004835e40 by task ethtool/720
      
      Fixes: cc10e84b ("mlx5: add support for page_pool_get_stats")
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Reported-by: Jakub Kicinski <kuba@kernel.org>
      Acked-by: Joe Damato <jdamato@fastly.com>
      Link: https://lore.kernel.org/r/20220312005353.786255-1-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/mlx4_en: use kzalloc · 3c2dfb73
      Julia Lawall authored
      Use kzalloc instead of kmalloc + memset.
      
      The semantic patch that makes this change is:
      (https://coccinelle.gitlabpages.inria.fr/website/)
      
      //<smpl>
      @@
      expression res, size, flag;
      @@
      - res = kmalloc(size, flag);
      + res = kzalloc(size, flag);
        ...
      - memset(res, 0, size);
      //</smpl>
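
The transformation in the semantic patch can be illustrated with a userspace analogue. This is only a sketch: `malloc()`, `memset()` and `calloc()` stand in for the kernel's `kmalloc()`, `memset()` and `kzalloc()`, and the `demo_*` function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Before: allocate, then clear in a separate step. */
unsigned char *demo_alloc_then_clear(size_t size)
{
	unsigned char *buf = malloc(size);

	if (buf)
		memset(buf, 0, size);
	return buf;
}

/* After: a single call that returns zeroed memory. */
unsigned char *demo_alloc_zeroed(size_t size)
{
	return calloc(1, size);
}

/* Returns 1 if the first size bytes of buf are all zero, 0 otherwise. */
int demo_is_zeroed(const unsigned char *buf, size_t size)
{
	size_t i;

	assert(buf != NULL);
	for (i = 0; i < size; i++)
		if (buf[i] != 0)
			return 0;
	return 1;
}
```

Both allocators hand back a zero-filled buffer; the second form simply removes the chance of forgetting the clearing step or passing it a mismatched size.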
      
      Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Link: https://lore.kernel.org/r/20220312102705.71413-3-Julia.Lawall@inria.fr
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: disable preemption in dev_core_stats_XXX_inc() helpers · fc93db15
      Eric Dumazet authored
      syzbot was kind enough to remind us that dev->{tx_dropped|rx_dropped}
      could be increased in process context.
      
      BUG: using smp_processor_id() in preemptible [00000000] code: syz-executor413/3593
      caller is netdev_core_stats_alloc+0x98/0x110 net/core/dev.c:10298
      CPU: 1 PID: 3593 Comm: syz-executor413 Not tainted 5.17.0-rc7-syzkaller-02426-g97aeb877de7f #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       check_preemption_disabled+0x16b/0x170 lib/smp_processor_id.c:49
       netdev_core_stats_alloc+0x98/0x110 net/core/dev.c:10298
       dev_core_stats include/linux/netdevice.h:3855 [inline]
       dev_core_stats_rx_dropped_inc include/linux/netdevice.h:3866 [inline]
       tun_get_user+0x3455/0x3ab0 drivers/net/tun.c:1800
       tun_chr_write_iter+0xe1/0x200 drivers/net/tun.c:2015
       call_write_iter include/linux/fs.h:2074 [inline]
       new_sync_write+0x431/0x660 fs/read_write.c:503
       vfs_write+0x7cd/0xae0 fs/read_write.c:590
       ksys_write+0x12d/0x250 fs/read_write.c:643
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f2cf4f887e3
      Code: 5d 41 5c 41 5d 41 5e e9 9b fd ff ff 66 2e 0f 1f 84 00 00 00 00 00 90 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18
      RSP: 002b:00007ffd50dd5fd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 00007ffd50dd6000 RCX: 00007f2cf4f887e3
      RDX: 000000000000002a RSI: 0000000000000000 RDI: 00000000000000c8
      RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 00007ffd50dd5ff0 R14: 00007ffd50dd5fe8 R15: 00007ffd50dd5fe4
       </TASK>
      
      Fixes: 625788b5 ("net: add per-cpu storage and net->core_stats")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: jeffreyji <jeffreyji@google.com>
      Cc: Brian Vazquez <brianvv@google.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Link: https://lore.kernel.org/r/20220312214505.3294762-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • drivers: net: packetengines: fix typos in comments · ebc0b8b5
      Julia Lawall authored
      
      
      Various spelling mistakes in comments.
      Detected with the help of Coccinelle.
      
      Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
      Link: https://lore.kernel.org/r/20220314115354.144023-13-Julia.Lawall@inria.fr
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. Mar 14, 2022
    • Merge branch 'dpaa2-mac-protocol-change' · 5e7350e8
      David S. Miller authored
      
      
      Ioana Ciornei says:
      
      ====================
      dpaa2-mac: add support for changing the protocol at runtime
      
      This patch set adds support for changing the Ethernet protocol at
      runtime on Layerscape SoCs which have the Lynx 28G SerDes block.
      
      The first two patches add a new generic PHY driver for the Lynx 28G and
      the bindings file associated. The driver reads the PLL configuration at
      probe time (the frequency provided to the lanes) and determines what
      protocols can be supported.
      Based on this, the driver can deny or approve a request from the
      dpaa2-mac to set up a new protocol.
      
      The next 2 patches add some MC APIs for querying the running firmware
      version and for setting up a new protocol on the MAC.
      
      Moving along, we extract the code for setting up the supported
      interfaces on a MAC into a separate function, since the next patches
      will update the logic.
      
      In the next patch, the dpaa2-mac is updated so that it retrieves the
      SerDes PHY based on the OF node and, in case of a major reconfig, calls
      the PHY driver to set up the new protocol on the associated lane and the
      MC firmware to reconfigure the MAC side of things.
      
      Finally, the LX2160A dtsi is annotated with the SerDes PHY nodes for the
      1st SerDes block. Beside this, the LX2160A Clearfog dtsi is annotated
      with the 'phys' property for the exposed SFP cages.
      
      Changes in v2:
      	- 1/8: add MODULE_LICENSE
      Changes in v3:
      	- 2/8: fix 'make dt_binding_check' errors
      	- 7/8: reverse order of dpaa2_mac_start() and phylink_start()
      	- 7/8: treat all RGMII variants in dpmac_eth_if_mode
      	- 7/8: remove the .mac_prepare callback
      	- 7/8: ignore PHY_INTERFACE_MODE_NA in validate
      Changes in v4:
      	- 1/8: remove the DT nodes parsing
      	- 1/8: add an xlate function
      	- 2/8: remove the children phy nodes for each lane
      	- 7/8: rework the of_phy_get if statement
      	- 8/8: remove the DT nodes for each lane and the lane id in the
      	  phys phandle
      Changes in v5:
      	- 2/8: use phy as the name of the DT node in the example
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • arch: arm64: dts: lx2160a: describe the SerDes block #1 · 3cbe93a1
      Ioana Ciornei authored
      
      
      Describe the SerDes block #1 using the generic phys infrastructure. This
      way, the ethernet nodes can each reference their serdes lanes
      individually using the 'phys' dts property.
      
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dpaa2-mac: configure the SerDes phy on a protocol change · f978fe85
      Ioana Ciornei authored
      
      
      This patch integrates the dpaa2-eth driver with the generic PHY
      infrastructure in order to search, find and reconfigure the SerDes lanes
      in case of a protocol change.
      
      On the .mac_config() callback, the phy_set_mode_ext() API is called so
      that the Lynx 28G SerDes PHY driver can change the lane's configuration.
      In the same phylink callback the MC firmware is called so that it
      reconfigures the MAC side to run using the new protocol.
      
      The consumer drivers - dpaa2-eth and dpaa2-switch - are updated to call
      the dpaa2_mac_start/stop functions newly added which will
      power_on/power_off the associated SerDes lane.
      
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dpaa2-mac: move setting up supported_interfaces into a function · aa95c371
      Ioana Ciornei authored
      
      
      The logic to setup the supported interfaces will get annotated based on
      what the configuration of the SerDes PLLs supports. Move the current
      setup into a separate function just to try to keep it clean.
      
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dpaa2-mac: retrieve API version and detect features · dff95381
      Ioana Ciornei authored
      
      
      Retrieve the API version running on the firmware and based on it detect
      which features are available for usage.
      The first one to be listed is the capability to change the MAC protocol
      at runtime.
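
Version-based feature detection of this kind might look like the sketch below. It is an illustration only: the `demo_*` names and the feature flag are hypothetical, and the minimum version is passed in by the caller because the text does not state which firmware API version introduces the capability.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical (major, minor) API version pair. */
struct demo_api_ver {
	unsigned short major;
	unsigned short minor;
};

/* Hypothetical feature bit: MAC protocol can be changed at runtime. */
#define DEMO_FEAT_PROTOCOL_CHANGE (1u << 0)

/* Returns <0, 0 or >0 if a is older than, equal to or newer than b. */
int demo_ver_cmp(const struct demo_api_ver *a, const struct demo_api_ver *b)
{
	if (a->major != b->major)
		return a->major < b->major ? -1 : 1;
	if (a->minor != b->minor)
		return a->minor < b->minor ? -1 : 1;
	return 0;
}

/*
 * Build the feature mask for the running firmware. min_ver is the first
 * version assumed to support the protocol change; it is a caller-supplied
 * placeholder, not a value taken from the text.
 */
unsigned int demo_detect_features(const struct demo_api_ver *fw,
				  const struct demo_api_ver *min_ver)
{
	unsigned int features = 0;

	assert(fw != NULL && min_ver != NULL);
	if (demo_ver_cmp(fw, min_ver) >= 0)
		features |= DEMO_FEAT_PROTOCOL_CHANGE;
	return features;
}
```

Keeping the result as a bitmask leaves room for further features to be keyed off later API versions in the same way.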
      
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dpaa2-mac: add the MC API for reconfiguring the protocol · 332b9ea5
      Ioana Ciornei authored
      
      
      The MC firmware gained recently a new command which can reconfigure the
      running protocol on the underlying MAC. Add this new command which will
      be used in the next patches in order to do a major reconfig on the
      interface.
      
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dpaa2-mac: add the MC API for retrieving the version · 38d28b02
      Ioana Ciornei authored
      
      
      The dpmac_get_api_version command will be used in the next patches to
      determine whether the current firmware is capable of changing the
      Ethernet protocol running on the MAC.
      
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dt-bindings: phy: add bindings for Lynx 28G PHY · c553f22e
      Ioana Ciornei authored
      
      
      Add device tree binding for the Lynx 28G SerDes PHY driver used on
      Layerscape based SoCs.
      
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • phy: add support for the Layerscape SerDes 28G · 8f73b37c
      Ioana Ciornei authored
      
      
      This patch adds a new generic PHY driver to support the Lynx 28G SerDes
      block found on some of the Layerscape SoCs such as LX2160A.
      At the moment, only the following Ethernet protocols are supported:
      SGMII/1000Base-X and 10GBaseR.
      
      SerDes lanes which are not running an Ethernet protocol, or which run an
      Ethernet protocol that is not currently supported, will be left as they
      were configured through the RCW (Reset Configuration Word) at boot time.
      
      At probe time, the platform driver will read the current
      configuration of both PLLs found on a SerDes block and will determine
      what protocols are supported using that PLL.
      
      For example, if a PLL is configured to generate a clock net (frate) of
      5GHz the only protocols sustained by that PLL are SGMII/1000Base-X
      (using a quarter of the full clock rate) and QSGMII using the full clock
      net frequency on the lane.
      
      On the .set_mode() callback, the PHY driver will first check if the
      requested operating mode (protocol) is even supported by the current PLL
      configuration and will error out if not.
      Then, the lane is reconfigured to run on the requested protocol.
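
The check performed on .set_mode() could be sketched as follows. This is not the driver's code: the `demo_*` names, the bitmask representation and the error value are assumptions made for illustration. The only rate-to-protocol mapping used is the 5GHz case given above; other PLL rates are deliberately left out because the text does not describe them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical protocol identifiers. */
enum demo_proto { DEMO_SGMII, DEMO_QSGMII, DEMO_10GBASER };

struct demo_lane {
	unsigned int pll_mhz;   /* clock rate provided by the lane's PLL */
	enum demo_proto proto;  /* protocol currently running on the lane */
};

/* Bitmask of protocols that a given PLL rate can sustain. */
unsigned int demo_pll_protos(unsigned int pll_mhz)
{
	/* From the text: 5GHz sustains SGMII/1000Base-X and QSGMII. */
	if (pll_mhz == 5000)
		return (1u << DEMO_SGMII) | (1u << DEMO_QSGMII);
	/* Other rates are not described in the text, so report none. */
	return 0;
}

/* Returns 0 on success, -1 if the PLL cannot sustain the request. */
int demo_set_mode(struct demo_lane *lane, enum demo_proto proto)
{
	assert(lane != NULL);
	/* Reject the request before touching the lane configuration. */
	if (!(demo_pll_protos(lane->pll_mhz) & (1u << proto)))
		return -1;
	lane->proto = proto;
	return 0;
}
```

Validating first means a rejected request leaves the lane running its previous protocol.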
      
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'dsa-felix-qos' · 92ebb236
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      Basic QoS classification on Felix DSA switch using dcbnl
      
      Basic QoS classification for Ocelot switches means port-based default
      priority, DSCP-based and VLAN PCP based. This is opposed to advanced QoS
      classification which is done through the VCAP IS1 TCAM based engine.
      
      The patch set is a logical continuation of this RFC which attempted to
      describe the default-prio as a matchall entry placed at the end of a
      series of offloaded tc filters:
      https://patchwork.kernel.org/project/netdevbpf/cover/20210113154139.1803705-1-olteanv@gmail.com/
      
      
      
      I have tried my best to satisfy the feedback that we should cater for
      pre-configured QoS profiles. Ironically, the only pre-configured QoS
      profile that the Felix switch driver has is for VLAN PCP (1:1 mapping
      with QoS class), yet IEEE 802.1Q or dcbnl offer no mechanism for
      reporting or changing that.
      
      Testing was done with the iproute2 dcb app. The qos_class of packets was
      dumped from net/dsa/tag_ocelot.c.
      
      (1) $ dcb app show dev swp3
      default-prio 0
      (2) $ dcb app replace dev swp3 default-prio 3
      (3) $ dcb app replace dev swp3 dscp-prio CS3:5
      (4) $ dcb app replace dev swp3 dscp-prio CS2:2
      (5) $ dcb app show dev swp3
      default-prio 3
      dscp-prio CS2:2 CS3:5
      
      Traffic sent with "ping -Q 64 <ipaddr>", which means CS2.
      These packets match qos_class 0 after command (1),
      qos_class 3 after command (2),
      qos_class 3 after command (3), and
      qos_class 2 after command (4).
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: felix: configure default-prio and dscp priorities · 978777d0
      Vladimir Oltean authored
      
      
      Follow the established programming model for this driver and provide
      shims in the felix DSA driver which call the implementations from the
      ocelot switch lib. The ocelot switchdev driver wasn't integrated with
      dcbnl due to lack of hardware availability.
      
      The switch doesn't have any fancy QoS classification enabled by default.
      The provided getters will create a default-prio app table entry of 0,
      and no dscp entry. However, the getters have been made to actually
      retrieve the hardware configuration rather than static values, to be
      future proof in case DSA will need this information from more call paths.
      
      For default-prio, there is a single field per port, in ANA_PORT_QOS_CFG,
      called QOS_DEFAULT_VAL.
      
      DSCP classification is enabled per-port, again via ANA_PORT_QOS_CFG
      (field QOS_DSCP_ENA), and individual DSCP values are configured as
      trusted or not through register ANA_DSCP_CFG (replicated 64 times).
      An untrusted DSCP value falls back to other QoS classification methods.
      If trusted, the selected ANA_DSCP_CFG register also holds the QoS class
      in the QOS_DSCP_VAL field.
      
      The hardware also supports DSCP remapping (DSCP value X is translated to
      DSCP value Y before the QoS class is determined based on the app table
      entry for Y) and DSCP packet rewriting. The dcbnl framework, for being
      so flexible in other useless areas, doesn't appear to support this.
      So this functionality has been left out.
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: report and change port dscp priority using dcbnl · 47d75f78
      Vladimir Oltean authored
      
      
      Similar to the port-based default priority, IEEE 802.1Q-2018 allows the
      Application Priority Table to define QoS classes (0 to 7) per IP DSCP
      value (0 to 63).
      
      In the absence of an app table entry for a packet with DSCP value X,
      QoS classification for that packet falls back to other methods (VLAN PCP
      or port-based default). The presence of an app table entry for DSCP
      value X with priority Y makes the hardware classify the packet to QoS
      class Y.
      
      As opposed to the default-prio where DSA exposes only a "set" in
      dsa_switch_ops (because the port-based default is the fallback, it
      always exists, either implicitly or explicitly), for DSCP priorities we
      expose an "add" and a "del". The addition of a DSCP entry means trusting
      that DSCP priority, the deletion means ignoring it.
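
The add/del semantics and the fallback can be modelled in a few lines. This is a simplified sketch with hypothetical `demo_*` names: it keeps only the DSCP table and the port-based default, leaves out the VLAN PCP fallback, and derives the DSCP value as the upper six bits of the IP TOS byte.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NUM_DSCP 64

/* Hypothetical per-port classification state. */
struct demo_port {
	unsigned char default_prio;                /* port-based fallback */
	unsigned char dscp_trusted[DEMO_NUM_DSCP]; /* 1 if an entry exists */
	unsigned char dscp_prio[DEMO_NUM_DSCP];    /* QoS class when trusted */
};

/* "add": start trusting this DSCP value and map it to a QoS class. */
void demo_add_dscp_prio(struct demo_port *p, unsigned int dscp,
			unsigned char prio)
{
	assert(p != NULL && dscp < DEMO_NUM_DSCP && prio < 8);
	p->dscp_trusted[dscp] = 1;
	p->dscp_prio[dscp] = prio;
}

/* "del": stop trusting this DSCP value. */
void demo_del_dscp_prio(struct demo_port *p, unsigned int dscp)
{
	assert(p != NULL && dscp < DEMO_NUM_DSCP);
	p->dscp_trusted[dscp] = 0;
}

/* Classify a packet by its IP TOS byte. */
unsigned char demo_classify(const struct demo_port *p, unsigned char tos)
{
	unsigned int dscp = tos >> 2; /* DSCP is the upper six bits of TOS */

	assert(p != NULL);
	if (p->dscp_trusted[dscp])
		return p->dscp_prio[dscp];
	return p->default_prio;       /* untrusted: use the port default */
}
```

A TOS byte of 64 corresponds to DSCP 16 (CS2), so in this model an entry added for DSCP 24 (CS3) leaves such a packet on the port default, while an entry for DSCP 16 takes precedence over it.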
      
      Drivers that already trust (at least some) DSCP values can describe
      their configuration in dsa_switch_ops :: port_get_dscp_prio(), which is
      called for each DSCP value from 0 to 63.
      
      Again, there can be more than one dcbnl app table entry for the same
      DSCP value; DSA chooses the one with the largest configured priority.
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: report and change port default priority using dcbnl · d538eca8
      Vladimir Oltean authored
      The port-based default QoS class is assigned to packets that lack a
      VLAN PCP (or the port is configured to not trust the VLAN PCP),
      an IP DSCP (or the port is configured to not trust IP DSCP), and packets
      on which no tc-skbedit action has matched.
      
      Similar to other drivers, this can be exposed to user space using the
      DCB Application Priority Table. IEEE 802.1Q-2018 specifies in Table
      D-8 - Sel field values that when the Selector is 1, the Protocol ID
      value of 0 denotes the "Default application priority. For use when
      application priority is not otherwise specified."
      
      The way in which the dcbnl integration in DSA has been designed has to
      do with its requirements. Andrew Lunn explains that SOHO switches are
      expected to come with some sort of pre-configured QoS profile, and that
      it is desirable for this to come pre-loaded into the DSA slave interfaces'
      DCB application priority table.
      
      In the dcbnl design, this is possible because calls to dcb_ieee_setapp()
      can be initiated by anyone including being self-initiated by this device
      driver.
      
      However, what makes this challenging to implement in DSA is that the DSA
      core manages the net_devices (effectively hiding them from drivers),
      while drivers manage the hardware. The DSA core has no knowledge of what
      individual drivers' QoS policies are. DSA could export to drivers a
      wrapper over dcb_ieee_setapp() and these could call that function to
      pre-populate the app priority table, however drivers don't have a good
      moment in time to do this. The dsa_switch_ops :: setup() method gets
      called before the net_devices are created (dsa_slave_create), and so is
      dsa_switch_ops :: port_setup(). What remains is dsa_switch_ops ::
      port_enable(), but this gets called upon each ndo_open. If we add app
      table entries on every open, we'd need to remove them on close, to avoid
      duplicate entry errors. But if we delete app priority entries on close,
      what we delete may not be the initial, driver pre-populated entries, but
      rather user-added entries.
      
      So it is clear that letting drivers choose the timing of the
      dcb_ieee_setapp() call is inappropriate. The alternative which was
      chosen is to introduce hardware-specific ops in dsa_switch_ops, and
      effectively hide dcbnl details from drivers as well. For pre-populating
      the application table, dsa_slave_dcbnl_init() will call
      ds->ops->port_get_default_prio() which is supposed to read from
      hardware. If the operation succeeds, DSA creates a default-prio app
      table entry. The method is called as soon as the slave_dev is
      registered, but before we release the rtnl_mutex. This is done such that
      user space sees the app table entries as soon as it sees the interface
      being registered.
      
      The fact that we populate slave_dev->dcbnl_ops with a non-NULL pointer
      changes behavior in dcb_doit() from net/dcb/dcbnl.c, which used to
      return -EOPNOTSUPP for any dcbnl operation where netdev->dcbnl_ops is
      NULL. Because there are still dcbnl-unaware DSA drivers even if they
      have dcbnl_ops populated, the way to restore the behavior is to make all
      dcbnl_ops return -EOPNOTSUPP on absence of the hardware-specific
      dsa_switch_ops method.
      
      The dcbnl framework absurdly allows there to be more than one app table
      entry for the same selector and protocol (in other words, more than one
      port-based default priority). In the iproute2 dcb program, there is a
      "replace" syntactical sugar command which performs an "add" and a "del"
      to hide this away. But we choose the largest configured priority when we
      call ds->ops->port_set_default_prio(), using __fls(). When there is no
      default-prio app table entry left, the port-default priority is restored
      to 0.
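
Selecting the largest configured priority can be sketched as below, assuming the configured priorities are collected into a bitmask with bit N set when an entry with priority N exists. `demo_fls()` is a portable stand-in written for this example, not the kernel's `__fls()`, and both function names are hypothetical.

```c
#include <assert.h>

/* Index of the highest set bit of a non-zero mask. */
unsigned int demo_fls(unsigned long mask)
{
	unsigned int bit = 0;

	assert(mask != 0);
	while ((mask >>= 1) != 0)
		bit++;
	return bit;
}

/*
 * Priority to program into the hardware: the largest configured one,
 * or 0 when no default-prio entry is left.
 */
unsigned int demo_effective_default_prio(unsigned long prio_mask)
{
	if (prio_mask == 0)
		return 0;
	return demo_fls(prio_mask);
}
```

The empty mask is handled separately because the highest-set-bit operation is only meaningful for a non-zero argument.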
      
      Link: https://patchwork.kernel.org/project/netdevbpf/patch/20210113154139.1803705-2-olteanv@gmail.com/
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: tc-testing: Increase timeout in tdc config file · 102e4a8e
      Victor Nogueira authored
      
      
      Some tests, such as Test d052: Add 1M filters with the same action, may
      not work with a small timeout value.
      
      Increase timeout to 24 seconds.
      
      Signed-off-by: Victor Nogueira <victor@mojatatu.com>
      Acked-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Add lockdep asserts to ____napi_schedule(). · fbd9a2ce
      Sebastian Andrzej Siewior authored
      
      
      ____napi_schedule() needs to be invoked with disabled interrupts due to
      __raise_softirq_irqoff (in order not to corrupt the per-CPU list).
      ____napi_schedule() needs also to be invoked from an interrupt context
      so that the raised-softirq is processed while the interrupt context is
      left.
      
      Add lockdep asserts for both conditions.
      Since this is the second time the irq/softirq check is needed, provide a
      generic lockdep_assert_softirq_will_run() which is used by both callers.
      
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'macvlan-uaf' · d96657dc
      David S. Miller authored
      
      
      Ziyang Xuan says:
      
      ====================
      net: macvlan: fix potential UAF problem for lowerdev
      
      Add the reference operation to lowerdev of macvlan to avoid
      the potential UAF problem under the following known scenario:
      
      Some module defers the NETDEV_UNREGISTER event handler to a
      work item, and lowerdev is accessed in the work handler. But by
      the time the work is executed, lowerdev has been destroyed because
      the upper macvlan did not take a reference to lowerdev correctly.
      
      In addition, add net device refcount tracker to macvlan.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: macvlan: add net device refcount tracker · 1f4a5983
      Ziyang Xuan authored
      
      
      Add net device refcount tracker to macvlan.
      
      Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: macvlan: fix potential UAF problem for lowerdev · 291ac684
      Ziyang Xuan authored
      Add the reference operation to lowerdev of macvlan to avoid
      the potential UAF problem under the following known scenario:
      
      Some module defers the NETDEV_UNREGISTER event handler to a
      work item, and lowerdev is accessed in the work handler. But by
      the time the work is executed, lowerdev has been destroyed because
      the upper macvlan did not take a reference to lowerdev correctly.
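
The role of the reference can be shown with a small userspace model. It is a single-threaded sketch with hypothetical `demo_*` names and a plain integer counter; it only illustrates why the upper device must hold a reference for as long as it keeps a pointer to the lower one, and does not reproduce the kernel's netdev reference API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted device. */
struct demo_netdev {
	int refcnt;
	int *freed_flag; /* set to 1 on release, so the demo can observe it */
};

/* Hypothetical upper device that points at a lower device. */
struct demo_upper {
	struct demo_netdev *lowerdev;
};

struct demo_netdev *demo_netdev_alloc(int *freed_flag)
{
	struct demo_netdev *dev = malloc(sizeof(*dev));

	if (!dev)
		return NULL;
	dev->refcnt = 1; /* reference owned by the creator */
	dev->freed_flag = freed_flag;
	*freed_flag = 0;
	return dev;
}

void demo_hold(struct demo_netdev *dev)
{
	assert(dev != NULL && dev->refcnt > 0);
	dev->refcnt++;
}

void demo_put(struct demo_netdev *dev)
{
	assert(dev != NULL && dev->refcnt > 0);
	if (--dev->refcnt == 0) {
		*dev->freed_flag = 1;
		free(dev);
	}
}

/* The upper device takes its own reference while it keeps the pointer. */
void demo_upper_attach(struct demo_upper *upper, struct demo_netdev *lower)
{
	demo_hold(lower);
	upper->lowerdev = lower;
}

void demo_upper_detach(struct demo_upper *upper)
{
	demo_put(upper->lowerdev);
	upper->lowerdev = NULL;
}
```

If `demo_upper_attach()` skipped the `demo_hold()` call, the creator's `demo_put()` would free the lower device while `upper->lowerdev` still pointed at it, which is the use-after-free pattern described above.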
      
      This is similar to the scenario addressed by
      commit 563bcbae ("net: vlan: fix a UAF in vlan_dev_real_dev()").
      
      Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. Mar 13, 2022