  1. Sep 05, 2019
  2. Sep 04, 2019
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 2c1f9e26
      David S. Miller authored
      
      
      Jeff Kirsher says:
      
      ====================
      100GbE Intel Wired LAN Driver Updates 2019-09-03
      
      This series contains updates to ice driver only.
      
      Anirudh adds the ability for the driver to handle EMP resets correctly
      by adding the logic to the existing ice_reset_subtask().
      
      Jeb fixes up the logic to properly free up the resources for a switch
      rule whether or not it was successful in the removal.
      
      Brett fixes up the reporting of ITR values to let the user know odd ITR
      values are not allowed.  Fixes the driver to only disable VLAN pruning
      on VLAN deletion when the VLAN being deleted is the last VLAN on the VF
      VSI.
      
      Chinh updates the driver to determine the TSA value from the priority
      value when in CEE mode.
      
      Bruce aligns the driver with the hardware specification by ensuring that
      a PF reset is done as part of the unload logic.  Also update the driver
      unloading field, based on the latest hardware specification, which
      allows us to remove an unnecessary endian conversion.  Moves #defines
      based on their need in the code.
      
      Jesse adds the current state of auto-negotiation in the link up message.
      In addition, adds additional information to inform the user of an issue
      with the topology/configuration of the link.
      
      Usha updates the driver to allow the maximum TCs that the firmware
      supports, rather than hard coding to a set value.
      
      Dave updates the DCB initialization flow to handle the case of an actual
      error during DCB init.  Updated the driver to report the current stats,
      even when the netdev is down, which aligns with our other drivers.
      
      Mitch fixes the VF reset code flows to ensure that it properly calls
      ice_dis_vsi_txq() to notify the firmware that the VF is being reset.
      
      Michal fixes the driver so the DCB is not enabled when the SW LLDP is
      activated, which was causing a communication issue with other NICs.  The
      problem lies in that DCB was being enabled without checking the number
      of TCs.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mlx5-updates-2019-09-01-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 94810bd3
      David S. Miller authored
      
      
      Saeed Mahameed says:
      
      ====================
      mlx5-updates-2019-09-01  (Software steering support)
      
      Abstract:
      --------
      Mellanox ConnectX devices support packet matching, packet modification and
      redirection. These functionalities are also referred to as flow-steering.
      To configure a steering rule, the rule is written to device-owned
      memory, which the device accesses and caches when processing
      a packet.
      Steering rules are constructed from multiple steering entries (STE).
      
      Rules are configured using the Firmware command interface. The Firmware
      processes the given driver commands and translates them to STEs, then
      writes them to the device memory in the current steering tables.
      This process is slow due to the architecture of the command interface and
      the processing complexity of each rule.
      
      The highlight of this patchset is to cut out the middleman (the firmware)
      and program steering rules into the device directly from the driver, with
      no firmware intervention whatsoever.
      
      Motivation:
      -----------
      Software (driver managed) steering allows for high rule insertion rates
      compared to the FW steering described above. This is achieved by using
      internal RDMA writes to the device-owned memory instead of the slow
      command interface to program steering rules.
      
      Software (driver managed) steering doesn't depend on new FW
      for new steering functionality; new implementations can be done in the
      driver, skipping the FW layer.
      
      Performance:
      ------------
      The insertion rate on a single core using the new approach allows
      programming ~300K rules per sec. (Done via direct raw test to the new mlx5
      sw steering layer, without any kernel layer involved).
      
      Test: TC L2 rules
      33K/s with Software steering (this patchset).
      5K/s  with FW and current driver.
      This will improve OVS based solution performance.
      
      Architecture and implementation details:
      ----------------------------------------
      Software steering will be dynamically selected via devlink device
      parameter. Example:
      $ devlink dev param show pci/0000:06:00.0 name flow_steering_mode
                pci/0000:06:00.0:
                name flow_steering_mode type driver-specific
                values:
                   cmode runtime value smfs
      
      The mlx5 software steering module, a.k.a. DR (Direct Rule), is implemented
      and contained in the mlx5/core/steering directory and controlled by the
      MLX5_SW_STEERING kconfig flag.
      
      mlx5 core steering layer (fs_core) already provides a shim layer for
      implementing different steering mechanisms, software steering will
      leverage that as seen at the end of this series.
      
      When Software Steering for a specific steering domain
      (NIC/RDMA/Vport/ESwitch, etc.) is supported, it will cause rules
      targeting this domain to be created using SW steering instead of FW.
      
      The implementation includes:
      Domain - The steering domain is the object that all other objects reside
          in. It holds the memory allocator, send engine, locks and other shared
          data needed by lower objects such as table, matcher, rule, action.
          Each domain can contain multiple tables. A domain is equivalent to
          a namespace, e.g. NIC/RDMA/Vport/ESwitch, as implemented
          currently in mlx5_core fs_core (flow steering core).
      
      Table - Table objects are used for holding multiple matchers. Each table
          has a level used to prevent processing loops. Packets are
          directed to a table once it is set as the root table; this is done
          by fs_core using a FW command. A packet is processed inside the
          table matcher by matcher until a successful hit; otherwise the packet
          performs the default action.
      
      Matcher - Matcher objects are used to specify the field mask for
          matching when processing a packet. A matcher belongs to a table. Each
          matcher can hold multiple rules, each rule with different matching
          values corresponding to the matcher mask. Each matcher has a priority
          used for rule processing order inside the table.
      
      Action - Action objects are created to specify different steering actions
          such as count, reformat (encapsulate, decapsulate, ...), modify
          header, forward to table and many other actions. When creating a rule
          a sequence of actions can be provided to be executed on a successful
          match.
      
      Rule - Rule objects are used to specify a specific match on packets as
          well as the actions that should be executed. A rule belongs to a
          matcher.
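
The table, matcher and rule relationship described above can be sketched in plain C. This is only an illustration under stated assumptions, not the mlx5 implementation: the `sk_` names are hypothetical, the packet is reduced to one 32-bit field, matchers are assumed to be pre-sorted so that a lower priority number is tried first, and each rule value is assumed to be already masked.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_DEFAULT_ACTION (-1) /* hypothetical id for the table's default action */

/* A rule: the masked value to match and the action to run on a hit. */
struct sk_rule {
    uint32_t value;
    int action;
};

/* A matcher: a field mask shared by all of its rules, plus a priority. */
struct sk_matcher {
    int priority;               /* assumed: lower number is tried first */
    uint32_t mask;
    const struct sk_rule *rules;
    size_t num_rules;
};

/*
 * Walk the table matcher by matcher (array assumed sorted by priority)
 * until a rule hits; otherwise fall back to the default action.
 */
int sk_table_process(const struct sk_matcher *matchers, size_t num_matchers,
                     uint32_t pkt_fields)
{
    for (size_t m = 0; m < num_matchers; m++) {
        uint32_t key = pkt_fields & matchers[m].mask;

        for (size_t r = 0; r < matchers[m].num_rules; r++) {
            if (key == matchers[m].rules[r].value)
                return matchers[m].rules[r].action;
        }
    }
    return SK_DEFAULT_ACTION;
}
```

A linear scan is used here only for readability; the patchset itself describes hash tables of STEs for the actual lookup.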
      
      STE - This layer is used to hold the specific STE format for the device
          and to convert the requested rule to STEs. Each rule is constructed of
          an STE chain, and multiple rules construct a steering graph. Each node in
          the graph is a hash table containing multiple STEs. The index of each
          STE in the hash table is calculated using a CRC32 hash function.
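
As a rough illustration of indexing a hash table with a CRC32, the sketch below uses the common reflected IEEE 802.3 CRC-32. The exact CRC variant, the bytes that are hashed and the table sizing used by the device are not stated in this message, so all of those (and the `sk_` names) are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32, reflected IEEE 802.3 polynomial 0xEDB88320 (assumed variant). */
uint32_t sk_crc32(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/*
 * Map a match key to a slot in one hash table of the steering graph.
 * table_size must be a power of two so the mask keeps the low bits.
 */
size_t sk_ste_slot(const uint8_t *key, size_t key_len, size_t table_size)
{
    return sk_crc32(key, key_len) & (table_size - 1);
}
```

With the standard check string "123456789" this CRC yields 0xCBF43926, which lands in slot 38 of a 64-entry table.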
      
      Memory pool - Used for managing and caching device-owned memory for rule
          insertion. The memory is allocated using the DM (device memory) API.
      
      Communication with device - Layer for standard RDMA operations using an RC QP
          to configure the device steering.
      
      Command utility - This module holds all of the FW commands that are
          required for SW steering to function.
      
      Patch planning and files:
      -------------------------
      1) The first patch adds support for flow steering actions to the fs_cmd
      shim layer.
      
      2) The next 12 patches add one file per Software steering
      functionality/module as described above. (See patches with title: DR, *)
      
      3) Add CONFIG_MLX5_SW_STEERING for software steering support and enable
      build with the new files
      
      4) Next two patches will add the support for software steering in mlx5
      steering shim layer
      net/mlx5: Add API to set the namespace steering mode
      net/mlx5: Add direct rule fs_cmd implementation
      
      5) The last two patches add the new devlink parameter to select the mlx5
      steering mode, which is valid only in switchdev mode for now.
      Two modes are supported:
          1. DMFS - Device managed flow steering
          2. SMFS - Software/Driver managed flow steering.
      
          In the DMFS mode, the HW steering entities are created through the
          FW. In the SMFS mode these entities are created through the driver
          directly.
      
          The driver will use the devlink steering mode only if the steering
          domain supports it. For now SMFS manages only the switchdev
          eswitch steering domain.
      
          User command examples:
          - Set SMFS flow steering mode::
      
              $ devlink dev param set pci/0000:06:00.0 name flow_steering_mode value "smfs" cmode runtime
      
          - Read device flow steering mode::
      
              $ devlink dev param show pci/0000:06:00.0 name flow_steering_mode
                pci/0000:06:00.0:
                name flow_steering_mode type driver-specific
                values:
                   cmode runtime value smfs
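
The mode selection rule above (use the requested devlink mode only when the steering domain supports it) might look like the following sketch. The enum and function names are hypothetical, and falling back to DMFS when SMFS is unsupported is an assumption drawn from the description rather than from the driver code.

```c
#include <stdbool.h>

/* Hypothetical names for the two modes described above. */
enum sk_steering_mode {
    SK_MODE_DMFS, /* device (FW) managed flow steering */
    SK_MODE_SMFS  /* software/driver managed flow steering */
};

/*
 * Pick the mode actually used for one steering domain: SMFS only if it
 * was requested and the domain supports it, otherwise DMFS (assumed).
 */
enum sk_steering_mode sk_effective_mode(enum sk_steering_mode requested,
                                        bool domain_supports_smfs)
{
    if (requested == SK_MODE_SMFS && domain_supports_smfs)
        return SK_MODE_SMFS;
    return SK_MODE_DMFS;
}
```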
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ice: Only disable VLAN pruning for the VF when all VLANs are removed · cd186e51
      Brett Creeley authored
      
      
      Currently if the VF adds a VLAN, VLAN pruning will be enabled for that VSI.
      Also, when a VLAN gets deleted it will disable VLAN pruning even if other
      VLAN(s) exist for the VF. Fix this by only disabling VLAN pruning on the
      VF VSI when removing the last VLAN (i.e. vf->num_vlan == 0).
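
The intended behaviour can be sketched with a small counter-based model. This is not the driver code: the `sk_vf` structure and both helpers are hypothetical, and only the `num_vlan == 0` condition comes from the message above.

```c
#include <stdbool.h>

/* Hypothetical, reduced view of per-VF VLAN state. */
struct sk_vf {
    unsigned int num_vlan; /* VLANs currently configured on the VF */
    bool vlan_pruning;     /* whether pruning is enabled on the VF VSI */
};

/* Adding any VLAN enables pruning. */
void sk_vf_add_vlan(struct sk_vf *vf)
{
    vf->num_vlan++;
    vf->vlan_pruning = true;
}

/* Deleting a VLAN disables pruning only when it was the last one. */
void sk_vf_del_vlan(struct sk_vf *vf)
{
    if (vf->num_vlan == 0)
        return;

    vf->num_vlan--;
    if (vf->num_vlan == 0)
        vf->vlan_pruning = false;
}
```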
      
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: Remove enable DCB when SW LLDP is activated · 03bba020
      Michal Swiatkowski authored
      
      
      Remove code that enables DCB in initialization when SW LLDP is
      activated. The DCB flag is already set or reset earlier in ice_init_pf_dcb
      based on the number of TCs, so there is no need to overwrite it.
      
      Setting DCB without checking the number of TCs can cause communication
      problems with other cards. The host card sends packets with a VLAN priority
      tag, but the client card doesn't strip this tag and ping doesn't work.
      
      Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: Report stats when VSI is down · 3d57fd10
      Dave Ertman authored
      
      
      There is currently a check in get_ndo_stats that
      returns before updating stats if the VSI is down
      or there are no Tx or Rx queues.  This causes the
      netdev to report zero stats when the netdev is down.
      
      Remove the check so that the behavior of reporting
      stats is the same as it was in IXGBE.
      
      Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: Always notify FW of VF reset · 06914ac2
      Mitch Williams authored
      
      
      The call to ice_dis_vsi_txq() acts as the notification to the firmware
      that the VF is being reset. Because of this, we need to make this call
      every time we reset, regardless of whatever else we do to stop the Tx
      queues.
      
      Without this change, VF resets would fail to complete on interfaces that
      were up and running.
      
      Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: Correctly handle return values for init DCB · 473ca574
      Dave Ertman authored
      
      
      In the init path for DCB, the call to ice_init_dcb()
      can return a non-zero value for either an actual
      error, or due to the FW LLDP engine being stopped.
      
      We are currently treating all non-zero values only as
      an indication that the FW LLDP engine is stopped.
      
      Check for an actual error in the DCB init flow.
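
One way to picture the distinction is a small classifier that separates "FW LLDP engine stopped" from a real failure. How the driver actually tells the two cases apart is not described in this message, so the separate `fw_lldp_stopped` flag, the enum and the function name are all assumptions made for illustration.

```c
#include <stdbool.h>

/* Hypothetical outcomes of the DCB init path. */
enum sk_dcb_init_result {
    SK_DCB_READY,   /* init returned zero */
    SK_DCB_SW_LLDP, /* non-zero because the FW LLDP engine is stopped */
    SK_DCB_FAILED   /* non-zero because of an actual error */
};

/*
 * Classify a return value from the init call. fw_lldp_stopped is an
 * assumed, separately obtained indicator of the FW LLDP engine state.
 */
enum sk_dcb_init_result sk_classify_dcb_init(int err, bool fw_lldp_stopped)
{
    if (err == 0)
        return SK_DCB_READY;
    if (fw_lldp_stopped)
        return SK_DCB_SW_LLDP;
    return SK_DCB_FAILED;
}
```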
      
      Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: Limit Max TCs on devices with more than 4 ports · a257f188
      Usha Ketineni authored
      
      
      This patch limits the max TCs set by the driver to the value provided by
      the firmware as per the capabilities of the device. Otherwise, hard coding
      to 8 TC max would fail the device configurations with more than 4 ports.
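
A minimal sketch of the idea, assuming the firmware capability is available as a plain number: clamp the requested TC count to what the firmware reports instead of to a fixed 8. The function and parameter names are hypothetical.

```c
#include <stdint.h>

/*
 * Limit the number of TCs to the firmware-reported maximum rather than
 * to a hard-coded constant. fw_max_tc stands in for the value read from
 * the device capabilities (assumed to be provided by the caller).
 */
uint8_t sk_clamp_num_tc(uint8_t requested_tc, uint8_t fw_max_tc)
{
    return requested_tc < fw_max_tc ? requested_tc : fw_max_tc;
}
```

For example, a request for 8 TCs on a device whose firmware reports a maximum of 4 would be limited to 4.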
      
      Signed-off-by: Usha Ketineni <usha.k.ketineni@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: Cleanup defines in ice_type.h · 6a025730
      Tony Nguyen authored
      
      
      Conventionally, if the #defines/other are not needed by other header
      files being included, #includes are done first followed by #defines
      and other stuff. Move the #defines after the #includes to follow this
      convention.
      
      Suggested-by: Bruce Allan <bruce.w.allan@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: print extra message if topology issue · 2e0ab37c
      Jesse Brandeburg authored
      
      
      The driver needs to inform the user if there is an issue
      with the topology / configuration of the link.
      
      Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: add print of autoneg state to link message · 43260988
      Jesse Brandeburg authored
      
      
      Print the state of auto-negotiation when printing the Link
      up message.  Adds new text to the "NIC Link is up" line like
      Autoneg: <True | False>
      
      Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: update driver unloading field for Queue Shutdown AQ command · 7404e84a
      Bruce Allan authored
      
      
      According to recent specification versions, the field in the Queue Shutdown
      AdminQ command consisting of the "driver unloading" indication is not a 4
      byte field (it is byte.bit 16.0).  Change it to a byte and remove the
      unnecessary endian conversion.
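
To see why a one-byte field needs no endian conversion, consider this sketch of a command parameter block. It assumes that byte 16 of the descriptor is the first byte of a 16-byte parameter area; the structure, macro and function names are hypothetical and are not taken from the driver headers.

```c
#include <stdint.h>
#include <string.h>

#define SK_DRIVER_UNLOADING 0x01u /* bit 0 of the first parameter byte */

/* Hypothetical 16-byte parameter block for the Queue Shutdown command. */
struct sk_q_shutdown {
    uint8_t driver_unloading; /* single byte: same layout on any endianness */
    uint8_t reserved[15];
};

/* Fill the block; no byte swapping is needed for a one-byte field. */
void sk_fill_q_shutdown(struct sk_q_shutdown *cmd, int unloading)
{
    memset(cmd, 0, sizeof(*cmd));
    if (unloading)
        cmd->driver_unloading = SK_DRIVER_UNLOADING;
}
```

Had the field been a 4-byte little-endian integer, a big-endian host would have to convert the value before writing it, which is the conversion this change makes unnecessary.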
      
      Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: add needed PFR during driver unload · 18057cb3
      Bruce Allan authored
      
      
      According to the specification, a PF Reset must be done as part of the
      driver unload flow.
      
      Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: Deduce TSA value from the priority value in the CEE mode · d24ef08a
      Chinh T Cao authored
      
      
      In CEE mode, the TSA information can be derived from the reported
      priority value.
      
      Signed-off-by: Chinh T Cao <chinh.t.cao@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: Report what the user set for coalesce [tx|rx]-usecs · 567af267
      Brett Creeley authored
      
      
      Currently if the user sets an odd value for [tx|rx]-usecs we align the
      value because the hardware only understands ITR values in multiples of
      2. This seems misleading because we are essentially telling the user
      that the ITR value is odd, when in fact we have changed it internally.
      Fix this by reporting that setting odd ITR values is not allowed.
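
A minimal sketch of such a check, assuming the request is rejected with `-EINVAL` (the message only says the user is told odd values are not allowed, so the specific error code and the function name are assumptions):

```c
#include <errno.h>

/*
 * Validate a requested [tx|rx]-usecs value. The hardware only accepts
 * ITR values in multiples of 2, so odd values are refused instead of
 * being silently rounded.
 */
int sk_check_itr_usecs(unsigned int usecs)
{
    if (usecs % 2)
        return -EINVAL; /* assumed error code */
    return 0;
}
```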
      
      Also, while making changes to ice_set_rc_coalesce() I noticed a bit of
      code/error duplication. Make the necessary changes to remove the
      duplication.
      
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: Fix resource leak in ice_remove_rule_internal() · 8132e17d
      Jeb Cramer authored
      
      
      We don't free s_rule if ice_aq_sw_rules() returns a non-zero status.  If
      it returned a zero status, s_rule would be freed right after, so this
      implies it should be freed within the scope of the function regardless.
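
The shape of the fix can be sketched in user-space C: free the buffer after the send call whether or not that call succeeded. Everything here is hypothetical (the allocation counter, the callback type, the buffer size); it only models the pattern and does not reproduce the driver function.

```c
#include <stddef.h>
#include <stdlib.h>

int sk_live_allocs; /* buffers currently allocated, for leak checking */

void *sk_rule_alloc(size_t size)
{
    void *p = calloc(1, size);

    if (p)
        sk_live_allocs++;
    return p;
}

void sk_rule_free(void *p)
{
    if (p) {
        free(p);
        sk_live_allocs--;
    }
}

/* Stand-in for the call that sends the switch rule to firmware. */
typedef int (*sk_send_fn)(const void *rule_buf);

int sk_send_ok(const void *rule_buf)   { (void)rule_buf; return 0; }
int sk_send_fail(const void *rule_buf) { (void)rule_buf; return -5; }

/* Remove a rule: the buffer is released on success and on failure. */
int sk_remove_rule(sk_send_fn send)
{
    void *s_rule = sk_rule_alloc(64);
    int status;

    if (!s_rule)
        return -1;

    status = send(s_rule);
    sk_rule_free(s_rule); /* unconditional: no leak when send fails */
    return status;
}
```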
      
      Signed-off-by: Jeb Cramer <jeb.j.cramer@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: Fix EMP reset handling · 03af8406
      Anirudh Venkataramanan authored
      
      
      ice_reset_subtask needs to handle EMP resets as well, as EMP resets
      can be triggered by the firmware. This patch adds the logic to do
      this.
      
      Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>