  1. Dec 23, 2021
  2. Dec 16, 2021
    • netfilter: nf_nat_masquerade: add netns refcount tracker to masq_dev_work · fc0d026a
      Eric Dumazet authored
      
      If compiled with CONFIG_NET_NS_REFCNT_TRACKER=y,
      using put_net_track() in iterate_cleanup_work()
      and netns_tracker_alloc() in nf_nat_masq_schedule()
      might help us find netns refcount imbalances.
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nfnetlink: add netns refcount tracker to struct nfulnl_instance · a9382d93
      Eric Dumazet authored
      
      If compiled with CONFIG_NET_NS_REFCNT_TRACKER=y,
      using put_net_track() in nfulnl_instance_free_rcu()
      and get_net_track() in instance_create()
      might help us find netns refcount imbalances.
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • net: prestera: flower template support · 604ba230
      Volodymyr Mytnyk authored
      
      Add explicit support for user templates. At the moment, the maximum
      TCAM rule size is used for all rules, no matter which and how many
      flower matches the user provides. This wastes some TCAM space,
      which reduces the number of filters that can be offloaded.
      
      Specifying the template explicitly allows more filters to be
      offloaded to HW.
      
      Example:
        tc qd add dev PORT clsact
        tc chain add dev PORT ingress protocol ip \
          flower dst_ip 0.0.0.0/16
        tc filter add dev PORT ingress protocol ip \
          flower skip_sw dst_ip 1.2.3.4/16 action drop
      
      NOTE: chain 0 is the default chain id for "tc chain" & "tc filter"
            command, so it is omitted in the example above.
      
      This patch adds template support only for the default chain 0, which
      is what the prestera driver supports at the moment. Other chains are
      not supported yet and will be added later.
      
      Signed-off-by: Volodymyr Mytnyk <vmytnyk@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: rtl8365mb: add GMII as user port mode · a5dba0f2
      Luiz Angelo Daros de Luca authored
      
      Recent net-next fails to initialize ports with:
      
       realtek-smi switch: phy mode gmii is unsupported on port 0
       realtek-smi switch lan5 (uninitialized): validation of gmii with
       support 0000000,00000000,000062ef and advertisement
       0000000,00000000,000062ef failed: -22
       realtek-smi switch lan5 (uninitialized): failed to connect to PHY:
       -EINVAL
       realtek-smi switch lan5 (uninitialized): error -22 setting up PHY
       for tree 1, switch 0, port 0
      
      Current net branch (3dd7d40b) is not affected.
      
      I also noticed the same issue earlier with older versions, but when
      using an MDIO interface driver rather than realtek-smi.
      
      Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'gve-improvements' · e85fbf53
      David S. Miller authored
      
      Jeroen de Borst says:
      
      ====================
      gve improvements
      
      This patchset consists of unrelated changes:
      
      A bug fix for an issue that disabled jumbo-frame support, a few code
      improvements and minor functional changes, and 3 new features:
        Supporting tx|rx-coalesce-usec for DQO
        Suspend/resume/shutdown
        Optional metadata descriptors
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gve: Add tx|rx-coalesce-usec for DQO · 6081ac20
      Tao Liu authored
      
      Add ethtool support for changing rx-coalesce-usec and tx-coalesce-usec
      when using the DQO queue format.
      
      Signed-off-by: Tao Liu <xliutaox@google.com>
      Signed-off-by: Jeroen de Borst <jeroendb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gve: Add consumed counts to ethtool stats · 2c919835
      Jordan Kim authored
      
      Being able to see how many descriptors are in-use is helpful
      when diagnosing certain issues.
      
      Signed-off-by: Jeroen de Borst <jeroendb@google.com>
      Signed-off-by: Jordan Kim <jrkim@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gve: Implement suspend/resume/shutdown · 974365e5
      Catherine Sullivan authored
      
      Add support for suspend, resume and shutdown.
      
      Signed-off-by: Catherine Sullivan <csully@google.com>
      Signed-off-by: David Awogbemila <awogbemila@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gve: Add optional metadata descriptor type GVE_TXD_MTD · 497dbb2b
      Willem de Bruijn authored
      
      Allow drivers to pass metadata along with packet data to the device.
      Introduce a new metadata descriptor type
      
      * GVE_TXD_MTD
      
      This descriptor is optional. If present, it immediately follows the
      packet descriptor and precedes the segment descriptor.
      
      This descriptor may be repeated: multiple metadata descriptors may
      follow. There are no immediate uses for this; it is for
      future-proofing. At present, devices allow only 1 MTD descriptor.
      
      The lower four bits of the type_flags field encode GVE_TXD_MTD.
      The upper four bits of the type_flags field encode a *sub*type.
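
The nibble split described above can be sketched as follows. This is only an illustration of the layout as worded in this message: the numeric values assigned to GVE_TXD_MTD and GVE_MTD_SUBTYPE_PATH here are hypothetical placeholders, the helper names are invented for the example, and the driver headers remain the authority on the actual constants and bit positions.

```python
from typing import Tuple

# Hypothetical values for illustration only; the real constants are
# defined in the gve driver headers.
GVE_TXD_MTD = 0x3
GVE_MTD_SUBTYPE_PATH = 0x2

TYPE_MASK = 0x0F    # lower four bits: descriptor type (per the text above)
SUBTYPE_SHIFT = 4   # upper four bits: descriptor subtype


def pack_type_flags(desc_type: int, subtype: int) -> int:
    """Combine a 4-bit type and a 4-bit subtype into one byte."""
    if not 0 <= desc_type <= 0xF or not 0 <= subtype <= 0xF:
        raise ValueError("type and subtype must each fit in four bits")
    return (subtype << SUBTYPE_SHIFT) | desc_type


def unpack_type_flags(type_flags: int) -> Tuple[int, int]:
    """Split a type_flags byte back into (type, subtype)."""
    return type_flags & TYPE_MASK, (type_flags >> SUBTYPE_SHIFT) & 0x0F


flags = pack_type_flags(GVE_TXD_MTD, GVE_MTD_SUBTYPE_PATH)
desc_type, subtype = unpack_type_flags(flags)
```

Keeping both fields in a single byte lets a reader of the descriptor ring identify the descriptor kind and its subtype from one field.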
      
      Introduce one such metadata descriptor subtype
      
      * GVE_MTD_SUBTYPE_PATH
      
      This shares path information with the device for network failure
      discovery and robust response:
      
      Linux derives ipv6 flowlabel and ECMP multipath from sk->sk_txhash,
      and updates this field on error with sk_rethink_txhash. Allow the host
      stack to do the same. Pass the tx_hash value if set. Also communicate
      whether the path hash is set, or more exactly, what its type is. Define
      two common types
      
        GVE_MTD_PATH_HASH_NONE
        GVE_MTD_PATH_HASH_L4
      
      Concrete examples of error conditions that are resolved are
      mentioned in the commits that add sk_rethink_txhash calls, such as
      commit 7788174e ("tcp: change IPv6 flow-label upon receiving
      spurious retransmission").
      
      Experimental results mirror what the theory suggests: where IPv6
      FlowLabel is included in path selection (e.g., LAG/ECMP), flowlabel
      rotation on TCP timeout avoids the vast majority of TCP disconnects
      that would otherwise have occurred during link failures in long-haul
      backbones, when an alternative path is available.
      
      Rotation can be applied to various bad connection signals, such as
      timeouts and spurious retransmissions. In aggregate, such flow level
      signals can help locate network issues. Define initial common states:
      
        GVE_MTD_PATH_STATE_DEFAULT
        GVE_MTD_PATH_STATE_TIMEOUT
        GVE_MTD_PATH_STATE_CONGESTION
        GVE_MTD_PATH_STATE_RETRANSMIT
      
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David Awogbemila <awogbemila@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gve: remove memory barrier around seqno · 5fd07df4
      Catherine Sullivan authored
      
      No longer needed after we introduced the barrier in gve_napi_poll.
      
      Signed-off-by: Catherine Sullivan <csully@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gve: Update gve_free_queue_page_list signature · 13e7939c
      Catherine Sullivan authored
      
      The id field should be a u32 not a signed int.
      
      Signed-off-by: Catherine Sullivan <csully@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gve: Move the irq db indexes out of the ntfy block struct · d30baacc
      Catherine Sullivan authored
      
      Giving the device access to other kernel structs is not ideal.
      Move the indexes into their own array and just keep pointers to
      them in the ntfy block struct.
      
      Signed-off-by: Catherine Sullivan <csully@google.com>
      Signed-off-by: David Awogbemila <awogbemila@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gve: Correct order of processing device options · a10834a3
      Jeroen de Borst authored
      
      The legacy raw addressing device option was processed before the
      new RDA queue format option.  This caused the supported features mask,
      which is provided only on the RDA queue format option, not to be set.
      
      This disabled jumbo-frame support when using raw addressing.
      
      Fixes: 255489f5 ("gve: Add a jumbo-frame device option")
      Signed-off-by: Jeroen de Borst <jeroendb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'phylink-pcs-validation' · 75df1a24
      David S. Miller authored
      
      Russell King says:
      
      ====================
      net: phylink: add PCS validation
      
      This series allows phylink to include the PCS in its validation step.
      There are two reasons to make this change:
      
      1. Some of the network drivers that are making use of the split PCS
         support are already manually calling into their PCS drivers to
         perform validation. E.g. stmmac with xpcs.
      
      2. Logically, for some network drivers such as mvneta and mvpp2, the
         restriction we impose in the validate() callback is a property of
         the "PCS" block that we provide rather than the MAC.
      
      This series:
      
      1. Gives phylink a mechanism to query the MAC driver which PCS it
         wishes to use for the PHY interface mode. This is necessary to allow
         the PCS to be involved in the validation step without making changes
         to the configuration.
      
      2. Provides a pcs_validate() method that a PCS can implement. This
         follows a similar model to the MAC's validate() callback, but with
         some minor differences due to observations from the various
         implementations, e.g. returning an error code for not-supported and
         the way the advertising bitmap is masked.
      
      3. Converts mvpp2 and mvneta to this as examples of its use. Further
         conversions are in the pipeline, including for stmmac+xpcs, as well
         as some DSA drivers. Note that DSA conversion to this is conditional
         upon all DSA drivers populating their supported_interfaces bitmap,
         since this is required before mac_select_pcs() can be used.
      
      Existing drivers that set a PCS in mac_prepare() or mac_config(), or
      shortly after phylink_create() will continue to work. However, it should
      be noted that mac_select_pcs() will be called during phylink_create(),
      and thus any PCS returned by mac_select_pcs() must be available by this
      time - or we drop the check in phylink_create().
      
      v2: fix kerneldoc typo in patch 1.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvneta: convert to pcs_validate() and phylink_generic_validate() · d8c36693
      Russell King (Oracle) authored
      
      Convert mvneta to validate the autoneg state for 1000base-X in the
      pcs_validate() operation, rather than the MAC validate() operation.
      This allows us to switch the MAC validate() to use
      phylink_generic_validate().
      
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvneta: convert to phylink pcs operations · c2e7d2df
      Russell King authored
      
      An initial stab at converting mvneta to PCS operations.  There are a
      few FIXMEs to be solved.
      
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvneta: convert to use mac_prepare()/mac_finish() · 5a7d8953
      Russell King authored
      
      Convert mvneta to use the mac_prepare() and mac_finish() methods in
      preparation for converting mvneta to split-PCS support.
      
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvpp2: convert to pcs_validate() and phylink_generic_validate() · 85e3e0eb
      Russell King (Oracle) authored
      
      Convert mvpp2 to validate the autoneg state for 1000base-X in the
      pcs_validate() operation, rather than the MAC validate() operation.
      This allows us to switch the MAC validate() to use
      phylink_generic_validate().
      
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvpp2: use .mac_select_pcs() interface · cff05632
      Russell King (Oracle) authored
      
      Use the mac_select_pcs() method to choose between the GMAC and XLG
      PCS implementations.
      
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phylink: add pcs_validate() method · 0d22d4b6
      Russell King (Oracle) authored
      
      Add a hook for PCS to validate the link parameters. This avoids MAC
      drivers having to have knowledge of their PCS in their validate()
      method, thereby allowing several MAC drivers to be simplified.
      
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phylink: add mac_select_pcs() method to phylink_mac_ops · d1e86325
      Russell King (Oracle) authored
      
      mac_select_pcs() allows us to have an explicit point to query which
      PCS the MAC wishes to use for a particular PHY interface mode, thereby
      allowing us to add support to validate the link settings with the PCS.
      
      Phylink will also use this to select the PCS to be used during a major
      configuration event without the MAC driver needing to call
      phylink_set_pcs().
      
      Note that if mac_select_pcs() is present, the supported_interfaces
      bitmap must be filled in; this avoids mac_select_pcs() being called
      with PHY_INTERFACE_MODE_NA when we want to get support for all
      interface types. Phylink will return an error in phylink_create()
      unless this condition is satisfied.
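
The creation-time rule above can be modelled roughly as below. This is a toy model, not phylink code: the Link class, create_link(), pcs_for_interface() and the example selector are invented for illustration, and the mapping of interface modes to PCS names is an assumption loosely based on the GMAC/XLG split mentioned for mvpp2.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Set

# Hypothetical alias: maps a PHY interface mode name to a PCS name.
SelectPcs = Callable[[str], Optional[str]]


@dataclass
class Link:
    interfaces: Set[str]
    select_pcs: Optional[SelectPcs]


def create_link(supported_interfaces: Iterable[str],
                mac_select_pcs: Optional[SelectPcs] = None) -> Link:
    """Refuse creation if a selector is given without supported interfaces."""
    interfaces = set(supported_interfaces)
    if mac_select_pcs is not None and not interfaces:
        raise ValueError("mac_select_pcs() needs supported_interfaces filled in")
    return Link(interfaces, mac_select_pcs)


def pcs_for_interface(link: Link, interface: str) -> Optional[str]:
    """Ask the MAC which PCS it wants for one concrete interface mode."""
    if link.select_pcs is None:
        return None  # legacy driver: the PCS is set up some other way
    if interface not in link.interfaces:
        raise ValueError(f"{interface} is not a supported interface")
    return link.select_pcs(interface)


def example_select_pcs(interface: str) -> Optional[str]:
    # Assumed mapping, for illustration only.
    return "xlg-pcs" if interface == "10gbase-r" else "gmac-pcs"


link = create_link({"sgmii", "10gbase-r"}, example_select_pcs)
chosen = pcs_for_interface(link, "10gbase-r")
```

Because the selector is only ever asked about interfaces listed in the supported set, it never has to answer for a wildcard mode, which is the point of requiring the bitmap.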
      
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue · 4134c846
      David S. Miller authored
      
      Tony Nguyen says:
      
      ====================
      100GbE Intel Wired LAN Driver Updates 2021-12-15
      
      This series contains updates to ice driver only.
      
      Jake makes changes to flash update. This includes the following:
      
       * a new shadow-ram region similar to NVM region but for the device shadow
         RAM contents. This is distinct from NVM region because shadow RAM is
         built up during device init and may be different from the raw NVM flash
         data.
       * refactoring of the ice_flash_pldm_image to become the main flash update
         entry point. This is simpler than having both an
         ice_devlink_flash_update and an ice_flash_pldm_image. It will make
         additions like dry-run easier in the future.
       * reducing time to read Option ROM version information.
       * adding support for firmware activation via devlink reload, when
         possible.
      
      The major new work is the reload support, which allows activating firmware
      immediately without a reboot when possible. Reload support only supports
      firmware activation.
      
      Jesse improves transmit code: utilizing newer netif_tx* API, adding some
      prefetch calls, correcting expected conditions when calling ice_vsi_down(),
      and utilizing __netdev_tx_sent_queue() call.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux · 823f7a54
      David S. Miller authored
      
      Saeed Mahameed says:
      
      ====================
      mlx5-next branch 2021-12-15
      
      Hi Dave, Jakub, Jason
      
      This pulls mlx5-next branch into net-next and rdma branches.
      All patches already reviewed on both rdma and netdev mailing lists.
      
      Please pull and let me know if there's any problem.
      
      1) Add multiple FDB steering priorities [1]
      2) Introduce HW bits needed to configure MAC list size of VF/SF.
         Required for ("net/mlx5: Memory optimizations") upcoming series [2].
      
      [1] https://lore.kernel.org/netdev/20211201193621.9129-1-saeed@kernel.org/
      [2] https://lore.kernel.org/lkml/20211208141722.13646-1-shayd@nvidia.com/
      
      
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · bd1d97d8
      Jakub Kicinski authored
      
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter updates for net-next
      
      The following patchset contains Netfilter updates for net-next, mostly
      rather small housekeeping patches:
      
      1) Remove unused variable in IPVS, from GuoYong Zheng.
      
      2) Use memset_startat() in conntrack, from Kees Cook.
      
      3) Remove leftover function in nfnetlink_queue, from Florian Westphal.
      
      4) Remove redundant test on bool in conntrack, from Bernard Zhao.
      
      5) egress support for nft_fwd, from Lukas Wunner.
      
      6) Make pppoe work for br_netfilter, from Florian Westphal.
      
      7) Remove unused variable in conntrack resize routine, from luo penghao.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next:
        netfilter: conntrack: Remove useless assignment statements
        netfilter: bridge: add support for pppoe filtering
        netfilter: nft_fwd_netdev: Support egress hook
        netfilter: ctnetlink: remove useless type conversion to bool
        netfilter: nf_queue: remove leftover synchronize_rcu
        netfilter: conntrack: Use memset_startat() to zero struct nf_conn
        ipvs: remove unused variable for ip_vs_new_dest
      ====================
      
      Link: https://lore.kernel.org/r/20211215234911.170741-1-pablo@netfilter.org
      
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • netfilter: conntrack: Remove useless assignment statements · 284ca764
      luo penghao authored
      
      The old_size assignment here is no longer used.
      
      The clang_analyzer complains as follows:
      
      Value stored to 'old_size' is never read
      
      Reported-by: Zeal Robot <zealci@zte.com.cn>
      Signed-off-by: luo penghao <luo.penghao@zte.com.cn>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • net/mlx5: Introduce log_max_current_uc_list_wr_supported bit · 685b1afd
      Shay Drory authored
      
      A downstream patch will use this bit to determine whether the device
      supports changing max_uc_list.
      
      Signed-off-by: Shay Drory <shayd@nvidia.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • ice: use modern kernel API for kick · 9c99d099
      Jesse Brandeburg authored
      
      The kernel gained a new interface that lets drivers combine the tail
      bump (doorbell) and BQL updates. Use this new interface.
      
      Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: tighter control over VSI_DOWN state · 21c6e36b
      Jesse Brandeburg authored
      
      The driver had comments to the effect of: This flag should be set before
      calling this function. While reviewing code it was found that there were
      several violations of this policy, which could introduce hard to find
      bugs or races.
      
      Fix the violations of the "VSI DOWN state must be set before calling
      ice_down" rule and turn the state check into code with a WARN_ON.
      
      Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: use prefetch methods · cc14db11
      Jesse Brandeburg authored
      
      The kernel provides some prefetch mechanisms to speed up commonly
      cold cache line accesses during receive processing. Since these are
      software structures it helps to have these strategically placed
      prefetches.
      
      Be careful to call BQL prefetch complete only for non XDP queues.
      
      Co-developed-by: Piotr Raczynski <piotr.raczynski@intel.com>
      Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com>
      Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: update to newer kernel API · 1c96c168
      Jesse Brandeburg authored
      
      Use the netif_tx_* API from netdevice.h which has simpler parameters.
      
      Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: support immediate firmware activation via devlink reload · 399e27db
      Jacob Keller authored
      
      The ice hardware contains an embedded chip with firmware which can be
      updated using devlink flash. The firmware which runs on this chip is
      referred to as the Embedded Management Processor firmware (EMP
      firmware).
      
      Activating the new firmware image currently requires that the system be
      rebooted. This is not ideal as rebooting the system can cause unwanted
      downtime.
      
      In practical terms, activating the firmware does not always require a
      full system reboot. In many cases it is possible to activate the EMP
      firmware immediately. There are a couple of different scenarios to
      cover.
      
       * The EMP firmware itself can be reloaded by issuing a special update
         to the device called an Embedded Management Processor reset (EMP
         reset). This reset causes the device to reset and reload the EMP
         firmware.
      
       * PCI configuration changes are only reloaded after a cold PCIe reset.
         Unfortunately there is no generic way to trigger this for a PCIe
         device without a system reboot.
      
      When performing a flash update, firmware is capable of responding with
      some information about the specific update requirements.
      
      The driver updates the flash by programming a secondary inactive bank
      with the contents of the new image, and then issuing a command to
      request to switch the active bank starting from the next load.
      
      The response to the final command for updating the inactive NVM flash
      bank includes an indication of the minimum reset required to fully
      update the device. This can be one of the following:
      
       * A full power on is required
       * A cold PCIe reset is required
       * An EMP reset is required
      
      The response to the command to switch flash banks includes an indication
      of whether or not the firmware will allow an EMP reset request.
      
      For most updates, an EMP reset is sufficient to load the new EMP
      firmware without issues. In some cases, this reset is not sufficient
      because the PCI configuration space has changed. When this could cause
      incompatibility with the new EMP image, the firmware is capable of
      rejecting the EMP reset request.
      
      Add logic to ice_fw_update.c to handle the response data from the
      flash update AdminQ commands.
      
      For the reset level, issue a devlink status notification informing the
      user of how to complete the update with a simple suggestion like
      "Activate new firmware by rebooting the system".
      
      Cache the status of whether or not firmware will restrict the EMP reset
      for use in implementing devlink reload.
      
      Implement support for devlink reload with the "fw_activate" flag. This
      allows user space to request the firmware be activated immediately.
      
      For the .reload_down handler, we will issue a request for the EMP reset
      using the appropriate firmware AdminQ command. If we know that the
      firmware will not allow an EMP reset, simply exit with a suitable
      netlink extended ACK message indicating that the EMP reset is not
      available.
      
      For the .reload_up handler, simply wait until the driver has finished
      resetting. Logic to handle processing of an EMP reset already exists in
      the driver as part of its reset and rebuild flows.
      
      Implement support for the devlink reload interface with the
      "fw_activate" action. This allows userspace to request activation of
      firmware without a reboot.
      
      Note that indicating the required reset and the EMP reset restriction
      is not supported on old versions of firmware. The driver can
      determine if the two features are supported by checking the device
      capabilities report. I confirmed support has existed since at least
      version 5.5.2 as reported by the 'fw.mgmt' version. Support to issue the
      EMP reset request has existed in all versions of the EMP firmware for
      the ice hardware.
      
      Check the device capabilities report to determine whether or not the
      indications are reported by the running firmware. If the reset
      requirement indication is not supported, always assume a full power on
      is necessary. If the reset restriction capability is not supported,
      always assume the EMP reset is available.
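
The two fallback rules above can be written down as a small decision sketch. The enum and function names are hypothetical and do not come from the ice driver; the sketch only mirrors the defaults stated in this message.

```python
from enum import Enum
from typing import Optional


class ResetLevel(Enum):
    EMP_RESET = "EMP reset"
    COLD_PCIE_RESET = "cold PCIe reset"
    FULL_POWER_ON = "full power on"


def effective_reset_level(reported: Optional[ResetLevel],
                          indication_supported: bool) -> ResetLevel:
    """Reset level to assume after a flash update."""
    if not indication_supported or reported is None:
        # Old firmware gives no indication: assume the worst case.
        return ResetLevel.FULL_POWER_ON
    return reported


def emp_reset_available(fw_restricts_reset: bool,
                        restriction_supported: bool) -> bool:
    """Whether a reload may request an EMP reset."""
    if not restriction_supported:
        # Old firmware cannot report a restriction: assume it is allowed.
        return True
    return not fw_restricts_reset


level = effective_reset_level(ResetLevel.EMP_RESET, indication_supported=False)
allowed = emp_reset_available(fw_restricts_reset=True, restriction_supported=False)
```

The defaults are deliberately asymmetric: the user-facing hint errs on the side of the strongest reset, while the reload path stays usable on firmware that predates the restriction report.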
      
      Users can verify if the EMP reset has activated the firmware by using
      the devlink info report to check that the 'running' firmware version has
      updated. For example a user might do the following:
      
       # Check current version
       $ devlink dev info
      
       # Update the device
       $ devlink dev flash pci/0000:af:00.0 file firmware.bin
      
       # Confirm stored version updated
       $ devlink dev info
      
       # Reload to activate new firmware
       $ devlink dev reload pci/0000:af:00.0 action fw_activate
      
       # Confirm running version updated
       $ devlink dev info
      
      Finally, this change does *not* implement basic driver-only reload
      support. I did look into trying to do this. However, it requires
      significant refactor of how the ice driver probes and loads everything.
      The ice driver probe and allocation flows were not designed with such
      a reload in mind. Refactoring the flow to support this is beyond the
      scope of this change.
      
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: reduce time to read Option ROM CIVD data · af18d886
      Jacob Keller authored
      
      During probe and device reset, the ice driver reads some data from the
      NVM image as part of ice_init_nvm. Part of this data includes a section
      of the Option ROM which contains version information.
      
      The function ice_get_orom_civd_data is used to locate the '$CIV' data
      section of the Option ROM.
      
      Timing of ice_probe and ice_rebuild indicates that the
      ice_get_orom_civd_data function takes about 10 seconds to finish
      executing.
      
      The function locates the section by scanning the Option ROM every 512
      bytes. This requires a significant number of NVM read accesses, since
      the Option ROM bank is 500KB. In the worst case it would take about 1000
      reads. Worse, all PFs serialize this operation during reload because of
      acquiring the NVM semaphore.
      
      The CIVD section is located at the end of the Option ROM image data.
      Unfortunately, the driver has no easy method to determine the offset
      manually. Practical experiments have shown that the data could be at
      a variety of locations, so simply reversing the scanning order is not
      sufficient to reduce the overall read time.
      
      Instead, copy the entire contents of the Option ROM into memory. This
      allows reading the data using 4KB pages instead of 512 bytes at a time.
      This reduces the total number of firmware commands by a factor of 8. In
      addition, reading the whole section together at once allows better
      indication to firmware of when we're "done".
      
      Re-write ice_get_orom_civd_data to allocate virtual memory to store the
      Option ROM data. Copy the entire Option ROM contents at once using
      ice_read_flash_module. Finally, use this memory copy to scan for the
      '$CIV' section.
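
The read counts quoted above follow from simple arithmetic, and the in-memory scan can be sketched in a few lines. The helper names are hypothetical, the 500KB size is taken from this message and treated as 500 * 1024 bytes, and the sketch only looks for the '$CIV' marker at 512-byte boundaries without any further validation of the section.

```python
OROM_BANK_SIZE = 500 * 1024   # Option ROM bank size quoted in the message
OLD_READ_SIZE = 512           # one read per 512-byte step
NEW_READ_SIZE = 4096          # one read per 4KB page
SIGNATURE = b"$CIV"


def reads_needed(total_bytes: int, chunk_bytes: int) -> int:
    """Number of reads needed to cover total_bytes in chunk_bytes pieces."""
    return -(-total_bytes // chunk_bytes)  # ceiling division


def find_civd(orom: bytes, step: int = OLD_READ_SIZE) -> int:
    """Return the offset of the first aligned '$CIV' marker, or -1."""
    for offset in range(0, len(orom) - len(SIGNATURE) + 1, step):
        if orom[offset:offset + len(SIGNATURE)] == SIGNATURE:
            return offset
    return -1


old_reads = reads_needed(OROM_BANK_SIZE, OLD_READ_SIZE)
new_reads = reads_needed(OROM_BANK_SIZE, NEW_READ_SIZE)

# Toy image with the marker placed at an aligned offset.
image = bytearray(8192)
image[1024:1028] = SIGNATURE
civd_offset = find_civd(bytes(image))
```

Once the image is in memory, the scan itself costs no further firmware commands, so the step size only affects CPU time rather than NVM accesses.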
      
      This change significantly reduces the time to read the Option ROM CIVD
      section from ~10 seconds down to ~1 second. This has a significant
      impact on the total time to complete a driver rebuild or probe.
      
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: move ice_devlink_flash_update and merge with ice_flash_pldm_image · c9f7a483
      Jacob Keller authored
      
      The ice_devlink_flash_update function performs a few upfront checks and
      then calls ice_flash_pldm_image.
      
      Most of these checks make more sense in the context of code within
      ice_flash_pldm_image. Merge ice_devlink_flash_update and
      ice_flash_pldm_image into one function, placing it in ice_fw_update.c.
      
      Since this is still the entry point for devlink, call the function
      ice_devlink_flash_update instead of ice_flash_pldm_image. This leaves a
      single function which handles the devlink parameters and then initiates
      a PLDM update.
      
      With this change, the ice_devlink_flash_update function in
      ice_fw_update.c becomes the main entry point for flash update. It
      eliminates some unnecessary boilerplate code between the two previous
      functions. The ultimate motivation for this is that it eases supporting
      a dry run with the PLDM library in a future change.
      
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>