Skip to content
  1. Mar 27, 2020
  2. Mar 26, 2020
    • Parav Pandit's avatar
      net/mlx5: E-switch, Protect eswitch mode changes · 8e0aa4bc
      Parav Pandit authored
      
      
      Currently eswitch mode change is occurring from 2 different execution
      contexts as below.
      1. sriov sysfs enable/disable
      2. devlink eswitch set commands
      
      Both of them need to access eswitch related data structures in
      synchronized manner.
      Without any synchronization below race condition exist.
      
      SR-IOV enable/disable with devlink eswitch mode change:
      
                cpu-0                         cpu-1
                -----                         -----
      mlx5_device_disable_sriov()        mlx5_devlink_eswitch_mode_set()
        mlx5_eswitch_disable()             esw_offloads_stop()
          esw_offloads_disable()             mlx5_eswitch_disable()
                                               esw_offloads_disable()
      
      Hence, they are synchronized using a new mode_lock.
      eswitch's state_lock is not used as it can lead to a deadlock scenario
      below and state_lock is only for vport and fdb exclusive access.
      
      ip link set vf <param>
        netlink rcv_msg() - Lock A
          rtnl_lock
          vfinfo()
            esw->state_lock() - Lock B
      devlink eswitch_set
         devlink_mutex
           esw->state_lock() - Lock B
             attach_netdev()
               register_netdev()
                 rtnl_lock - Lock A
      
      Alternatives considered:
      1. Acquiring rtnl lock before taking esw->state_lock to follow similar
      locking sequence as ip link flow during eswitch mode set.
      rtnl lock is not good idea for two reasons.
      (a) Holding rtnl lock for several hundred device commands is not good
          idea.
      (b) It leads to below and more similar deadlocks.
      
      devlink eswitch_set
         devlink_mutex
           rtnl_lock - Lock A
             esw->state_lock() - Lock B
               eswitch_disable()
                 reload()
                   ib_register_device()
                     ib_cache_setup_one()
                       rtnl_lock()
      
      2. Exporting devlink lock may lead to undesired use of it in vendor
      driver(s) in future.
      
      3. Unloading representors outside of the mode_lock requires
      serialization with other process trying to enable the eswitch.
      
      4. Differing the representors life cycle to a different workqueue
      requires synchronization with func_change_handler workqueue.
      
      Reviewed-by: default avatarRoi Dayan <roid@mellanox.com>
      Reviewed-by: default avatarBodong Wang <bodong@mellanox.com>
      Reviewed-by: default avatarMark Bloch <markb@mellanox.com>
      Signed-off-by: default avatarParav Pandit <parav@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      8e0aa4bc
    • Parav Pandit's avatar
      net/mlx5: E-switch, Extend eswitch enable to handle num_vfs change · ebf77bb8
      Parav Pandit authored
      
      
      Subsequent patch protects eswitch mode changes across sriov and devlink
      interfaces. It is desirable for eswitch to provide thread safe eswitch
      enable and disable APIs.
      Hence, extend eswitch enable API to optionally update num_vfs when
      requested.
      
      In subsequent patch, eswitch num_vfs are updated after all the eswitch
      users eswitch drops its reference count.
      
      Reviewed-by: default avatarRoi Dayan <roid@mellanox.com>
      Reviewed-by: default avatarBodong Wang <bodong@mellanox.com>
      Reviewed-by: default avatarMark Bloch <markb@mellanox.com>
      Signed-off-by: default avatarParav Pandit <parav@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      ebf77bb8
    • Parav Pandit's avatar
      net/mlx5: Split eswitch mode check to different helper function · ae24432c
      Parav Pandit authored
      
      
      In order to check eswitch state under a lock, prepare code to split
      capability check and eswitch state check into two helper functions.
      
      Reviewed-by: default avatarRoi Dayan <roid@mellanox.com>
      Reviewed-by: default avatarBodong Wang <bodong@mellanox.com>
      Reviewed-by: default avatarMark Bloch <markb@mellanox.com>
      Signed-off-by: default avatarParav Pandit <parav@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      ae24432c
    • Parav Pandit's avatar
      devlink: Rely on driver eswitch thread safety instead of devlink · 98fed6eb
      Parav Pandit authored
      
      
      devlink_nl_cmd_eswitch_set_doit() doesn't hold devlink->lock mutex while
      invoking driver callback. This is likely due to eswitch mode setting
      involves adding/remove devlink ports, health reporters or
      other devlink objects for a devlink device.
      
      So it is driver responsiblity to ensure thread safe eswitch state
      transition happening via either sriov legacy enablement or via devlink
      eswitch set callback.
      
      Therefore, get() callback should also be invoked without holding
      devlink->lock mutex.
      Vendor driver can use same internal lock which it uses during eswitch
      mode set() callback.
      This makes get() and set() implimentation symmetric in devlink core and
      in vendor drivers.
      
      Hence, remove holding devlink->lock mutex during eswitch get() callback.
      
      Failing to do so results into below deadlock scenario when mlx5_core
      driver is improved to handle eswitch mode set critical section invoked
      by devlink and sriov sysfs interface in subsequent patch.
      
      devlink_nl_cmd_eswitch_set_doit()
         mlx5_eswitch_mode_set()
           mutex_lock(esw->mode_lock) <- Lock A
           [...]
           register_devlink_port()
             mutex_lock(&devlink->lock); <- lock B
      
      mutex_lock(&devlink->lock); <- lock B
      devlink_nl_cmd_eswitch_get_doit()
         mlx5_eswitch_mode_get()
         mutex_lock(esw->mode_lock) <- Lock A
      
      In subsequent patch, mlx5_core driver uses its internal lock during
      get() and set() eswitch callbacks.
      
      Other drivers have been inspected which returns either constant during
      get operations or reads the value from already allocated structure.
      Hence it is safe to remove the lock in get( ) callback and let vendor
      driver handle it.
      
      Reviewed-by: default avatarJiri Pirko <jiri@mellanox.com>
      Reviewed-by: default avatarMark Bloch <markb@mellanox.com>
      Signed-off-by: default avatarParav Pandit <parav@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      98fed6eb
    • Parav Pandit's avatar
      net/mlx5: Simplify mlx5_unload_one() and its callers · f999b706
      Parav Pandit authored
      
      
      mlx5_unload_one() always returns 0.
      Simplify callers of mlx5_unload_one() and remove the dead code.
      
      Reviewed-by: default avatarMoshe Shemesh <moshe@mellanox.com>
      Signed-off-by: default avatarParav Pandit <parav@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      f999b706
    • Parav Pandit's avatar
      net/mlx5: Simplify mlx5_register_device to return void · ecd01db8
      Parav Pandit authored
      
      
      mlx5_register_device() doesn't check for any error and always returns 0.
      Simplify mlx5_register_device() to return void and its caller, remove
      dead code related to it.
      
      Reviewed-by: default avatarMoshe Shemesh <moshe@mellanox.com>
      Signed-off-by: default avatarParav Pandit <parav@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      ecd01db8