Commit 75d6d8b5 authored by Jakub Kicinski

Merge branch 'devlink-mlx5-add-port-function-attributes-for-ipsec'

Saeed Mahameed says:

====================
{devlink,mlx5}: Add port function attributes for ipsec

From Dima:

Introduce hypervisor-level control knobs to set the functionality of PCI
VF devices passed through to guests. The administrator of a hypervisor
host may choose to change the settings of a port function from the
defaults configured by the device firmware.

The software stack has two types of IPsec offload - crypto and packet.
Specifically, the ip xfrm command has sub-commands for "state" and
"policy" that have an "offload" parameter. With ip xfrm state, both
crypto and packet offload types are supported, while ip xfrm policy can
only be offloaded in packet mode.

The series introduces two new boolean attributes of a port function:
ipsec_crypto and ipsec_packet. The goal is to provide a similar level of
granularity for controlling VF IPsec offload capabilities, which would
be aligned with the software model. This will allow users to decide if
they want both types of offload enabled for a VF, just one of them, or
none at all (which is the default).

At a high level, the difference between the two knobs is that with
ipsec_crypto, only XFRM state can be offloaded, and only the crypto
operation (encrypt/decrypt) is offloaded. With ipsec_packet, both XFRM
state and policy can be offloaded, and IPsec encapsulation is offloaded
in addition to the crypto operation. For XFRM state, the user can choose
between the crypto and packet offload types. From the HW perspective,
different resources may be required for each offload type.

Example of enabling IPsec packet offload for a VF in switchdev mode:

  $ devlink port show pci/0000:06:00.0/1
      pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
          function:
          hw_addr 00:00:00:00:00:00 roce enable migratable disable ipsec_crypto disable ipsec_packet disable

  $ devlink port function set pci/0000:06:00.0/1 ipsec_packet enable

  $ devlink port show pci/0000:06:00.0/1
      pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
          function:
          hw_addr 00:00:00:00:00:00 roce enable migratable disable ipsec_crypto disable ipsec_packet enable

This enables the corresponding IPsec capability of the function before
it's enumerated, so when the driver reads the capability from the device
firmware, it is enabled. The driver is then able to configure
corresponding features and ops of the VF net device to support IPsec
state and policy offloading.
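A minimal sketch of the last step, assuming the VF driver sees the hypervisor's choice as two firmware capability bits at probe time and maps them to the XFRM features it advertises. All macro and function names here (`HYP_FW_CAP_*`, `HYP_FEAT_*`, `vf_features_from_caps()`) are hypothetical and do not correspond to actual mlx5 capability fields or netdev feature flags.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical capability bits read from device firmware at probe time. */
#define HYP_FW_CAP_IPSEC_CRYPTO (1u << 0)
#define HYP_FW_CAP_IPSEC_PACKET (1u << 1)

/* Hypothetical feature flags the VF net device would advertise. */
#define HYP_FEAT_XFRM_STATE_CRYPTO (1u << 0)
#define HYP_FEAT_XFRM_STATE_PACKET (1u << 1)
#define HYP_FEAT_XFRM_POLICY       (1u << 2)

static_assert((HYP_FW_CAP_IPSEC_CRYPTO & HYP_FW_CAP_IPSEC_PACKET) == 0,
	      "capability bits must not overlap");

/*
 * Translate the firmware capabilities, which the hypervisor configured via
 * devlink before the VF was enumerated, into the XFRM offload features the
 * VF net device exposes. Unknown capability bits are ignored.
 */
static uint32_t vf_features_from_caps(uint32_t fw_caps)
{
	uint32_t feat = 0;

	if (fw_caps & HYP_FW_CAP_IPSEC_CRYPTO)
		feat |= HYP_FEAT_XFRM_STATE_CRYPTO;
	if (fw_caps & HYP_FW_CAP_IPSEC_PACKET)
		feat |= HYP_FEAT_XFRM_STATE_PACKET | HYP_FEAT_XFRM_POLICY;
	return feat;
}
```

With only ipsec_crypto enabled, for example, the sketch advertises XFRM state crypto offload and no policy offload.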

v2: https://lore.kernel.org/netdev/20230421104901.897946-1-dchumak@nvidia.com/
====================

Link: https://lore.kernel.org/r/20230825062836.103744-1-saeed@kernel.org


Signed-off-by: Jakub Kicinski <kuba@kernel.org>
parents aa05346d b691b111
+20 −0
@@ -190,6 +190,26 @@ explicitly enable the VF migratable capability.
The mlx5 driver supports the devlink port function attr mechanism to set up the
migratable capability (refer to Documentation/networking/devlink/devlink-port.rst).

IPsec crypto capability setup
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Users who want mlx5 PCI VFs to be able to perform IPsec crypto offloading need
to explicitly enable the VF ipsec_crypto capability. Enabling IPsec capability
for VFs is supported on ConnectX6dx devices and above. When a VF has IPsec
capability enabled, any IPsec offloading is blocked on the PF.

The mlx5 driver supports the devlink port function attr mechanism to set up the
ipsec_crypto capability (refer to Documentation/networking/devlink/devlink-port.rst).

IPsec packet capability setup
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Users who want mlx5 PCI VFs to be able to perform IPsec packet offloading need
to explicitly enable the VF ipsec_packet capability. Enabling IPsec capability
for VFs is supported on ConnectX6dx devices and above. When a VF has IPsec
capability enabled, any IPsec offloading is blocked on the PF.

The mlx5 driver supports the devlink port function attr mechanism to set up the
ipsec_packet capability (refer to Documentation/networking/devlink/devlink-port.rst).

SF state setup
--------------

+55 −0
@@ -128,6 +128,12 @@ Users may also set the RoCE capability of the function using
Users may also set the function as migratable using the
'devlink port function set migratable' command.

Users may also set the IPsec crypto capability of the function using the
``devlink port function set ipsec_crypto`` command.

Users may also set the IPsec packet capability of the function using the
``devlink port function set ipsec_packet`` command.

Function attributes
===================

@@ -240,6 +246,55 @@ Attach VF to the VM.
Start the VM.
Perform live migration.

IPsec crypto capability setup
-----------------------------
When a user enables the IPsec crypto capability for a VF, user applications can
offload XFRM state crypto operations (encrypt/decrypt) to this VF.

When the IPsec crypto capability is disabled for a VF (the default), XFRM state
is processed in software by the kernel.

- Get IPsec crypto capability of the VF device::

    $ devlink port show pci/0000:06:00.0/2
    pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
        function:
            hw_addr 00:00:00:00:00:00 ipsec_crypto disable

- Set IPsec crypto capability of the VF device::

    $ devlink port function set pci/0000:06:00.0/2 ipsec_crypto enable

    $ devlink port show pci/0000:06:00.0/2
    pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
        function:
            hw_addr 00:00:00:00:00:00 ipsec_crypto enable

IPsec packet capability setup
-----------------------------
When a user enables the IPsec packet capability for a VF, user applications can
offload XFRM state and policy crypto operations (encrypt/decrypt) to this VF, as
well as IPsec encapsulation.

When the IPsec packet capability is disabled for a VF (the default), XFRM state
and policy are processed in software by the kernel.

- Get IPsec packet capability of the VF device::

    $ devlink port show pci/0000:06:00.0/2
    pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
        function:
            hw_addr 00:00:00:00:00:00 ipsec_packet disable

- Set IPsec packet capability of the VF device::

    $ devlink port function set pci/0000:06:00.0/2 ipsec_packet enable

    $ devlink port show pci/0000:06:00.0/2
    pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
        function:
            hw_addr 00:00:00:00:00:00 ipsec_packet enable

Subfunction
============

+1 −1
@@ -69,7 +69,7 @@ mlx5_core-$(CONFIG_MLX5_TC_SAMPLE) += en/tc/sample.o
 #
 mlx5_core-$(CONFIG_MLX5_ESWITCH)   += eswitch.o eswitch_offloads.o eswitch_offloads_termtbl.o \
 				      ecpf.o rdma.o esw/legacy.o \
-				      esw/devlink_port.o esw/vporttbl.o esw/qos.o
+				      esw/devlink_port.o esw/vporttbl.o esw/qos.o esw/ipsec.o
 
 mlx5_core-$(CONFIG_MLX5_ESWITCH)   += esw/acl/helper.o \
 				      esw/acl/egress_lgcy.o esw/acl/egress_ofld.o \
+19 −1
@@ -38,6 +38,7 @@
#include <net/netevent.h>

#include "en.h"
#include "eswitch.h"
#include "ipsec.h"
#include "ipsec_rxtx.h"
#include "en_rep.h"
@@ -670,6 +671,11 @@ static int mlx5e_xfrm_add_state(struct xfrm_state *x,
	if (err)
		goto err_xfrm;

	if (!mlx5_eswitch_block_ipsec(priv->mdev)) {
		err = -EBUSY;
		goto err_xfrm;
	}

	/* check esn */
	if (x->props.flags & XFRM_STATE_ESN)
		mlx5e_ipsec_update_esn_state(sa_entry);
@@ -678,7 +684,7 @@ static int mlx5e_xfrm_add_state(struct xfrm_state *x,
 
 	err = mlx5_ipsec_create_work(sa_entry);
 	if (err)
-		goto err_xfrm;
+		goto unblock_ipsec;
 
 	err = mlx5e_ipsec_create_dwork(sa_entry);
 	if (err)
@@ -735,6 +741,8 @@ static int mlx5e_xfrm_add_state(struct xfrm_state *x,
	if (sa_entry->work)
		kfree(sa_entry->work->data);
	kfree(sa_entry->work);
unblock_ipsec:
	mlx5_eswitch_unblock_ipsec(priv->mdev);
err_xfrm:
	kfree(sa_entry);
	NL_SET_ERR_MSG_WEAK_MOD(extack, "Device failed to offload this state");
@@ -764,6 +772,7 @@ static void mlx5e_xfrm_del_state(struct xfrm_state *x)
static void mlx5e_xfrm_free_state(struct xfrm_state *x)
{
	struct mlx5e_ipsec_sa_entry *sa_entry = to_ipsec_sa_entry(x);
	struct mlx5e_ipsec *ipsec = sa_entry->ipsec;

	if (x->xso.flags & XFRM_DEV_OFFLOAD_FLAG_ACQ)
		goto sa_entry_free;
@@ -780,6 +789,7 @@ static void mlx5e_xfrm_free_state(struct xfrm_state *x)
	if (sa_entry->work)
		kfree(sa_entry->work->data);
	kfree(sa_entry->work);
	mlx5_eswitch_unblock_ipsec(ipsec->mdev);
sa_entry_free:
	kfree(sa_entry);
}
@@ -1055,6 +1065,11 @@ static int mlx5e_xfrm_add_policy(struct xfrm_policy *x,
	pol_entry->x = x;
	pol_entry->ipsec = priv->ipsec;

	if (!mlx5_eswitch_block_ipsec(priv->mdev)) {
		err = -EBUSY;
		goto ipsec_busy;
	}

	mlx5e_ipsec_build_accel_pol_attrs(pol_entry, &pol_entry->attrs);
	err = mlx5e_accel_ipsec_fs_add_pol(pol_entry);
	if (err)
@@ -1064,6 +1079,8 @@ static int mlx5e_xfrm_add_policy(struct xfrm_policy *x,
	return 0;

err_fs:
	mlx5_eswitch_unblock_ipsec(priv->mdev);
ipsec_busy:
	kfree(pol_entry);
	NL_SET_ERR_MSG_MOD(extack, "Device failed to offload this policy");
	return err;
@@ -1074,6 +1091,7 @@ static void mlx5e_xfrm_del_policy(struct xfrm_policy *x)
	struct mlx5e_ipsec_pol_entry *pol_entry = to_ipsec_pol_entry(x);

	mlx5e_accel_ipsec_fs_del_pol(pol_entry);
	mlx5_eswitch_unblock_ipsec(pol_entry->ipsec->mdev);
}

static void mlx5e_xfrm_free_policy(struct xfrm_policy *x)
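The `mlx5_eswitch_block_ipsec()`/`mlx5_eswitch_unblock_ipsec()` calls added above behave like a reference count on PF offloads, which the documentation describes as blocking PF IPsec offloading while any VF has an IPsec capability enabled. The following single-threaded C model illustrates that idea under stated assumptions: `struct esw_ipsec_model` and the `model_*` functions are hypothetical, the real mlx5 implementation is not part of this excerpt, and the rule that enabling a VF fails with `-EBUSY` while the PF has live offloads is an assumption added to keep the model symmetric. A real implementation would also serialize these updates with a lock.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, single-threaded model of the e-switch bookkeeping. */
struct esw_ipsec_model {
	unsigned int pf_offloads;	/* PF states/policies currently offloaded */
	unsigned int vfs_with_ipsec;	/* VFs with ipsec_crypto or ipsec_packet on */
};

/* PF: call before offloading a state or policy; false maps to -EBUSY. */
static bool model_block_ipsec(struct esw_ipsec_model *m)
{
	if (m->vfs_with_ipsec)
		return false;
	m->pf_offloads++;
	return true;
}

/* PF: call once per successful model_block_ipsec(), on error or teardown. */
static void model_unblock_ipsec(struct esw_ipsec_model *m)
{
	assert(m->pf_offloads > 0);
	m->pf_offloads--;
}

/* Hypervisor: enable IPsec on a VF (assumed to fail while the PF offloads). */
static int model_vf_enable_ipsec(struct esw_ipsec_model *m)
{
	if (m->pf_offloads)
		return -EBUSY;
	m->vfs_with_ipsec++;
	return 0;
}

/* Hypervisor: disable IPsec on a VF that previously had it enabled. */
static void model_vf_disable_ipsec(struct esw_ipsec_model *m)
{
	assert(m->vfs_with_ipsec > 0);
	m->vfs_with_ipsec--;
}
```

In the diff above, every successful `mlx5_eswitch_block_ipsec()` call is paired with `mlx5_eswitch_unblock_ipsec()`, either on an error path (the `unblock_ipsec` and `err_fs` labels) or when the state is freed or the policy is deleted. The model's `assert()` in the unblock helper flags the unbalanced release that such pairing is meant to prevent.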
+24 −39
@@ -254,6 +254,8 @@ static void rx_destroy(struct mlx5_core_dev *mdev, struct mlx5e_ipsec *ipsec,
	mlx5_del_flow_rules(rx->sa.rule);
	mlx5_destroy_flow_group(rx->sa.group);
	mlx5_destroy_flow_table(rx->ft.sa);
	if (rx->allow_tunnel_mode)
		mlx5_eswitch_unblock_encap(mdev);
	if (rx == ipsec->rx_esw) {
		mlx5_esw_ipsec_rx_status_destroy(ipsec, rx);
	} else {
@@ -357,6 +359,8 @@ static int rx_create(struct mlx5_core_dev *mdev, struct mlx5e_ipsec *ipsec,
		goto err_add;

	/* Create FT */
	if (mlx5_ipsec_device_caps(mdev) & MLX5_IPSEC_CAP_TUNNEL)
		rx->allow_tunnel_mode = mlx5_eswitch_block_encap(mdev);
	if (rx->allow_tunnel_mode)
		flags = MLX5_FLOW_TABLE_TUNNEL_EN_REFORMAT;
	ft = ipsec_ft_create(attr.ns, attr.sa_level, attr.prio, 2, flags);
@@ -411,6 +415,8 @@ static int rx_create(struct mlx5_core_dev *mdev, struct mlx5e_ipsec *ipsec,
err_fs:
	mlx5_destroy_flow_table(rx->ft.sa);
err_fs_ft:
	if (rx->allow_tunnel_mode)
		mlx5_eswitch_unblock_encap(mdev);
	mlx5_del_flow_rules(rx->status.rule);
	mlx5_modify_header_dealloc(mdev, rx->status.modify_hdr);
err_add:
@@ -428,26 +434,19 @@ static int rx_get(struct mlx5_core_dev *mdev, struct mlx5e_ipsec *ipsec,
 	if (rx->ft.refcnt)
 		goto skip;
 
-	if (mlx5_ipsec_device_caps(mdev) & MLX5_IPSEC_CAP_TUNNEL)
-		rx->allow_tunnel_mode = mlx5_eswitch_block_encap(mdev);
-
-	err = mlx5_eswitch_block_mode_trylock(mdev);
+	err = mlx5_eswitch_block_mode(mdev);
 	if (err)
-		goto err_out;
+		return err;
 
 	err = rx_create(mdev, ipsec, rx, family);
-	mlx5_eswitch_block_mode_unlock(mdev, err);
-	if (err)
-		goto err_out;
+	if (err) {
+		mlx5_eswitch_unblock_mode(mdev);
+		return err;
+	}
 
 skip:
 	rx->ft.refcnt++;
 	return 0;
-
-err_out:
-	if (rx->allow_tunnel_mode)
-		mlx5_eswitch_unblock_encap(mdev);
-	return err;
 }
 
 static void rx_put(struct mlx5e_ipsec *ipsec, struct mlx5e_ipsec_rx *rx,
@@ -456,12 +455,8 @@ static void rx_put(struct mlx5e_ipsec *ipsec, struct mlx5e_ipsec_rx *rx,
 	if (--rx->ft.refcnt)
 		return;
 
-	mlx5_eswitch_unblock_mode_lock(ipsec->mdev);
 	rx_destroy(ipsec->mdev, ipsec, rx, family);
-	mlx5_eswitch_unblock_mode_unlock(ipsec->mdev);
-
-	if (rx->allow_tunnel_mode)
-		mlx5_eswitch_unblock_encap(ipsec->mdev);
+	mlx5_eswitch_unblock_mode(ipsec->mdev);
 }
 
 static struct mlx5e_ipsec_rx *rx_ft_get(struct mlx5_core_dev *mdev,
@@ -581,6 +576,8 @@ static void tx_destroy(struct mlx5e_ipsec *ipsec, struct mlx5e_ipsec_tx *tx,
		mlx5_destroy_flow_group(tx->sa.group);
	}
	mlx5_destroy_flow_table(tx->ft.sa);
	if (tx->allow_tunnel_mode)
		mlx5_eswitch_unblock_encap(ipsec->mdev);
	mlx5_del_flow_rules(tx->status.rule);
	mlx5_destroy_flow_table(tx->ft.status);
}
@@ -621,6 +618,8 @@ static int tx_create(struct mlx5e_ipsec *ipsec, struct mlx5e_ipsec_tx *tx,
	if (err)
		goto err_status_rule;

	if (mlx5_ipsec_device_caps(mdev) & MLX5_IPSEC_CAP_TUNNEL)
		tx->allow_tunnel_mode = mlx5_eswitch_block_encap(mdev);
	if (tx->allow_tunnel_mode)
		flags = MLX5_FLOW_TABLE_TUNNEL_EN_REFORMAT;
	ft = ipsec_ft_create(tx->ns, attr.sa_level, attr.prio, 4, flags);
@@ -687,6 +686,8 @@ static int tx_create(struct mlx5e_ipsec *ipsec, struct mlx5e_ipsec_tx *tx,
err_sa_miss:
	mlx5_destroy_flow_table(tx->ft.sa);
err_sa_ft:
	if (tx->allow_tunnel_mode)
		mlx5_eswitch_unblock_encap(mdev);
	mlx5_del_flow_rules(tx->status.rule);
err_status_rule:
	mlx5_destroy_flow_table(tx->ft.status);
@@ -720,32 +721,22 @@ static int tx_get(struct mlx5_core_dev *mdev, struct mlx5e_ipsec *ipsec,
 	if (tx->ft.refcnt)
 		goto skip;
 
-	if (mlx5_ipsec_device_caps(mdev) & MLX5_IPSEC_CAP_TUNNEL)
-		tx->allow_tunnel_mode = mlx5_eswitch_block_encap(mdev);
-
-	err = mlx5_eswitch_block_mode_trylock(mdev);
+	err = mlx5_eswitch_block_mode(mdev);
 	if (err)
-		goto err_out;
+		return err;
 
 	err = tx_create(ipsec, tx, ipsec->roce);
 	if (err) {
-		mlx5_eswitch_block_mode_unlock(mdev, err);
-		goto err_out;
+		mlx5_eswitch_unblock_mode(mdev);
+		return err;
 	}
 
 	if (tx == ipsec->tx_esw)
 		ipsec_esw_tx_ft_policy_set(mdev, tx->ft.pol);
 
-	mlx5_eswitch_block_mode_unlock(mdev, err);
-
 skip:
 	tx->ft.refcnt++;
 	return 0;
-
-err_out:
-	if (tx->allow_tunnel_mode)
-		mlx5_eswitch_unblock_encap(mdev);
-	return err;
 }
 
 static void tx_put(struct mlx5e_ipsec *ipsec, struct mlx5e_ipsec_tx *tx)
@@ -753,19 +744,13 @@ static void tx_put(struct mlx5e_ipsec *ipsec, struct mlx5e_ipsec_tx *tx)
 	if (--tx->ft.refcnt)
 		return;
 
-	mlx5_eswitch_unblock_mode_lock(ipsec->mdev);
-
 	if (tx == ipsec->tx_esw) {
 		mlx5_esw_ipsec_restore_dest_uplink(ipsec->mdev);
 		ipsec_esw_tx_ft_policy_set(ipsec->mdev, NULL);
 	}
 
 	tx_destroy(ipsec, tx, ipsec->roce);
-
-	mlx5_eswitch_unblock_mode_unlock(ipsec->mdev);
-
-	if (tx->allow_tunnel_mode)
-		mlx5_eswitch_unblock_encap(ipsec->mdev);
+	mlx5_eswitch_unblock_mode(ipsec->mdev);
 }
 
 static struct mlx5_flow_table *tx_ft_get_policy(struct mlx5_core_dev *mdev,
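The reworked `rx_get()`/`tx_get()` and `rx_put()`/`tx_put()` follow a common pattern. The first user takes the e-switch mode block and creates the tables, and later users only bump the reference count. A failed create drops the block again, and the last put destroys the tables and releases the block. Encap blocking moved into the `*_create()`/`*_destroy()` helpers, so it is not modelled here. The sketch below is a hypothetical, single-threaded model of that pattern: `struct mode_block`, `struct table_ref`, the `model_*` helpers and the `busy` flag that makes blocking fail are assumptions, not mlx5 code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the e-switch mode block and a flow table. */
struct mode_block { unsigned int holders; bool busy; };
struct table_ref { unsigned int refcnt; bool created; };

/* Assumed to fail while something else prevents blocking mode changes. */
static int model_block_mode(struct mode_block *b)
{
	if (b->busy)
		return -EBUSY;
	b->holders++;
	return 0;
}

static void model_unblock_mode(struct mode_block *b)
{
	assert(b->holders > 0);
	b->holders--;
}

/*
 * Mirrors rx_get()/tx_get(): only the first user creates the table, and only
 * after taking the mode block; a failed create releases the block again.
 * @create_fails stands in for an rx_create()/tx_create() error.
 */
static int table_get(struct table_ref *t, struct mode_block *b, bool create_fails)
{
	int err;

	if (t->refcnt)
		goto skip;

	err = model_block_mode(b);
	if (err)
		return err;

	if (create_fails) {
		model_unblock_mode(b);
		return -ENOMEM;
	}
	t->created = true;
skip:
	t->refcnt++;
	return 0;
}

/* Mirrors rx_put()/tx_put(): the last user destroys the table and unblocks. */
static void table_put(struct table_ref *t, struct mode_block *b)
{
	if (--t->refcnt)
		return;
	t->created = false;
	model_unblock_mode(b);
}
```

Compared with the removed `*_trylock()`/`*_unlock()` pairs, this shape holds the mode block for the whole lifetime of the tables rather than only around their creation, which is why the matching release now sits in the put path.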