Commit fbc1449d authored by Jakub Kicinski

Merge tag 'mlx5-updates-2023-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2023-04-20

1) Dragos improves the RX page pool and provides some fixes to his previous
   series:
 1.1) Fix releasing page_pool for striding RQ and legacy RQ nonlinear case
 1.2) Hook NAPIs to page pools to gain more performance.

2) From Roi, some cleanups to the TC and eswitch modules.

3) Maher migrates vnic diagnostic counter reporting from debugfs to a
   dedicated devlink health reporter.

Maher Says:
===========
 net/mlx5: Expose vnic diagnostic counters using devlink

Currently, vnic diagnostic counters are exposed through the following
debugfs:

$ ls /sys/kernel/debug/mlx5/0000:08:00.0/esw/vf_0/vnic_diag/
cq_overrun
quota_exceeded_command
total_q_under_processor_handle
invalid_command
send_queue_priority_update_flow
nic_receive_steering_discard

The current design does not allow the hypervisor to view the diagnostic
counters of its VFs, in case the VFs get bound to a VM. In other words,
the counters are not exposed for representor interfaces.
Furthermore, the debugfs design is inconvenient to extend if the driver
needs to report more counters in the future.

As these counters pertain to vNIC health, it is more appropriate to
utilize the devlink health reporter to expose them.

Thus, this patchset includes the following changes:

* Drop the current vnic diagnostic counters debugfs interface.
* Add a vnic devlink health reporter for PFs/VFs core devices, which
  when diagnosed will dump vnic diagnostic counter values that are
  queried from FW.
* Add a vnic devlink health reporter for the representor interface, which
  serves the same purpose listed in the previous point, in addition to
  allowing the hypervisor to view its VFs' diagnostic counters, even when
  the VFs are bound to external VMs.

Example of devlink health reporter usage:
$ devlink health diagnose pci/0000:08:00.0 reporter vnic
 vNIC env counters:
    total_error_queues: 0 send_queue_priority_update_flow: 0
    comp_eq_overrun: 0 async_eq_overrun: 0 cq_overrun: 0
    invalid_command: 0 quota_exceeded_command: 0
    nic_receive_steering_discard: 0

===========
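
For anyone scripting against the new reporter, the plain-text output in the example above can be turned into name/value pairs. The following is a minimal sketch, assuming the output keeps the "name: value" layout shown in the example. The helper name `parse_vnic_counters` and the `SAMPLE` constant are hypothetical and are not part of devlink or of this series.

```python
import re

# Sample text copied from the example output in the commit message.
SAMPLE = """\
 vNIC env counters:
    total_error_queues: 0 send_queue_priority_update_flow: 0
    comp_eq_overrun: 0 async_eq_overrun: 0 cq_overrun: 0
    invalid_command: 0 quota_exceeded_command: 0
    nic_receive_steering_discard: 0
"""

# A counter is an identifier, a colon, optional blanks on the same line,
# and an unsigned integer. The "vNIC env counters:" header has no number
# after the colon on its line, so it is not matched.
_PAIR = re.compile(r"([A-Za-z_][A-Za-z0-9_]*):[ \t]*(\d+)")


def parse_vnic_counters(text):
    """Return {counter_name: value} for every 'name: N' pair in text."""
    return {name: int(value) for name, value in _PAIR.findall(text)}


counters = parse_vnic_counters(SAMPLE)
for name, value in counters.items():
    print(f"{name} = {value}")
```

Parsing by pattern rather than by fixed position keeps the sketch independent of how many counters share a line.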

4) SW steering fixes and improvements

Yevgeny Kliteynik Says:
=======================
This short patch series contains small fixes / improvements for
SW steering:

 - Patch 1: Fix dumping of legacy modify_hdr in debug dump to
   align to what is expected by parser
 - Patch 2: Have separate threshold for ICM sync per ICM type
 - Patch 3: Add more info to the steering debug dump - Linux
   version and device name
 - Patch 4: Keep track of number of buddies that are currently
   in use per domain per buddy type

=======================

* tag 'mlx5-updates-2023-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Update op_mode to op_mod for port selection
  net/mlx5: E-Switch, Remove unused mlx5_esw_offloads_vport_metadata_set()
  net/mlx5: E-Switch, Remove redundant dev arg from mlx5_esw_vport_alloc()
  net/mlx5: Include linux/pci.h for pci_msix_can_alloc_dyn()
  net/mlx5e: RX, Hook NAPIs to page pools
  net/mlx5e: RX, Fix XDP_TX page release for legacy rq nonlinear case
  net/mlx5e: RX, Fix releasing page_pool pages twice for striding RQ
  net/mlx5e: Add vnic devlink health reporter to representors
  net/mlx5: Add vnic devlink health reporter to PFs/VFs
  Revert "net/mlx5: Expose vnic diagnostic counters for eswitch managed vports"
  Revert "net/mlx5: Expose steering dropped packets counter"
  net/mlx5: DR, Add memory statistics for domain object
  net/mlx5: DR, Add more info in domain dbg dump
  net/mlx5: DR, Calculate sync threshold of each pool according to its type
  net/mlx5: DR, Fix dumping of legacy modify_hdr in debug dump
====================

Link: https://lore.kernel.org/r/20230421013850.349646-1-saeed@kernel.org


Signed-off-by: Jakub Kicinski <kuba@kernel.org>
parents 9a82cdc2 f9c895a7
+33 −0
@@ -257,3 +257,36 @@ User commands examples:
    $ devlink health dump show pci/0000:82:00.1 reporter fw_fatal

NOTE: This command can run only on PF.

vnic reporter
-------------
The vnic reporter implements only the `diagnose` callback.
It is responsible for querying the vnic diagnostic counters from fw and displaying
them in realtime.

Description of the vnic counters:
total_q_under_processor_handle: number of queues in an error state due to
an async error or errored command.
send_queue_priority_update_flow: number of QP/SQ priority/SL update
events.
cq_overrun: number of times CQ entered an error state due to an
overflow.
async_eq_overrun: number of times an EQ mapped to async events was
overrun.
comp_eq_overrun: number of times an EQ mapped to completion events was
overrun.
quota_exceeded_command: number of commands issued and failed due to quota
exceeded.
invalid_command: number of commands issued and failed due to any reason
other than quota exceeded.
nic_receive_steering_discard: number of packets that completed RX flow
steering but were discarded due to a mismatch in flow table.

User commands examples:
- Diagnose PF/VF vnic counters
        $ devlink health diagnose pci/0000:82:00.1 reporter vnic
- Diagnose representor vnic counters (performed by supplying devlink port of the
  representor, which can be obtained via devlink port command)
        $ devlink health diagnose pci/0000:82:00.1/65537 reporter vnic

NOTE: This command can run over all interfaces such as PF/VF and representor ports.
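
The two command forms documented above differ only in the devlink handle: a bare PCI device for PF/VF, or the device followed by a port index for a representor. A minimal sketch of building either command line follows. The helper name `vnic_diagnose_argv` and its parameters are hypothetical, and the sketch only assembles the argument list without invoking devlink.

```python
def vnic_diagnose_argv(pci_dev, port_index=None):
    """Build the argv for 'devlink health diagnose ... reporter vnic'.

    pci_dev    -- devlink device handle, e.g. "pci/0000:82:00.1"
    port_index -- devlink port index of a representor, or None for PF/VF
    """
    handle = pci_dev if port_index is None else f"{pci_dev}/{port_index}"
    return ["devlink", "health", "diagnose", handle, "reporter", "vnic"]


# PF/VF form and representor form, using the values from the examples above.
pf_argv = vnic_diagnose_argv("pci/0000:82:00.1")
rep_argv = vnic_diagnose_argv("pci/0000:82:00.1", 65537)
print(" ".join(pf_argv))
print(" ".join(rep_argv))
```

The resulting list can be handed to `subprocess.run` on a host that has the devlink tool and an mlx5 device.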
+2 −2
@@ -16,7 +16,7 @@ mlx5_core-y := main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
		transobj.o vport.o sriov.o fs_cmd.o fs_core.o pci_irq.o \
		fs_counters.o fs_ft_pool.o rl.o lag/debugfs.o lag/lag.o dev.o events.o wq.o lib/gid.o \
		lib/devcom.o lib/pci_vsc.o lib/dm.o lib/fs_ttc.o diag/fs_tracepoint.o \
-		diag/fw_tracer.o diag/crdump.o devlink.o diag/rsc_dump.o \
+		diag/fw_tracer.o diag/crdump.o devlink.o diag/rsc_dump.o diag/reporter_vnic.o \
		fw_reset.o qos.o lib/tout.o lib/aso.o

#
@@ -69,7 +69,7 @@ mlx5_core-$(CONFIG_MLX5_TC_SAMPLE) += en/tc/sample.o
#
mlx5_core-$(CONFIG_MLX5_ESWITCH)   += eswitch.o eswitch_offloads.o eswitch_offloads_termtbl.o \
				      ecpf.o rdma.o esw/legacy.o \
-				      esw/debugfs.o esw/devlink_port.o esw/vporttbl.o esw/qos.o
+				      esw/devlink_port.o esw/vporttbl.o esw/qos.o

mlx5_core-$(CONFIG_MLX5_ESWITCH)   += esw/acl/helper.o \
				      esw/acl/egress_lgcy.o esw/acl/egress_ofld.o \
+125 −0
// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
/* Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. */

#include "reporter_vnic.h"
#include "devlink.h"

#define VNIC_ENV_GET64(vnic_env_stats, c) \
	MLX5_GET64(query_vnic_env_out, (vnic_env_stats)->query_vnic_env_out, \
		 vport_env.c)

struct mlx5_vnic_diag_stats {
	__be64 query_vnic_env_out[MLX5_ST_SZ_QW(query_vnic_env_out)];
};

int mlx5_reporter_vnic_diagnose_counters(struct mlx5_core_dev *dev,
					 struct devlink_fmsg *fmsg,
					 u16 vport_num, bool other_vport)
{
	u32 in[MLX5_ST_SZ_DW(query_vnic_env_in)] = {};
	struct mlx5_vnic_diag_stats vnic;
	int err;

	MLX5_SET(query_vnic_env_in, in, opcode, MLX5_CMD_OP_QUERY_VNIC_ENV);
	MLX5_SET(query_vnic_env_in, in, vport_number, vport_num);
	MLX5_SET(query_vnic_env_in, in, other_vport, !!other_vport);

	err = mlx5_cmd_exec_inout(dev, query_vnic_env, in, &vnic.query_vnic_env_out);
	if (err)
		return err;

	err = devlink_fmsg_pair_nest_start(fmsg, "vNIC env counters");
	if (err)
		return err;

	err = devlink_fmsg_obj_nest_start(fmsg);
	if (err)
		return err;

	err = devlink_fmsg_u64_pair_put(fmsg, "total_error_queues",
					VNIC_ENV_GET64(&vnic, total_error_queues));
	if (err)
		return err;

	err = devlink_fmsg_u64_pair_put(fmsg, "send_queue_priority_update_flow",
					VNIC_ENV_GET64(&vnic, send_queue_priority_update_flow));
	if (err)
		return err;

	err = devlink_fmsg_u64_pair_put(fmsg, "comp_eq_overrun",
					VNIC_ENV_GET64(&vnic, comp_eq_overrun));
	if (err)
		return err;

	err = devlink_fmsg_u64_pair_put(fmsg, "async_eq_overrun",
					VNIC_ENV_GET64(&vnic, async_eq_overrun));
	if (err)
		return err;

	err = devlink_fmsg_u64_pair_put(fmsg, "cq_overrun",
					VNIC_ENV_GET64(&vnic, cq_overrun));
	if (err)
		return err;

	err = devlink_fmsg_u64_pair_put(fmsg, "invalid_command",
					VNIC_ENV_GET64(&vnic, invalid_command));
	if (err)
		return err;

	err = devlink_fmsg_u64_pair_put(fmsg, "quota_exceeded_command",
					VNIC_ENV_GET64(&vnic, quota_exceeded_command));
	if (err)
		return err;

	err = devlink_fmsg_u64_pair_put(fmsg, "nic_receive_steering_discard",
					VNIC_ENV_GET64(&vnic, nic_receive_steering_discard));
	if (err)
		return err;

	err = devlink_fmsg_obj_nest_end(fmsg);
	if (err)
		return err;

	err = devlink_fmsg_pair_nest_end(fmsg);
	if (err)
		return err;

	return 0;
}

static int mlx5_reporter_vnic_diagnose(struct devlink_health_reporter *reporter,
				       struct devlink_fmsg *fmsg,
				       struct netlink_ext_ack *extack)
{
	struct mlx5_core_dev *dev = devlink_health_reporter_priv(reporter);

	return mlx5_reporter_vnic_diagnose_counters(dev, fmsg, 0, false);
}

static const struct devlink_health_reporter_ops mlx5_reporter_vnic_ops = {
	.name = "vnic",
	.diagnose = mlx5_reporter_vnic_diagnose,
};

void mlx5_reporter_vnic_create(struct mlx5_core_dev *dev)
{
	struct mlx5_core_health *health = &dev->priv.health;
	struct devlink *devlink = priv_to_devlink(dev);

	health->vnic_reporter =
		devlink_health_reporter_create(devlink,
					       &mlx5_reporter_vnic_ops,
					       0, dev);
	if (IS_ERR(health->vnic_reporter))
		mlx5_core_warn(dev,
			       "Failed to create vnic reporter, err = %ld\n",
			       PTR_ERR(health->vnic_reporter));
}

void mlx5_reporter_vnic_destroy(struct mlx5_core_dev *dev)
{
	struct mlx5_core_health *health = &dev->priv.health;

	if (!IS_ERR_OR_NULL(health->vnic_reporter))
		devlink_health_reporter_destroy(health->vnic_reporter);
}
+16 −0
/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
 * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES.
 */
#ifndef __MLX5_REPORTER_VNIC_H
#define __MLX5_REPORTER_VNIC_H

#include "mlx5_core.h"

void mlx5_reporter_vnic_create(struct mlx5_core_dev *dev);
void mlx5_reporter_vnic_destroy(struct mlx5_core_dev *dev);

int mlx5_reporter_vnic_diagnose_counters(struct mlx5_core_dev *dev,
					 struct devlink_fmsg *fmsg,
					 u16 vport_num, bool other_vport);

#endif /* __MLX5_REPORTER_VNIC_H */
+1 −0
@@ -857,6 +857,7 @@ static int mlx5e_alloc_rq(struct mlx5e_params *params,
		pp_params.pool_size = pool_size;
		pp_params.nid       = node;
		pp_params.dev       = rq->pdev;
+		pp_params.napi      = rq->cq.napi;
		pp_params.dma_dir   = rq->buff.map_dir;
		pp_params.max_len   = PAGE_SIZE;
