Commit 356ae88f authored Jul 23, 2021 by David S. Miller

Merge branch 'bridge-tx-fwd'

Vladimir Oltean says:

====================
Allow TX forwarding for the software bridge data path to be offloaded to capable devices

On RX, switchdev drivers have the ability to mark packets for the
software bridge as "already forwarded in hardware" via
skb->offload_fwd_mark. This instructs the nbp_switchdev_allowed_egress()
function to perform software forwarding of that packet only to the bridge
ports that are not in the same hardware domain as the source packet.

This series expands the concept for TX, in the sense that we can trust
the accelerator to:
(a) look up its FDB (which is more or less in sync with the software
    bridge FDB) for selecting the destination ports for a packet
(b) replicate the frame in hardware in case it's a multicast/broadcast,
    instead of the software bridge having to clone it and send the
    clones to each net device one at a time. This reduces the bandwidth
    needed between the CPU and the accelerator, as well as the CPU time
    spent.

This is done by augmenting nbp_switchdev_allowed_egress() to also
exclude the bridge ports which have the tx_fwd_offload capability if the
skb has already been transmitted to one port from their hardware domain.
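
As a rough illustration of the rule above, here is a minimal userspace sketch, not the actual bridge code. The names port_model, skb_model, fwd_hwdoms, allowed_egress(), mark_tx_fwd() and flood() are hypothetical stand-ins for the bridge port, the per-skb control block and the flooding loop; only the two exclusion rules are taken from the description.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of a bridge port. */
struct port_model {
	int hwdom;            /* hardware domain; 0 = software-only port */
	bool tx_fwd_offload;  /* driver can forward TX frames in hardware */
};

/* Hypothetical, simplified model of the per-skb forwarding state. */
struct skb_model {
	bool offload_fwd_mark;    /* RX: already forwarded in hardware */
	int src_hwdom;            /* hardware domain of the ingress port */
	unsigned long fwd_hwdoms; /* TX: domains that already got the frame */
};

static bool allowed_egress(const struct port_model *p,
			   const struct skb_model *skb)
{
	assert(p->hwdom >= 0 && p->hwdom < (int)(8 * sizeof(unsigned long)));

	/* RX rule: hardware already forwarded within the source domain. */
	if (skb->offload_fwd_mark && skb->src_hwdom == p->hwdom)
		return false;

	/* TX rule: one port of this domain already received the frame,
	 * so the accelerator is trusted to replicate it in hardware.
	 */
	if (p->tx_fwd_offload && (skb->fwd_hwdoms & (1UL << p->hwdom)))
		return false;

	return true;
}

/* Record that the frame was handed to this port's hardware domain. */
static void mark_tx_fwd(const struct port_model *p, struct skb_model *skb)
{
	if (p->tx_fwd_offload)
		skb->fwd_hwdoms |= 1UL << p->hwdom;
}

/* Flood a frame to all ports; return how many software transmissions
 * (skb clones) were needed.
 */
static int flood(const struct port_model *ports, int n, struct skb_model *skb)
{
	int sent = 0;

	for (int i = 0; i < n; i++) {
		if (!allowed_egress(&ports[i], skb))
			continue;
		mark_tx_fwd(&ports[i], skb);
		sent++;
	}

	return sent;
}
```

In this model, flooding a locally originated frame to three offloading ports of one hardware domain plus one foreign interface costs two transmissions instead of four: one towards the accelerator and one towards the foreign interface.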

Even though the software bridge still technically looks up the FDB/MDB
for every frame (only the skb clones are suppressed), this offload
specifically requires that the switchdev accelerator looks up its
FDB/MDB again. It is intended to be used to inject "data plane packets"
into the hardware as opposed to "control plane packets" which target a
precise destination port.

Towards that goal, the bridge always provides the TX packets with
skb->offload_fwd_mark = true with the VLAN tag always present, so that
the accelerator can forward according to that VLAN broadcast domain.

This work is not intended to cater to switches which can inject control
plane packets to a bit mask of destination ports. I see that as a more
difficult task to accomplish with potentially fewer benefits (it provides
only replication offload). The reason it is more difficult is that
struct sk_buff would probably need to be extended to contain a list of
struct net_device pointers that the packet must be replicated to. Sending data
plane packets avoids that issue by keeping the hardware and software FDB
more or less in sync and looking it up twice.

Additionally, the ability for the software bridge to request data plane
packets to be sent brings the opportunity for "dumb switches" to support
traffic termination to/from the bridge. Such switches (DSA or otherwise)
typically only use control packets for link-local traps, and sending or
receiving a control packet is an expensive operation.

For this class of switches, this patch series makes the difference
between supporting and not supporting local IP termination through a
VLAN-aware bridge, bridging with a foreign interface, bridging with
software upper interfaces like LAG, etc. So instead of telling them
"oh, what a dumb switch you are!", we can now tell them "oh, what a
stark contrast you have between the control and data plane!".

Patches 1-3 tested on Turris MOX (3 mv88e6xxx switches in a daisy chain
topology) and a second DSA driver to be added soon. Patches 4-5 tested
only on Turris MOX.

===========================================================

Changes in v5:
- make sure the static key is decremented on bridge port unoffload
- rename functions and variables so that the "tx_fwd_offload" string is
  easy to grep across the git tree
- simplify DSA core bookkeeping of the bridge_num

===========================================================

Changes in v4:

The biggest change compared to the previous series is not present in the
patches, but is rather a lack of them. Previously we were replaying
switchdev objects on the public notifier chain, but that was a mistake
in my reasoning and it was reverted for v4. Therefore, we are now
passing the notifier blocks as arguments to switchdev_bridge_port_offload()
for all drivers. This alone gets rid of 7 patches compared to v3.

Other changes are:
- Take more care for the case where mlxsw leaves a VLAN or LAG upper
  that is a bridge port, make sure that switchdev_bridge_port_unoffload()
  gets called for that case
- A couple of DSA bug fixes
- Add change logs for all patches
- Copy all switchdev driver maintainers on the changes relevant to them

===========================================================

Message for v3:
https://patchwork.kernel.org/project/netdevbpf/cover/20210712152142.800651-1-vladimir.oltean@nxp.com/

In this submission I have introduced a "native switchdev" driver API to
signal whether the TX forwarding offload is supported or not. This comes
after a third person has said that the macvlan offload framework used
for v2 and v1 was simply too convoluted.

This large patch set is submitted for discussion purposes (it is
provided in its entirety so it can be applied & tested on net-next).
It is only minimally tested, and I will not copy all switchdev driver
maintainers until we agree on the viability of this approach.

The major changes compared to v2:
- The introduction of switchdev_bridge_port_offload() and
  switchdev_bridge_port_unoffload() as two major API changes from the
  perspective of a switchdev driver. All drivers were converted to call
  these.
- Augment switchdev_bridge_port_{,un}offload to also handle the
  switchdev object replays on port join/leave.
- Augment switchdev_bridge_port_offload to also signal whether the TX
  forwarding offload is supported.

===========================================================

Message for v2:
https://patchwork.kernel.org/project/netdevbpf/cover/20210703115705.1034112-1-vladimir.oltean@nxp.com/

For this series I have taken Tobias' work from here:
https://patchwork.kernel.org/project/netdevbpf/cover/20210426170411.1789186-1-tobias@waldekranz.com/


and made the following changes:
- I collected and integrated (hopefully all of) Nikolay's, Ido's and my
  feedback on the bridge driver changes. Otherwise, the structure of the
  bridge changes is pretty much the same as Tobias left it.
- I basically rewrote the DSA infrastructure for the data plane
  forwarding offload, based on the commonalities with another switch
  driver for which I implemented this feature (not submitted here)
- I adapted mv88e6xxx to use the new infrastructure; hopefully it still
  works, but I didn't test that
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

parents 5af84df9 d82f8ab0

drivers/net/dsa/mv88e6xxx/chip.c

+74 −4

@@ -1221,14 +1221,36 @@ static u16 mv88e6xxx_port_vlan(struct mv88e6xxx_chip *chip, int dev, int port)
 	bool found = false;
 	u16 pvlan;

+	/* dev is a physical switch */
+	if (dev <= dst->last_switch) {
 		list_for_each_entry(dp, &dst->ports, list) {
 			if (dp->ds->index == dev && dp->index == port) {
+				/* dp might be a DSA link or a user port, so it
+				 * might or might not have a bridge_dev
+				 * pointer. Use the "found" variable for both
+				 * cases.
+				 */
+				br = dp->bridge_dev;
 				found = true;
 				break;
 			}
 		}
+	/* dev is a virtual bridge */
+	} else {
+		list_for_each_entry(dp, &dst->ports, list) {
+			if (dp->bridge_num < 0)
+				continue;

-	/* Prevent frames from unknown switch or port */
+			if (dp->bridge_num + 1 + dst->last_switch != dev)
+				continue;
+
+			br = dp->bridge_dev;
+			found = true;
+			break;
+		}
+	}
+
+	/* Prevent frames from unknown switch or virtual bridge */
 	if (!found)
 		return 0;

@@ -1236,7 +1258,6 @@ static u16 mv88e6xxx_port_vlan(struct mv88e6xxx_chip *chip, int dev, int port)
 	if (dp->type == DSA_PORT_TYPE_CPU || dp->type == DSA_PORT_TYPE_DSA)
 		return mv88e6xxx_port_mask(chip);

-	br = dp->bridge_dev;
 	pvlan = 0;

 	/* Frames from user ports can egress any local DSA links and CPU ports,
@@ -2422,6 +2443,44 @@ static void mv88e6xxx_crosschip_bridge_leave(struct dsa_switch *ds,
 	mv88e6xxx_reg_unlock(chip);
 }

+/* Treat the software bridge as a virtual single-port switch behind the
+ * CPU and map in the PVT. First dst->last_switch elements are taken by
+ * physical switches, so start from beyond that range.
+ */
+static int mv88e6xxx_map_virtual_bridge_to_pvt(struct dsa_switch *ds,
+					       int bridge_num)
+{
+	u8 dev = bridge_num + ds->dst->last_switch + 1;
+	struct mv88e6xxx_chip *chip = ds->priv;
+	int err;
+
+	mv88e6xxx_reg_lock(chip);
+	err = mv88e6xxx_pvt_map(chip, dev, 0);
+	mv88e6xxx_reg_unlock(chip);
+
+	return err;
+}
+
+static int mv88e6xxx_bridge_tx_fwd_offload(struct dsa_switch *ds, int port,
+					   struct net_device *br,
+					   int bridge_num)
+{
+	return mv88e6xxx_map_virtual_bridge_to_pvt(ds, bridge_num);
+}
+
+static void mv88e6xxx_bridge_tx_fwd_unoffload(struct dsa_switch *ds, int port,
+					      struct net_device *br,
+					      int bridge_num)
+{
+	int err;
+
+	err = mv88e6xxx_map_virtual_bridge_to_pvt(ds, bridge_num);
+	if (err) {
+		dev_err(ds->dev, "failed to remap cross-chip Port VLAN: %pe\n",
+			ERR_PTR(err));
+	}
+}
+
 static int mv88e6xxx_software_reset(struct mv88e6xxx_chip *chip)
 {
 	if (chip->info->ops->reset)
@@ -3025,6 +3084,15 @@ static int mv88e6xxx_setup(struct dsa_switch *ds)
 	chip->ds = ds;
 	ds->slave_mii_bus = mv88e6xxx_default_mdio_bus(chip);

+	/* Since virtual bridges are mapped in the PVT, the number we support
+	 * depends on the physical switch topology. We need to let DSA figure
+	 * that out and therefore we cannot set this at dsa_register_switch()
+	 * time.
+	 */
+	if (mv88e6xxx_has_pvt(chip))
+		ds->num_fwd_offloading_bridges = MV88E6XXX_MAX_PVT_SWITCHES -
+						 ds->dst->last_switch - 1;
+
 	mv88e6xxx_reg_lock(chip);

 	if (chip->info->ops->setup_errata) {
@@ -6128,6 +6196,8 @@ static const struct dsa_switch_ops mv88e6xxx_switch_ops = {
 	.crosschip_lag_change = mv88e6xxx_crosschip_lag_change,
 	.crosschip_lag_join = mv88e6xxx_crosschip_lag_join,
 	.crosschip_lag_leave = mv88e6xxx_crosschip_lag_leave,
+	.port_bridge_tx_fwd_offload = mv88e6xxx_bridge_tx_fwd_offload,
+	.port_bridge_tx_fwd_unoffload = mv88e6xxx_bridge_tx_fwd_unoffload,
 };

 static int mv88e6xxx_register_switch(struct mv88e6xxx_chip *chip)
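
The PVT index arithmetic used in the mv88e6xxx hunks above can be sketched in isolation. This is a userspace illustration only: the helper names are hypothetical, and the value 32 for MAX_PVT_SWITCHES is an assumption about MV88E6XXX_MAX_PVT_SWITCHES, which is not shown in this diff. bridge_num is taken to be 0-based, as implied by the "dp->bridge_num < 0" check.

```c
#include <assert.h>

/* Assumed value of MV88E6XXX_MAX_PVT_SWITCHES (not shown in the diff). */
#define MAX_PVT_SWITCHES 32

/* How many virtual bridges fit in the PVT after the physical switches,
 * which occupy device indices 0..last_switch.
 */
static int num_fwd_offloading_bridges(int last_switch)
{
	return MAX_PVT_SWITCHES - last_switch - 1;
}

/* PVT "device" index for a virtual bridge: just past the physical ones. */
static int virtual_bridge_to_pvt_dev(int bridge_num, int last_switch)
{
	assert(bridge_num >= 0);
	assert(bridge_num < num_fwd_offloading_bridges(last_switch));

	return bridge_num + last_switch + 1;
}

/* Inverse mapping; returns -1 when dev refers to a physical switch. */
static int pvt_dev_to_bridge_num(int dev, int last_switch)
{
	if (dev <= last_switch)
		return -1;

	return dev - last_switch - 1;
}
```

Under these assumptions, a tree of three physical switches (last_switch = 2, as on the Turris MOX test setup) leaves room for 29 offloaded bridges, with bridge 0 mapped to PVT device 3.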

drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c

+1 −1

@@ -1936,7 +1936,7 @@ static int dpaa2_switch_port_bridge_join(struct net_device *netdev,
 	err = switchdev_bridge_port_offload(netdev, netdev, NULL,
 					    &dpaa2_switch_port_switchdev_nb,
 					    &dpaa2_switch_port_switchdev_blocking_nb,
-					    extack);
+					    false, extack);
 	if (err)
 		goto err_switchdev_offload;

drivers/net/ethernet/marvell/prestera/prestera_switchdev.c

+1 −1

@@ -502,7 +502,7 @@ int prestera_bridge_port_join(struct net_device *br_dev,
 	}

 	err = switchdev_bridge_port_offload(br_port->dev, port->dev, NULL,
-					    NULL, NULL, extack);
+					    NULL, NULL, false, extack);
 	if (err)
 		goto err_switchdev_offload;

drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c

+1 −1

@@ -362,7 +362,7 @@ mlxsw_sp_bridge_port_create(struct mlxsw_sp_bridge_device *bridge_device,
 	bridge_port->ref_count = 1;

 	err = switchdev_bridge_port_offload(brport_dev, mlxsw_sp_port->dev,
-					    NULL, NULL, NULL, extack);
+					    NULL, NULL, NULL, false, extack);
 	if (err)
 		goto err_switchdev_offload;

drivers/net/ethernet/microchip/sparx5/sparx5_switchdev.c

+1 −1

@@ -113,7 +113,7 @@ static int sparx5_port_bridge_join(struct sparx5_port *port,
 	set_bit(port->portno, sparx5->bridge_mask);

 	err = switchdev_bridge_port_offload(ndev, ndev, NULL, NULL, NULL,
-					    extack);
+					    false, extack);
 	if (err)
 		goto err_switchdev_offload;