Commit 33372bc2 authored by Daniel Borkmann

Merge branch 'xsk-batching'

Maciej Fijalkowski says:

====================
Unfortunately, scalability issues similar to those addressed for XDP
processing in ice also exist for XDP in the zero-copy driver used by
AF_XDP. Let's resolve them in mostly the same way as we did in [0] and
utilize the Tx batching API from the XSK buffer pool.

Move the array of Tx descriptors that is used with the batching approach
to the XSK buffer pool. This means that future users of this API will
not have to carry the array on their own side; they can simply refer to
the pool's tx_descs array.
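
As a rough illustration of the ownership change described above, the following userspace C sketch keeps the scratch descriptor array inside the pool, so the driver-side transmit routine only borrows it. All names here (`struct pool`, `pool_queue`, `pool_peek_batch`, `driver_xmit`) are hypothetical stand-ins, not kernel APIs; the real `struct xsk_buff_pool` and `xsk_tx_peek_release_desc_batch()` are considerably more involved.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a Tx descriptor. */
struct desc {
	uint64_t addr;
	uint32_t len;
};

/* Hypothetical pool: it owns the scratch array used for batching. */
struct pool {
	struct desc *tx_descs;	/* scratch array, allocated once per pool */
	struct desc *queue;	/* pending descriptors (linear, no wrap) */
	uint32_t size;		/* capacity of both arrays */
	uint32_t queued;	/* descriptors produced so far */
	uint32_t consumed;	/* descriptors already handed to a driver */
};

static int pool_init(struct pool *p, uint32_t size)
{
	p->tx_descs = calloc(size, sizeof(*p->tx_descs));
	p->queue = calloc(size, sizeof(*p->queue));
	if (!p->tx_descs || !p->queue) {
		free(p->tx_descs);
		free(p->queue);
		return -1;
	}
	p->size = size;
	p->queued = 0;
	p->consumed = 0;
	return 0;
}

static void pool_destroy(struct pool *p)
{
	free(p->tx_descs);
	free(p->queue);
	p->tx_descs = NULL;
	p->queue = NULL;
}

/* Producer side: queue one descriptor for transmission. */
static int pool_queue(struct pool *p, uint64_t addr, uint32_t len)
{
	if (p->queued == p->size)
		return -1;
	p->queue[p->queued].addr = addr;
	p->queue[p->queued].len = len;
	p->queued++;
	return 0;
}

/* Copy up to @budget pending descriptors into the pool's own array. */
static uint32_t pool_peek_batch(struct pool *p, uint32_t budget)
{
	uint32_t avail = p->queued - p->consumed;
	uint32_t n = budget < avail ? budget : avail;

	assert(p->tx_descs != NULL);
	for (uint32_t i = 0; i < n; i++)
		p->tx_descs[i] = p->queue[p->consumed + i];
	p->consumed += n;
	return n;
}

/* Driver side: no private array, just a pointer into the pool. */
static uint32_t driver_xmit(struct pool *p, uint32_t budget, uint64_t *bytes)
{
	struct desc *descs = p->tx_descs;
	uint32_t n = pool_peek_batch(p, budget);

	for (uint32_t i = 0; i < n; i++)
		*bytes += descs[i].len;
	return n;
}
```

With this layout a driver needs neither an allocation nor a free path for the scratch array, which is the shape of the i40e cleanup in the hunks of this merge.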

We also improve the Rx side, where we extend ice_alloc_rx_buf_zc() to
handle the ring wrap and bump the Rx tail more frequently. Doing so
brings the Rx side in line with Tx, which was needed for the l2fwd
scenario.
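
To make "handle the ring wrap" concrete, here is a minimal userspace C sketch of refilling a circular Rx ring when the request crosses the end of the ring: the request is split into a chunk up to the last slot and a chunk starting again at slot 0, and the tail is updated afterwards. The structure, the fixed `RING_SIZE`, and the function names are assumptions made for illustration; this is not the ice driver's code and it ignores details such as buffer exhaustion and hardware tail register writes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 8	/* hypothetical ring size for the sketch */

/* Hypothetical, simplified Rx ring. */
struct rx_ring {
	uint64_t bufs[RING_SIZE];	/* buffer handle stored per slot */
	uint16_t count;			/* number of slots */
	uint16_t next_to_use;		/* first slot to refill */
	uint16_t tail;			/* value last "written" to the tail */
	uint64_t next_addr;		/* next buffer handle to hand out */
};

static void rx_ring_init(struct rx_ring *r, uint16_t next_to_use)
{
	memset(r, 0, sizeof(*r));
	r->count = RING_SIZE;
	r->next_to_use = next_to_use;
	r->tail = next_to_use;
	r->next_addr = 1;
}

/* Fill @n consecutive slots starting at @start; must not cross the end. */
static void fill_chunk(struct rx_ring *r, uint16_t start, uint16_t n)
{
	for (uint16_t i = 0; i < n; i++)
		r->bufs[start + i] = r->next_addr++;
}

/* Refill @n slots, splitting the work in two if it crosses the ring end. */
static uint16_t rx_alloc(struct rx_ring *r, uint16_t n)
{
	uint16_t until_wrap = (uint16_t)(r->count - r->next_to_use);
	uint16_t first = n < until_wrap ? n : until_wrap;

	assert(n <= r->count);

	fill_chunk(r, r->next_to_use, first);		/* up to the last slot */
	fill_chunk(r, 0, (uint16_t)(n - first));	/* remainder from slot 0 */

	r->next_to_use = (uint16_t)((r->next_to_use + n) % r->count);
	r->tail = r->next_to_use;			/* bump the tail */
	return n;
}
```

Starting at slot 6 of an 8-slot ring, a request for 5 buffers fills slots 6 and 7, wraps, then fills slots 0 through 2.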

Here are the performance improvements that this set brings, measured
with the xdpsock app in busy poll mode for 1 and 2 core modes. Both Tx
and Rx rings were sized to 1k entries and the busy poll budget was 256.

----------------------------------------------------------------
     |      txonly:      |      l2fwd      |      rxdrop
----------------------------------------------------------------
1C   |       149%        |       14%       |        3%
----------------------------------------------------------------
2C   |       134%        |       20%       |        5%
----------------------------------------------------------------

The next step will be to introduce batching on the Rx side.

v5:
* collect acks
* fix typos
* correct comments showing cache line boundaries in ice_tx_ring struct
v4 - address Alexandr's review:
* new patch (2) for making sure ring size is pow(2) when attaching
  xsk socket
* don't open code ALIGN_DOWN (patch 3)
* stop storing tx_thresh in ice_tx_ring (patch 4)
* scope variables in a better way for Tx batching (patch 7)
v3:
* drop likely() that was wrapping napi_complete_done (patch 1)
* introduce configurable Tx threshold (patch 2)
* handle ring wrap on Rx side when allocating buffers (patch 3)
* respect NAPI budget when cleaning Tx descriptors in ZC (patch 6)
v2:
* introduce new patch that resets @next_dd and @next_rs fields
* use batching API for AF_XDP Tx on ice side

  [0]: https://lore.kernel.org/bpf/20211015162908.145341-8-anthony.l.nguyen@intel.com/


====================
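
The v4 notes in the cover letter mention requiring a power-of-two ring size and using ALIGN_DOWN instead of open-coding it. The userspace C sketch below shows the arithmetic such checks typically rely on. The helper names are hypothetical equivalents written for this example (the kernel has its own `is_power_of_2()` and `ALIGN_DOWN()`), and the masking-based index wrap is an assumption about why a power-of-two size is convenient, not a statement about what the patches do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when exactly one bit is set in @n. */
static bool is_pow2(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Round @x down to a multiple of @a; @a must be a power of two. */
static uint32_t align_down(uint32_t x, uint32_t a)
{
	assert(is_pow2(a));
	return x & ~(a - 1);
}

/* Advance a ring index with a mask instead of a modulo or a branch. */
static uint32_t ring_next(uint32_t idx, uint32_t count)
{
	assert(is_pow2(count));
	return (idx + 1) & (count - 1);
}
```

For example, `align_down(1000, 64)` yields 960, and `ring_next(1023, 1024)` wraps to 0.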

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
parents 8033c6c2 59e92bfe
+0 −11
@@ -830,8 +830,6 @@ void i40e_free_tx_resources(struct i40e_ring *tx_ring)
	i40e_clean_tx_ring(tx_ring);
	kfree(tx_ring->tx_bi);
	tx_ring->tx_bi = NULL;
-	kfree(tx_ring->xsk_descs);
-	tx_ring->xsk_descs = NULL;

	if (tx_ring->desc) {
		dma_free_coherent(tx_ring->dev, tx_ring->size,
@@ -1433,13 +1431,6 @@ int i40e_setup_tx_descriptors(struct i40e_ring *tx_ring)
	if (!tx_ring->tx_bi)
		goto err;

-	if (ring_is_xdp(tx_ring)) {
-		tx_ring->xsk_descs = kcalloc(I40E_MAX_NUM_DESCRIPTORS, sizeof(*tx_ring->xsk_descs),
-					     GFP_KERNEL);
-		if (!tx_ring->xsk_descs)
-			goto err;
-	}
-
	u64_stats_init(&tx_ring->syncp);

	/* round up to nearest 4K */
@@ -1463,8 +1454,6 @@ int i40e_setup_tx_descriptors(struct i40e_ring *tx_ring)
	return 0;

err:
-	kfree(tx_ring->xsk_descs);
-	tx_ring->xsk_descs = NULL;
	kfree(tx_ring->tx_bi);
	tx_ring->tx_bi = NULL;
	return -ENOMEM;
+0 −1
@@ -390,7 +390,6 @@ struct i40e_ring {
	u16 rx_offset;
	struct xdp_rxq_info xdp_rxq;
	struct xsk_buff_pool *xsk_pool;
-	struct xdp_desc *xsk_descs;      /* For storing descriptors in the AF_XDP ZC path */
} ____cacheline_internodealigned_in_smp;

static inline bool ring_uses_build_skb(struct i40e_ring *ring)
+2 −2
@@ -467,11 +467,11 @@ static void i40e_set_rs_bit(struct i40e_ring *xdp_ring)
 **/
static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
{
-	struct xdp_desc *descs = xdp_ring->xsk_descs;
+	struct xdp_desc *descs = xdp_ring->xsk_pool->tx_descs;
	u32 nb_pkts, nb_processed = 0;
	unsigned int total_bytes = 0;

-	nb_pkts = xsk_tx_peek_release_desc_batch(xdp_ring->xsk_pool, descs, budget);
+	nb_pkts = xsk_tx_peek_release_desc_batch(xdp_ring->xsk_pool, budget);
	if (!nb_pkts)
		return true;

+2 −0
@@ -2803,6 +2803,8 @@ ice_set_ringparam(struct net_device *netdev, struct ethtool_ringparam *ring,
		/* clone ring and setup updated count */
		xdp_rings[i] = *vsi->xdp_rings[i];
		xdp_rings[i].count = new_tx_cnt;
+		xdp_rings[i].next_dd = ICE_RING_QUARTER(&xdp_rings[i]) - 1;
+		xdp_rings[i].next_rs = ICE_RING_QUARTER(&xdp_rings[i]) - 1;
		xdp_rings[i].desc = NULL;
		xdp_rings[i].tx_buf = NULL;
		err = ice_setup_tx_ring(&xdp_rings[i]);
+2 −2
@@ -2495,10 +2495,10 @@ static int ice_xdp_alloc_setup_rings(struct ice_vsi *vsi)
		xdp_ring->reg_idx = vsi->txq_map[xdp_q_idx];
		xdp_ring->vsi = vsi;
		xdp_ring->netdev = NULL;
-		xdp_ring->next_dd = ICE_TX_THRESH - 1;
-		xdp_ring->next_rs = ICE_TX_THRESH - 1;
		xdp_ring->dev = dev;
		xdp_ring->count = vsi->num_tx_desc;
+		xdp_ring->next_dd = ICE_RING_QUARTER(xdp_ring) - 1;
+		xdp_ring->next_rs = ICE_RING_QUARTER(xdp_ring) - 1;
		WRITE_ONCE(vsi->xdp_rings[i], xdp_ring);
		if (ice_setup_tx_ring(xdp_ring))
			goto free_xdp_rings;
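
The hunk above places the next_dd/next_rs initialization after the assignment of count, which matters once the threshold is derived from the ring itself rather than from a constant. The userspace C sketch below illustrates that dependency. It assumes, going by the macro's name only, that ICE_RING_QUARTER() evaluates to a quarter of the ring's descriptor count; `RING_QUARTER`, `struct tx_ring`, and `ring_set_count` are hypothetical names for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified Tx ring with only the fields used here. */
struct tx_ring {
	uint16_t count;		/* number of descriptors in the ring */
	uint16_t next_dd;	/* next descriptor to check for completion */
	uint16_t next_rs;	/* next descriptor to request a report on */
};

/* Assumed stand-in for ICE_RING_QUARTER(): one quarter of the ring. */
#define RING_QUARTER(r) ((r)->count >> 2)

/*
 * Set the ring size and derive both thresholds from it. The order is
 * what matters: count has to be valid before RING_QUARTER() is used.
 */
static void ring_set_count(struct tx_ring *r, uint16_t count)
{
	assert(count >= 4);
	r->count = count;
	r->next_dd = (uint16_t)(RING_QUARTER(r) - 1);
	r->next_rs = (uint16_t)(RING_QUARTER(r) - 1);
}
```

Under that assumption, a 1024-entry ring starts with both fields at 255 and a resize to 512 entries moves them to 127, which is why a resize path would need to recompute them as well.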