Commit 6b7b0c30 authored by Jakub Kicinski
Daniel Borkmann says:

====================
bpf-next 2021-10-02

We've added 85 non-merge commits during the last 15 day(s) which contain
a total of 132 files changed, 13779 insertions(+), 6724 deletions(-).

The main changes are:

1) Massive update on test_bpf.ko coverage for JITs as preparatory work for
   an upcoming MIPS eBPF JIT, from Johan Almbladh.

2) Add a batched interface for RX buffer allocation in AF_XDP buffer pool,
   with driver support for i40e and ice from Magnus Karlsson.

3) Add legacy uprobe support to libbpf to complement recently merged legacy
   kprobe support, from Andrii Nakryiko.

4) Add bpf_trace_vprintk() as variadic printk helper, from Dave Marchevsky
   (a usage sketch follows this list).

5) Support saving the register state in the verifier when spilling a <8-byte
   bounded scalar to the stack, from Martin Lau.

6) Add libbpf opt-in for stricter BPF program section name handling as part
   of libbpf 1.0 effort, from Andrii Nakryiko.

7) Add a document to help clarify BPF licensing, from Alexei Starovoitov.

8) Fix skel_internal.h to propagate errno if the loader indicates an internal
   error, from Kumar Kartikeya Dwivedi.

9) Fix build warnings with -Wcast-function-type so that the option can later
   be enabled by default for the kernel, from Kees Cook.

10) Fix libbpf to ignore STT_SECTION symbols in legacy map definitions as it
    otherwise errors out when encountering them, from Toke Høiland-Jørgensen.

11) Teach libbpf to recognize specialized maps (such as for perf RB) and
    internally remove BTF type IDs when creating them, from Hengqi Chen.

12) Various fixes and improvements to BPF selftests.
====================
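
As a usage illustration for (4) above: a minimal, hypothetical sketch of a BPF
program calling the new bpf_trace_vprintk() helper. Program name, section and
format string are invented for the example; the point is that, unlike
bpf_trace_printk(), the arguments travel in a u64 array, so the helper is not
limited to three of them:

  // SPDX-License-Identifier: GPL-2.0
  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  /* bpf_trace_vprintk() is a "GPL only" helper. */
  char LICENSE[] SEC("license") = "GPL";

  SEC("tracepoint/syscalls/sys_enter_execve")
  int trace_execve(void *ctx)
  {
  	static const char fmt[] = "execve: pid=%d uid=%d gid=%d ts=%lu\n";
  	__u64 args[4];

  	args[0] = bpf_get_current_pid_tgid() >> 32;
  	args[1] = bpf_get_current_uid_gid() & 0xffffffff;
  	args[2] = bpf_get_current_uid_gid() >> 32;
  	args[3] = bpf_ktime_get_ns();

  	/* data_len must be a multiple of 8; one u64 per conversion. */
  	bpf_trace_vprintk(fmt, sizeof(fmt), args, sizeof(args));
  	return 0;
  }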

Link: https://lore.kernel.org/r/20211002001327.15169-1-daniel@iogearbox.net


Signed-off-by: Jakub Kicinski <kuba@kernel.org>
parents 20ab39d1 d636c8da
Documentation/bpf/bpf_licensing.rst  +92 −0
=============
BPF licensing
=============

Background
==========

* Classic BPF was BSD licensed

"BPF" was originally introduced as BSD Packet Filter in
http://www.tcpdump.org/papers/bpf-usenix93.pdf. The corresponding instruction
set and its implementation came from BSD with a BSD license. That original
instruction set is now known as "classic BPF" (cBPF).

However, an instruction set is a specification for machine-language
interaction, similar to a programming language; it is not code itself.
Therefore, applying a BSD license may be misleading in certain contexts, as an
instruction set may enjoy no copyright protection.

* The eBPF (extended BPF) instruction set continues to be BSD licensed

In 2014, the classic BPF instruction set was significantly extended. We
typically refer to this extended instruction set as eBPF to disambiguate it
from cBPF. The eBPF instruction set is still BSD licensed.

Implementations of eBPF
=======================

Using the eBPF instruction set requires implementing code in both kernel space
and user space.

In Linux Kernel
---------------

The reference implementations of the eBPF interpreter and various just-in-time
compilers are part of Linux and are GPLv2 licensed. The implementation of
eBPF helper functions is also GPLv2 licensed. Together, the interpreters,
JITs, helpers, and verifier are referred to as the eBPF runtime.

In User Space
-------------

There are also implementations of the eBPF runtime (interpreter, JITs, helper
functions) under other licenses:
Apache2 (https://github.com/iovisor/ubpf),
MIT (https://github.com/qmonnet/rbpf), and
BSD (https://github.com/DPDK/dpdk/blob/main/lib/librte_bpf).

In HW
-----

Hardware can choose to execute the eBPF instruction set natively and provide
the eBPF runtime in hardware, or implement it in firmware carrying a
proprietary license.

In other operating systems
--------------------------

Other kernel or user-space implementations of the eBPF instruction set and
runtime may carry proprietary licenses.

Using BPF programs in the Linux kernel
======================================

The Linux kernel (while being GPLv2) allows linking of proprietary kernel
modules under the rules described in
Documentation/process/license-rules.rst.

When a kernel module is loaded, the Linux kernel checks which functions it
intends to use. If any function is marked as "GPL only," the corresponding
module or program must have a GPL-compatible license.

Loading a BPF program into the Linux kernel is similar to loading a kernel
module. A BPF program is loaded at run time and is not statically linked into
the Linux kernel. BPF program loading follows the same license-checking rules
as kernel modules. BPF programs can be proprietary if they don't use "GPL
only" BPF helper functions.

Further, some BPF program types - Linux Security Modules (LSM) and TCP
Congestion Control (struct_ops), as of Aug 2021 - are required to be GPL
compatible even if they don't use "GPL only" helper functions directly. The
registration step of LSM and TCP congestion control modules of the Linux
kernel is done through EXPORT_SYMBOL_GPL kernel functions. In that sense LSM
and struct_ops BPF programs are implicitly calling "GPL only" functions.
The same restriction applies to BPF programs that call kernel functions
directly via the unstable interface known as "kfuncs".

Packaging BPF programs with user space applications
====================================================

Generally, proprietary-licensed applications and GPL-licensed BPF programs
written for the Linux kernel in the same package can co-exist because they are
separate executable processes. This applies to both cBPF and eBPF programs.
Documentation/bpf/index.rst  +9 −0
@@ -82,6 +82,15 @@ Testing and debugging BPF
    s390
 
 
+Licensing
+=========
+
+.. toctree::
+   :maxdepth: 1
+
+   bpf_licensing
+
+
 Other
 =====
 
drivers/net/ethernet/intel/i40e/i40e_xsk.c  +25 −27
@@ -193,42 +193,40 @@ bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 count)
 {
 	u16 ntu = rx_ring->next_to_use;
 	union i40e_rx_desc *rx_desc;
-	struct xdp_buff **bi, *xdp;
+	struct xdp_buff **xdp;
+	u32 nb_buffs, i;
 	dma_addr_t dma;
-	bool ok = true;
 
 	rx_desc = I40E_RX_DESC(rx_ring, ntu);
-	bi = i40e_rx_bi(rx_ring, ntu);
-	do {
-		xdp = xsk_buff_alloc(rx_ring->xsk_pool);
-		if (!xdp) {
-			ok = false;
-			goto no_buffers;
-		}
-		*bi = xdp;
-		dma = xsk_buff_xdp_get_dma(xdp);
+	xdp = i40e_rx_bi(rx_ring, ntu);
+
+	nb_buffs = min_t(u16, count, rx_ring->count - ntu);
+	nb_buffs = xsk_buff_alloc_batch(rx_ring->xsk_pool, xdp, nb_buffs);
+	if (!nb_buffs)
+		return false;
+
+	i = nb_buffs;
+	while (i--) {
+		dma = xsk_buff_xdp_get_dma(*xdp);
 		rx_desc->read.pkt_addr = cpu_to_le64(dma);
 		rx_desc->read.hdr_addr = 0;
 
 		rx_desc++;
-		bi++;
-		ntu++;
+		xdp++;
+	}
 
-		if (unlikely(ntu == rx_ring->count)) {
-			rx_desc = I40E_RX_DESC(rx_ring, 0);
-			bi = i40e_rx_bi(rx_ring, 0);
-			ntu = 0;
-		}
-	} while (--count);
+	ntu += nb_buffs;
+	if (ntu == rx_ring->count) {
+		rx_desc = I40E_RX_DESC(rx_ring, 0);
+		xdp = i40e_rx_bi(rx_ring, 0);
+		ntu = 0;
+	}
 
-no_buffers:
-	if (rx_ring->next_to_use != ntu) {
-		/* clear the status bits for the next_to_use descriptor */
-		rx_desc->wb.qword1.status_error_len = 0;
-		i40e_release_rx_desc(rx_ring, ntu);
-	}
+	/* clear the status bits for the next_to_use descriptor */
+	rx_desc->wb.qword1.status_error_len = 0;
+	i40e_release_rx_desc(rx_ring, ntu);
 
-	return ok;
+	return count == nb_buffs ? true : false;
 }
 
 /**
@@ -365,7 +363,7 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
 			break;
 
 		bi = *i40e_rx_bi(rx_ring, next_to_clean);
-		bi->data_end = bi->data + size;
+		xsk_buff_set_size(bi, size);
 		xsk_buff_dma_sync_for_cpu(bi, rx_ring->xsk_pool);
 
 		xdp_res = i40e_run_xdp_zc(rx_ring, bi);
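
The heart of the conversion above (and of the ice changes below) is the new
pool API xsk_buff_alloc_batch(), which fills an array of xdp_buff pointers and
returns how many buffers it could provide. A distilled sketch of the refill
pattern, with struct my_ring and my_refill_rx_zc() as hypothetical stand-ins
for the driver's ring structure and refill function:

  #include <linux/types.h>
  #include <linux/minmax.h>
  #include <net/xdp_sock_drv.h>

  /* Hypothetical ring: one xdp_buff pointer per HW descriptor, as
   * i40e/ice now keep in their real ring structures.
   */
  struct my_ring {
  	struct xsk_buff_pool *xsk_pool;
  	struct xdp_buff **xdp_buf;
  	u16 next_to_use;
  	u16 count;
  };

  static bool my_refill_rx_zc(struct my_ring *ring, u16 count)
  {
  	struct xdp_buff **xdp = &ring->xdp_buf[ring->next_to_use];
  	u32 nb_buffs, i;

  	/* Clamp the request so a single batch never wraps the ring. */
  	nb_buffs = min_t(u16, count, ring->count - ring->next_to_use);

  	/* One call replaces the old per-buffer xsk_buff_alloc() loop:
  	 * the pool fills xdp[0..nb_buffs-1] and may return fewer
  	 * buffers than requested.
  	 */
  	nb_buffs = xsk_buff_alloc_batch(ring->xsk_pool, xdp, nb_buffs);
  	if (!nb_buffs)
  		return false;

  	for (i = 0; i < nb_buffs; i++) {
  		dma_addr_t dma = xsk_buff_xdp_get_dma(xdp[i]);

  		/* ... program dma into HW Rx descriptor i here ... */
  	}

  	ring->next_to_use += nb_buffs;
  	if (ring->next_to_use == ring->count)
  		ring->next_to_use = 0;

  	/* True only if the full request was satisfied. */
  	return nb_buffs == count;
  }
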
drivers/net/ethernet/intel/ice/ice_txrx.h  +5 −11
@@ -164,18 +164,11 @@ struct ice_tx_offload_params {
 };
 
 struct ice_rx_buf {
-	union {
-		struct {
-			dma_addr_t dma;
-			struct page *page;
-			unsigned int page_offset;
-			u16 pagecnt_bias;
-		};
-		struct {
-			struct xdp_buff *xdp;
-		};
-	};
+	dma_addr_t dma;
+	struct page *page;
+	unsigned int page_offset;
+	u16 pagecnt_bias;
 };
 
 struct ice_q_stats {
 	u64 pkts;
@@ -270,6 +263,7 @@ struct ice_ring {
 	union {
 		struct ice_tx_buf *tx_buf;
 		struct ice_rx_buf *rx_buf;
+		struct xdp_buff **xdp_buf;
 	};
 	/* CL2 - 2nd cacheline starts here */
 	u16 q_index;			/* Queue number of ring */
drivers/net/ethernet/intel/ice/ice_xsk.c  +43 −49
@@ -364,45 +364,39 @@ bool ice_alloc_rx_bufs_zc(struct ice_ring *rx_ring, u16 count)
 {
 	union ice_32b_rx_flex_desc *rx_desc;
 	u16 ntu = rx_ring->next_to_use;
-	struct ice_rx_buf *rx_buf;
-	bool ok = true;
+	struct xdp_buff **xdp;
+	u32 nb_buffs, i;
 	dma_addr_t dma;
 
-	if (!count)
-		return true;
-
 	rx_desc = ICE_RX_DESC(rx_ring, ntu);
-	rx_buf = &rx_ring->rx_buf[ntu];
+	xdp = &rx_ring->xdp_buf[ntu];
 
-	do {
-		rx_buf->xdp = xsk_buff_alloc(rx_ring->xsk_pool);
-		if (!rx_buf->xdp) {
-			ok = false;
-			break;
-		}
+	nb_buffs = min_t(u16, count, rx_ring->count - ntu);
+	nb_buffs = xsk_buff_alloc_batch(rx_ring->xsk_pool, xdp, nb_buffs);
+	if (!nb_buffs)
+		return false;
 
-		dma = xsk_buff_xdp_get_dma(rx_buf->xdp);
+	i = nb_buffs;
+	while (i--) {
+		dma = xsk_buff_xdp_get_dma(*xdp);
 		rx_desc->read.pkt_addr = cpu_to_le64(dma);
 		rx_desc->wb.status_error0 = 0;
 
 		rx_desc++;
-		rx_buf++;
-		ntu++;
+		xdp++;
+	}
 
-		if (unlikely(ntu == rx_ring->count)) {
-			rx_desc = ICE_RX_DESC(rx_ring, 0);
-			rx_buf = rx_ring->rx_buf;
-			ntu = 0;
-		}
-	} while (--count);
+	ntu += nb_buffs;
+	if (ntu == rx_ring->count) {
+		rx_desc = ICE_RX_DESC(rx_ring, 0);
+		xdp = rx_ring->xdp_buf;
+		ntu = 0;
+	}
 
-	if (rx_ring->next_to_use != ntu) {
-		/* clear the status bits for the next_to_use descriptor */
-		rx_desc->wb.status_error0 = 0;
-		ice_release_rx_desc(rx_ring, ntu);
-	}
+	/* clear the status bits for the next_to_use descriptor */
+	rx_desc->wb.status_error0 = 0;
+	ice_release_rx_desc(rx_ring, ntu);
 
-	return ok;
+	return count == nb_buffs ? true : false;
 }
 
 /**
@@ -421,19 +415,19 @@ static void ice_bump_ntc(struct ice_ring *rx_ring)
 /**
  * ice_construct_skb_zc - Create an sk_buff from zero-copy buffer
  * @rx_ring: Rx ring
- * @rx_buf: zero-copy Rx buffer
+ * @xdp_arr: Pointer to the SW ring of xdp_buff pointers
  *
  * This function allocates a new skb from a zero-copy Rx buffer.
  *
  * Returns the skb on success, NULL on failure.
  */
 static struct sk_buff *
-ice_construct_skb_zc(struct ice_ring *rx_ring, struct ice_rx_buf *rx_buf)
+ice_construct_skb_zc(struct ice_ring *rx_ring, struct xdp_buff **xdp_arr)
 {
-	unsigned int metasize = rx_buf->xdp->data - rx_buf->xdp->data_meta;
-	unsigned int datasize = rx_buf->xdp->data_end - rx_buf->xdp->data;
-	unsigned int datasize_hard = rx_buf->xdp->data_end -
-				     rx_buf->xdp->data_hard_start;
+	struct xdp_buff *xdp = *xdp_arr;
+	unsigned int metasize = xdp->data - xdp->data_meta;
+	unsigned int datasize = xdp->data_end - xdp->data;
+	unsigned int datasize_hard = xdp->data_end - xdp->data_hard_start;
 	struct sk_buff *skb;
 
 	skb = __napi_alloc_skb(&rx_ring->q_vector->napi, datasize_hard,
@@ -441,13 +435,13 @@ ice_construct_skb_zc(struct ice_ring *rx_ring, struct ice_rx_buf *rx_buf)
 	if (unlikely(!skb))
 		return NULL;
 
-	skb_reserve(skb, rx_buf->xdp->data - rx_buf->xdp->data_hard_start);
-	memcpy(__skb_put(skb, datasize), rx_buf->xdp->data, datasize);
+	skb_reserve(skb, xdp->data - xdp->data_hard_start);
+	memcpy(__skb_put(skb, datasize), xdp->data, datasize);
 	if (metasize)
 		skb_metadata_set(skb, metasize);
 
-	xsk_buff_free(rx_buf->xdp);
-	rx_buf->xdp = NULL;
+	xsk_buff_free(xdp);
+	*xdp_arr = NULL;
 	return skb;
 }

@@ -521,7 +515,7 @@ int ice_clean_rx_irq_zc(struct ice_ring *rx_ring, int budget)
 	while (likely(total_rx_packets < (unsigned int)budget)) {
 		union ice_32b_rx_flex_desc *rx_desc;
 		unsigned int size, xdp_res = 0;
-		struct ice_rx_buf *rx_buf;
+		struct xdp_buff **xdp;
 		struct sk_buff *skb;
 		u16 stat_err_bits;
 		u16 vlan_tag = 0;
@@ -544,18 +538,18 @@ int ice_clean_rx_irq_zc(struct ice_ring *rx_ring, int budget)
 		if (!size)
 			break;
 
-		rx_buf = &rx_ring->rx_buf[rx_ring->next_to_clean];
-		rx_buf->xdp->data_end = rx_buf->xdp->data + size;
-		xsk_buff_dma_sync_for_cpu(rx_buf->xdp, rx_ring->xsk_pool);
+		xdp = &rx_ring->xdp_buf[rx_ring->next_to_clean];
+		xsk_buff_set_size(*xdp, size);
+		xsk_buff_dma_sync_for_cpu(*xdp, rx_ring->xsk_pool);
 
-		xdp_res = ice_run_xdp_zc(rx_ring, rx_buf->xdp);
+		xdp_res = ice_run_xdp_zc(rx_ring, *xdp);
 		if (xdp_res) {
 			if (xdp_res & (ICE_XDP_TX | ICE_XDP_REDIR))
 				xdp_xmit |= xdp_res;
 			else
-				xsk_buff_free(rx_buf->xdp);
+				xsk_buff_free(*xdp);
 
-			rx_buf->xdp = NULL;
+			*xdp = NULL;
 			total_rx_bytes += size;
 			total_rx_packets++;
 			cleaned_count++;
@@ -565,7 +559,7 @@ int ice_clean_rx_irq_zc(struct ice_ring *rx_ring, int budget)
 		}
 
 		/* XDP_PASS path */
-		skb = ice_construct_skb_zc(rx_ring, rx_buf);
+		skb = ice_construct_skb_zc(rx_ring, xdp);
 		if (!skb) {
 			rx_ring->rx_stats.alloc_buf_failed++;
 			break;
@@ -813,12 +807,12 @@ void ice_xsk_clean_rx_ring(struct ice_ring *rx_ring)
 	u16 i;
 
 	for (i = 0; i < rx_ring->count; i++) {
-		struct ice_rx_buf *rx_buf = &rx_ring->rx_buf[i];
+		struct xdp_buff **xdp = &rx_ring->xdp_buf[i];
 
-		if (!rx_buf->xdp)
+		if (!xdp)
 			continue;
 
-		rx_buf->xdp = NULL;
+		*xdp = NULL;
 	}
 }
}
