Commit 9a82cdc2 authored by Jakub Kicinski
Daniel Borkmann says:

====================
pull-request: bpf-next 2023-04-21

We've added 71 non-merge commits during the last 8 day(s) which contain
a total of 116 files changed, 13397 insertions(+), 8896 deletions(-).

The main changes are:

1) Add a new BPF netfilter program type and minimal support to hook
   BPF programs to netfilter hooks such as prerouting or forward,
   from Florian Westphal.

2) Fix race between btf_put and btf_idr walk which caused a deadlock,
   from Alexei Starovoitov.

3) Second big batch to migrate test_verifier unit tests into test_progs
   for ease of readability and debugging, from Eduard Zingerman.

4) Add support for refcounted local kptrs to the verifier for allowing
   shared ownership, useful for adding a node to both the BPF list and
   rbtree, from Dave Marchevsky.

5) Migrate bpf_for(), bpf_for_each() and bpf_repeat() macros from BPF
   selftests into libbpf-provided bpf_helpers.h header and improve
   kfunc handling, from Andrii Nakryiko. (A usage sketch follows the
   commit message below.)

6) Support 64-bit pointers to kfuncs needed for archs like s390x,
   from Ilya Leoshkevich.

7) Support BPF progs under getsockopt with a NULL optval,
   from Stanislav Fomichev.

8) Improve verifier u32 scalar equality checking in order to enable
   LLVM transformations which earlier had to be disabled specifically
   for BPF backend, from Yonghong Song.

9) Extend bpftool's struct_ops object loading to support links,
   from Kui-Feng Lee.

10) Add xsk selftest follow-up fixes for hugepage allocated umem,
    from Magnus Karlsson.

11) Support BPF redirects from tc BPF to ifb devices,
    from Daniel Borkmann.

12) Add BPF support for integer type when accessing variable length
    arrays, from Feng Zhou.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (71 commits)
  selftests/bpf: verifier/value_ptr_arith converted to inline assembly
  selftests/bpf: verifier/value_illegal_alu converted to inline assembly
  selftests/bpf: verifier/unpriv converted to inline assembly
  selftests/bpf: verifier/subreg converted to inline assembly
  selftests/bpf: verifier/spin_lock converted to inline assembly
  selftests/bpf: verifier/sock converted to inline assembly
  selftests/bpf: verifier/search_pruning converted to inline assembly
  selftests/bpf: verifier/runtime_jit converted to inline assembly
  selftests/bpf: verifier/regalloc converted to inline assembly
  selftests/bpf: verifier/ref_tracking converted to inline assembly
  selftests/bpf: verifier/map_ptr_mixing converted to inline assembly
  selftests/bpf: verifier/map_in_map converted to inline assembly
  selftests/bpf: verifier/lwt converted to inline assembly
  selftests/bpf: verifier/loops1 converted to inline assembly
  selftests/bpf: verifier/jeq_infer_not_null converted to inline assembly
  selftests/bpf: verifier/direct_packet_access converted to inline assembly
  selftests/bpf: verifier/d_path converted to inline assembly
  selftests/bpf: verifier/ctx converted to inline assembly
  selftests/bpf: verifier/btf_ctx_access converted to inline assembly
  selftests/bpf: verifier/bpf_get_stack converted to inline assembly
  ...
====================

Link: https://lore.kernel.org/r/20230421211035.9111-1-daniel@iogearbox.net


Signed-off-by: Jakub Kicinski <kuba@kernel.org>
parents 418a7307 4db10a82
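
Item 5's bpf_for(), bpf_for_each() and bpf_repeat() macros are easiest to see
in use. Below is a minimal sketch, assuming a vmlinux.h-based build, the
post-migration libbpf bpf_helpers.h, and a kernel with open-coded iterator
(bpf_iter_num) support; the section and program names are illustrative only:

// SPDX-License-Identifier: GPL-2.0
/* Illustrative only: bpf_for()/bpf_repeat() as provided by libbpf's
 * bpf_helpers.h after the migration from the BPF selftests.
 */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

SEC("tc")
int iter_example(void *ctx)
{
	long sum = 0;
	int i;

	/* verifier-checked bounded loop over [0, 100) */
	bpf_for(i, 0, 100)
		sum += i;

	/* run a body up to 10 times without an index variable */
	bpf_repeat(10)
		sum += 1;

	bpf_printk("sum=%ld", sum);
	return 0;
}

char _license[] SEC("license") = "GPL";

Under the hood, bpf_for() expands to bpf_iter_num_new()/bpf_iter_num_next()/
bpf_iter_num_destroy() calls, which is what lets the verifier bound the loop
without unrolling it.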
+6 −15
@@ -184,16 +184,7 @@ in. All copies of the pointer being released are invalidated as a result of
 invoking kfunc with this flag. KF_RELEASE kfuncs automatically receive the
 protection afforded by the KF_TRUSTED_ARGS flag described below.
 
-2.4.4 KF_KPTR_GET flag
-----------------------
-
-The KF_KPTR_GET flag is used to indicate that the kfunc takes the first argument
-as a pointer to kptr, safely increments the refcount of the object it points to,
-and returns a reference to the user. The rest of the arguments may be normal
-arguments of a kfunc. The KF_KPTR_GET flag should be used in conjunction with
-KF_ACQUIRE and KF_RET_NULL flags.
-
-2.4.5 KF_TRUSTED_ARGS flag
+2.4.4 KF_TRUSTED_ARGS flag
 --------------------------
 
 The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
@@ -205,7 +196,7 @@ exception described below).
 There are two types of pointers to kernel objects which are considered "valid":
 
 1. Pointers which are passed as tracepoint or struct_ops callback arguments.
-2. Pointers which were returned from a KF_ACQUIRE or KF_KPTR_GET kfunc.
+2. Pointers which were returned from a KF_ACQUIRE kfunc.
 
 Pointers to non-BTF objects (e.g. scalar pointers) may also be passed to
 KF_TRUSTED_ARGS kfuncs, and may have a non-zero offset.
@@ -232,13 +223,13 @@ In other words, you must:
 2. Specify the type and name of the trusted nested field. This field must match
    the field in the original type definition exactly.
 
-2.4.6 KF_SLEEPABLE flag
+2.4.5 KF_SLEEPABLE flag
 -----------------------
 
 The KF_SLEEPABLE flag is used for kfuncs that may sleep. Such kfuncs can only
 be called by sleepable BPF programs (BPF_F_SLEEPABLE).
 
-2.4.7 KF_DESTRUCTIVE flag
+2.4.6 KF_DESTRUCTIVE flag
 --------------------------
 
 The KF_DESTRUCTIVE flag is used to indicate functions calling which is
@@ -247,7 +238,7 @@ rebooting or panicking. Due to this additional restrictions apply to these
 calls. At the moment they only require CAP_SYS_BOOT capability, but more can be
 added later.
 
-2.4.8 KF_RCU flag
+2.4.7 KF_RCU flag
 -----------------
 
 The KF_RCU flag is a weaker version of KF_TRUSTED_ARGS. The kfuncs marked with
@@ -260,7 +251,7 @@ also be KF_RET_NULL.
 
 .. _KF_deprecated_flag:
 
-2.4.9 KF_DEPRECATED flag
+2.4.8 KF_DEPRECATED flag
 ------------------------
 
 The KF_DEPRECATED flag is used for kfuncs which are scheduled to be
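
With KF_KPTR_GET gone, the acquire pattern documented above reduces to
KF_ACQUIRE plus KF_RET_NULL. As a sketch of what registering such a kfunc
looks like — the BTF_ID_FLAGS/btf_kfunc_id_set machinery is the kernel's
existing API, but struct my_obj and the function names here are hypothetical:

#include <linux/bpf.h>
#include <linux/btf.h>
#include <linux/btf_ids.h>
#include <linux/module.h>
#include <linux/refcount.h>

struct my_obj {
	refcount_t ref;
	/* ... payload ... */
};

/* Hypothetical acquire kfunc: bumps the refcount, may fail. */
__bpf_kfunc struct my_obj *bpf_my_obj_acquire(struct my_obj *obj)
{
	if (!refcount_inc_not_zero(&obj->ref))
		return NULL;
	return obj;
}

BTF_SET8_START(my_kfunc_ids)
BTF_ID_FLAGS(func, bpf_my_obj_acquire, KF_ACQUIRE | KF_RET_NULL)
BTF_SET8_END(my_kfunc_ids)

static const struct btf_kfunc_id_set my_kfunc_set = {
	.owner = THIS_MODULE,
	.set   = &my_kfunc_ids,
};

static int __init my_obj_init(void)
{
	return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING,
					 &my_kfunc_set);
}
module_init(my_obj_init);

MODULE_LICENSE("GPL");

In practice a matching KF_RELEASE kfunc is registered alongside it so that
programs can drop the reference they acquired.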
+5 −0
@@ -2001,6 +2001,11 @@ bool bpf_jit_supports_kfunc_call(void)
 	return true;
 }
 
+bool bpf_jit_supports_far_kfunc_call(void)
+{
+	return true;
+}
+
 int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
 		       void *old_addr, void *new_addr)
 {
+68 −25
@@ -187,6 +187,7 @@ enum btf_field_type {
 	BPF_RB_NODE    = (1 << 7),
 	BPF_GRAPH_NODE_OR_ROOT = BPF_LIST_NODE | BPF_LIST_HEAD |
 				 BPF_RB_NODE | BPF_RB_ROOT,
+	BPF_REFCOUNT   = (1 << 8),
 };
 
 typedef void (*btf_dtor_kfunc_t)(void *);
@@ -210,6 +211,7 @@ struct btf_field_graph_root {
 
 struct btf_field {
 	u32 offset;
+	u32 size;
 	enum btf_field_type type;
 	union {
 		struct btf_field_kptr kptr;
@@ -222,15 +224,10 @@ struct btf_record {
 	u32 field_mask;
 	int spin_lock_off;
 	int timer_off;
+	int refcount_off;
 	struct btf_field fields[];
 };
 
-struct btf_field_offs {
-	u32 cnt;
-	u32 field_off[BTF_FIELDS_MAX];
-	u8 field_sz[BTF_FIELDS_MAX];
-};
-
 struct bpf_map {
 	/* The first two cachelines with read-mostly members of which some
 	 * are also accessed in fast-path (e.g. ops, max_entries).
@@ -257,7 +254,6 @@ struct bpf_map {
 	struct obj_cgroup *objcg;
 #endif
 	char name[BPF_OBJ_NAME_LEN];
-	struct btf_field_offs *field_offs;
 	/* The 3rd and 4th cacheline with misc members to avoid false sharing
 	 * particularly with refcounting.
 	 */
@@ -299,6 +295,8 @@ static inline const char *btf_field_type_name(enum btf_field_type type)
 		return "bpf_rb_root";
 	case BPF_RB_NODE:
 		return "bpf_rb_node";
+	case BPF_REFCOUNT:
+		return "bpf_refcount";
 	default:
 		WARN_ON_ONCE(1);
 		return "unknown";
@@ -323,6 +321,8 @@ static inline u32 btf_field_type_size(enum btf_field_type type)
 		return sizeof(struct bpf_rb_root);
 	case BPF_RB_NODE:
 		return sizeof(struct bpf_rb_node);
+	case BPF_REFCOUNT:
+		return sizeof(struct bpf_refcount);
 	default:
 		WARN_ON_ONCE(1);
 		return 0;
@@ -347,12 +347,42 @@ static inline u32 btf_field_type_align(enum btf_field_type type)
 		return __alignof__(struct bpf_rb_root);
 	case BPF_RB_NODE:
 		return __alignof__(struct bpf_rb_node);
+	case BPF_REFCOUNT:
+		return __alignof__(struct bpf_refcount);
 	default:
 		WARN_ON_ONCE(1);
 		return 0;
 	}
 }
 
+static inline void bpf_obj_init_field(const struct btf_field *field, void *addr)
+{
+	memset(addr, 0, field->size);
+
+	switch (field->type) {
+	case BPF_REFCOUNT:
+		refcount_set((refcount_t *)addr, 1);
+		break;
+	case BPF_RB_NODE:
+		RB_CLEAR_NODE((struct rb_node *)addr);
+		break;
+	case BPF_LIST_HEAD:
+	case BPF_LIST_NODE:
+		INIT_LIST_HEAD((struct list_head *)addr);
+		break;
+	case BPF_RB_ROOT:
+		/* RB_ROOT_CACHED 0-inits, no need to do anything after memset */
+	case BPF_SPIN_LOCK:
+	case BPF_TIMER:
+	case BPF_KPTR_UNREF:
+	case BPF_KPTR_REF:
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		return;
+	}
+}
+
 static inline bool btf_record_has_field(const struct btf_record *rec, enum btf_field_type type)
 {
 	if (IS_ERR_OR_NULL(rec))
@@ -360,14 +390,14 @@ static inline bool btf_record_has_field(const struct btf_record *rec, enum btf_f
 	return rec->field_mask & type;
 }
 
-static inline void bpf_obj_init(const struct btf_field_offs *foffs, void *obj)
+static inline void bpf_obj_init(const struct btf_record *rec, void *obj)
 {
 	int i;
 
-	if (!foffs)
+	if (IS_ERR_OR_NULL(rec))
 		return;
-	for (i = 0; i < foffs->cnt; i++)
-		memset(obj + foffs->field_off[i], 0, foffs->field_sz[i]);
+	for (i = 0; i < rec->cnt; i++)
+		bpf_obj_init_field(&rec->fields[i], obj + rec->fields[i].offset);
 }
 
 /* 'dst' must be a temporary buffer and should not point to memory that is being
@@ -379,7 +409,7 @@ static inline void bpf_obj_init(const struct btf_field_offs *foffs, void *obj)
  */
 static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
 {
-	bpf_obj_init(map->field_offs, dst);
+	bpf_obj_init(map->record, dst);
 }
 
 /* memcpy that is used with 8-byte aligned pointers, power-of-8 size and
@@ -399,14 +429,14 @@ static inline void bpf_long_memcpy(void *dst, const void *src, u32 size)
 }
 
 /* copy everything but bpf_spin_lock, bpf_timer, and kptrs. There could be one of each. */
-static inline void bpf_obj_memcpy(struct btf_field_offs *foffs,
+static inline void bpf_obj_memcpy(struct btf_record *rec,
 				  void *dst, void *src, u32 size,
 				  bool long_memcpy)
 {
 	u32 curr_off = 0;
 	int i;
 
-	if (likely(!foffs)) {
+	if (IS_ERR_OR_NULL(rec)) {
 		if (long_memcpy)
 			bpf_long_memcpy(dst, src, round_up(size, 8));
 		else
@@ -414,49 +444,49 @@ static inline void bpf_obj_memcpy(struct btf_field_offs *foffs,
 		return;
 	}
 
-	for (i = 0; i < foffs->cnt; i++) {
-		u32 next_off = foffs->field_off[i];
+	for (i = 0; i < rec->cnt; i++) {
+		u32 next_off = rec->fields[i].offset;
 		u32 sz = next_off - curr_off;
 
 		memcpy(dst + curr_off, src + curr_off, sz);
-		curr_off += foffs->field_sz[i] + sz;
+		curr_off += rec->fields[i].size + sz;
 	}
 	memcpy(dst + curr_off, src + curr_off, size - curr_off);
 }
 
 static inline void copy_map_value(struct bpf_map *map, void *dst, void *src)
 {
-	bpf_obj_memcpy(map->field_offs, dst, src, map->value_size, false);
+	bpf_obj_memcpy(map->record, dst, src, map->value_size, false);
 }
 
 static inline void copy_map_value_long(struct bpf_map *map, void *dst, void *src)
 {
-	bpf_obj_memcpy(map->field_offs, dst, src, map->value_size, true);
+	bpf_obj_memcpy(map->record, dst, src, map->value_size, true);
 }
 
-static inline void bpf_obj_memzero(struct btf_field_offs *foffs, void *dst, u32 size)
+static inline void bpf_obj_memzero(struct btf_record *rec, void *dst, u32 size)
 {
 	u32 curr_off = 0;
 	int i;
 
-	if (likely(!foffs)) {
+	if (IS_ERR_OR_NULL(rec)) {
 		memset(dst, 0, size);
 		return;
 	}
 
-	for (i = 0; i < foffs->cnt; i++) {
-		u32 next_off = foffs->field_off[i];
+	for (i = 0; i < rec->cnt; i++) {
+		u32 next_off = rec->fields[i].offset;
 		u32 sz = next_off - curr_off;
 
 		memset(dst + curr_off, 0, sz);
-		curr_off += foffs->field_sz[i] + sz;
+		curr_off += rec->fields[i].size + sz;
 	}
 	memset(dst + curr_off, 0, size - curr_off);
 }
 
 static inline void zero_map_value(struct bpf_map *map, void *dst)
 {
-	bpf_obj_memzero(map->field_offs, dst, map->value_size);
+	bpf_obj_memzero(map->record, dst, map->value_size);
 }
 
 void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
@@ -2234,6 +2264,9 @@ int bpf_prog_test_run_raw_tp(struct bpf_prog *prog,
 int bpf_prog_test_run_sk_lookup(struct bpf_prog *prog,
 				const union bpf_attr *kattr,
 				union bpf_attr __user *uattr);
+int bpf_prog_test_run_nf(struct bpf_prog *prog,
+			 const union bpf_attr *kattr,
+			 union bpf_attr __user *uattr);
 bool btf_ctx_access(int off, int size, enum bpf_access_type type,
 		    const struct bpf_prog *prog,
 		    struct bpf_insn_access_aux *info);
@@ -2295,6 +2328,9 @@ bool bpf_prog_has_kfunc_call(const struct bpf_prog *prog);
 const struct btf_func_model *
 bpf_jit_find_kfunc_model(const struct bpf_prog *prog,
			 const struct bpf_insn *insn);
+int bpf_get_kfunc_addr(const struct bpf_prog *prog, u32 func_id,
+		       u16 btf_fd_idx, u8 **func_addr);
+
 struct bpf_core_ctx {
 	struct bpf_verifier_log *log;
 	const struct btf *btf;
@@ -2545,6 +2581,13 @@ bpf_jit_find_kfunc_model(const struct bpf_prog *prog,
 	return NULL;
 }
 
+static inline int
+bpf_get_kfunc_addr(const struct bpf_prog *prog, u32 func_id,
+		   u16 btf_fd_idx, u8 **func_addr)
+{
+	return -ENOTSUPP;
+}
+
 static inline bool unprivileged_ebpf_enabled(void)
 {
 	return false;
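
The BPF_REFCOUNT plumbing above (field discovery, bpf_obj_init_field()
setting the count to 1) is what backs shared ownership of local kptrs. A
program-side sketch modeled on the refcounted-kptr selftests follows; it
assumes the selftests' bpf_experimental.h helpers, and all struct, lock and
program names are hypothetical:

// SPDX-License-Identifier: GPL-2.0
/* Sketch: one allocated object owned by both an rbtree and a list,
 * made possible by the bpf_refcount field. Names are hypothetical.
 */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include "bpf_experimental.h"

struct node_data {
	long key;
	struct bpf_rb_node rb;
	struct bpf_list_node ll;
	struct bpf_refcount ref;   /* set to 1 by bpf_obj_init_field() */
};

#ifndef container_of
#define container_of(ptr, type, member) \
	((type *)((void *)(ptr) - offsetof(type, member)))
#endif
#define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))

private(A) struct bpf_spin_lock rb_lock;
private(A) struct bpf_rb_root groot __contains(node_data, rb);
private(B) struct bpf_spin_lock ll_lock;
private(B) struct bpf_list_head ghead __contains(node_data, ll);

static bool less(struct bpf_rb_node *a, const struct bpf_rb_node *b)
{
	struct node_data *na = container_of(a, struct node_data, rb);
	struct node_data *nb = container_of(b, struct node_data, rb);

	return na->key < nb->key;
}

SEC("tc")
int add_to_both(void *ctx)
{
	struct node_data *n, *m;

	n = bpf_obj_new(typeof(*n));
	if (!n)
		return 0;

	/* take a second owned reference before handing n to the rbtree */
	m = bpf_refcount_acquire(n);

	bpf_spin_lock(&rb_lock);
	bpf_rbtree_add(&groot, &n->rb, less);
	bpf_spin_unlock(&rb_lock);

	bpf_spin_lock(&ll_lock);
	bpf_list_push_back(&ghead, &m->ll);
	bpf_spin_unlock(&ll_lock);
	return 0;
}

char _license[] SEC("license") = "GPL";

bpf_rbtree_add() consumes ownership of n, which is why the second reference m
is taken before the insert and then handed to the list.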
+4 −0
@@ -79,6 +79,10 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_LSM, lsm,
 #endif
 BPF_PROG_TYPE(BPF_PROG_TYPE_SYSCALL, bpf_syscall,
 	      void *, void *)
+#ifdef CONFIG_NETFILTER
+BPF_PROG_TYPE(BPF_PROG_TYPE_NETFILTER, netfilter,
+	      struct bpf_nf_ctx, struct bpf_nf_ctx)
+#endif
 
 BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY, array_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_ARRAY, percpu_array_map_ops)
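
The new BPF_PROG_TYPE_NETFILTER programs registered above receive a struct
bpf_nf_ctx context (the nf_hook_state plus the skb) and must return a
netfilter verdict; the verifier accepts only NF_DROP (0) and NF_ACCEPT (1).
A minimal sketch, assuming a libbpf with a matching "netfilter" section
definition; hook attachment happens from userspace and is not shown:

// SPDX-License-Identifier: GPL-2.0
/* Minimal BPF_PROG_TYPE_NETFILTER program: accept every packet. */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

#define NF_DROP   0
#define NF_ACCEPT 1

SEC("netfilter")
int nf_accept_all(struct bpf_nf_ctx *ctx)
{
	/* no inspection yet: ctx->skb / ctx->state carry the packet */
	return NF_ACCEPT;
}

char _license[] SEC("license") = "GPL";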
+6 −1
@@ -464,7 +464,12 @@ struct bpf_insn_aux_data {
 		 */
 		struct bpf_loop_inline_state loop_inline_state;
 	};
-	u64 obj_new_size; /* remember the size of type passed to bpf_obj_new to rewrite R1 */
+	union {
+		/* remember the size of type passed to bpf_obj_new to rewrite R1 */
+		u64 obj_new_size;
+		/* remember the offset of node field within type to rewrite */
+		u64 insert_off;
+	};
 	struct btf_struct_meta *kptr_struct_meta;
 	u64 map_key_state; /* constant (32 bit) key tracking for maps */
 	int ctx_field_size; /* the ctx field size for load insn, maybe 0 */