Commit 3d431677 authored May 19, 2021 by Daniel Borkmann

Merge branch 'bpf-loader-progs'

Alexei Starovoitov says:

====================
v5->v6:
- fixed issue found by bpf CI. The light skeleton generation was
doing a dry-run of loading the program where all actual sys_bpf syscalls
were replaced by calls into gen_loader. It turned out that the search for a valid
vmlinux_btf was not stubbed out, which caused light skeleton gen
to fail on older kernels.
- significantly reduced verbosity of gen_loader.c.
- an example trace_printk.lskel.h generated out of progs/trace_printk.c
https://gist.github.com/4ast/774ea58f8286abac6aa8e3bf3bf3b903

v4->v5:
- addressed a bunch of minor comments from Andrii.
- the main difference is that lskel is now more robust in case of errors
and a bit cleaner looking.

v3->v4:
- cleaned up closing of temporary FDs in case intermediate sys_bpf fails during
execution of loader program.
- added support for rodata in the skeleton.
- enforce bpf_prog_type_syscall to be sleepable, since it needs bpf_copy_from_user
to populate rodata map.
- converted test trace_printk to use lskel to test rodata access.
- various small bug fixes.

v2->v3: Addressed comments from Andrii and John.
- added support for setting max_entries after signature verification
and used it in ringbuf test, since ringbuf's max_entries has to be updated
after skeleton open() and before load(). See patch 20.
- bpf_btf_find_by_name_kind doesn't take btf_fd anymore.
Because of that removed attach_prog_fd from bpf_prog_desc in lskel.
Both features to be added later.
- cleaned up closing of fd==0 during loader gen by resetting fds back to -1.
- converted loader gen to use memset(&attr, cmd_specific_attr_size).
Would love to see this optimization in the rest of libbpf.
- fixed memory leak during loader_gen in case of enomem.
- support for fd_array kernel feature is added in patch 9 to have
exhaustive testing across all selftests and then partially reverted
in patch 15 to keep old style map_fd patching tested as well.
- since fentry_test/fexit_tests were extended with re-attach, had to add
support for per-program attach method in lskel and use it in the tests.
- cleanup closing of fds in lskel in case of partial failures.
- fixed numerous small nits.

v1->v2: Addressed comments from Al, Yonghong and Andrii.
- documented sys_close fdget/fdput requirement and non-recursion check.
- reduced internal api leaks between libbpf and bpftool.
Now bpf_object__gen_loader() is the only new libbpf api with minimal fields.
- fixed light skeleton __destroy() method to munmap and close maps and progs.
- refactored bpf_btf_find_by_name_kind to return btf_id | (btf_obj_fd << 32).
- refactored use of bpf_btf_find_by_name_kind from loader prog.
- moved auto-gen like code into skel_internal.h that is used by *.lskel.h
It has minimal static inline bpf_load_and_run() method used by lskel.
- added lskel.h example in patch 15.
- replaced union bpf_map_prog_desc with struct bpf_map_desc and struct bpf_prog_desc.
- removed mark_feat_supported and added a patch to pass 'obj' into kernel_supports.
- added proper tracking of temporary FDs in loader prog and their cleanup via bpf_sys_close.
- rename gen_trace.c into gen_loader.c to better align the naming throughout.
- expanded number of available helpers in new prog type.
- added support for raw_tp attaching in lskel.
lskel supports tracing and raw_tp progs now.
It correctly loads all networking prog types too, but __attach() method is tbd.
- converted progs/test_ksyms_module.c to lskel.
- minor feedback fixes all over.
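
The packed return value mentioned above, btf_id | (btf_obj_fd << 32), can be
illustrated with a small userspace sketch. Only the bit layout comes from the
changelog; the helper names below are hypothetical and are not part of the
kernel or libbpf.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers: pack a BTF type id into the low 32 bits and the
 * FD of the BTF object into the high 32 bits of one 64-bit value.
 */
static inline uint64_t toy_pack_btf_result(uint32_t btf_id, uint32_t btf_obj_fd)
{
	/* Widen before shifting; shifting a 32-bit value by 32 is undefined. */
	return (uint64_t)btf_id | ((uint64_t)btf_obj_fd << 32);
}

/* Low half: the BTF type id. */
static inline uint32_t toy_unpack_btf_id(uint64_t res)
{
	return (uint32_t)res;
}

/* High half: the FD of the BTF object that contains the type. */
static inline uint32_t toy_unpack_btf_obj_fd(uint64_t res)
{
	return (uint32_t)(res >> 32);
}
```

A caller would split the single return value back into its two halves with
the unpack helpers instead of needing a second output argument.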

The description of V1 set is still valid:

This is a first step towards signed bpf programs and the third approach of that kind.
The first approach was to bring libbpf into the kernel as a user-mode-driver.
The second approach was to invent a new file format and let kernel execute
that format as a sequence of syscalls that create maps and load programs.
This third approach uses a new type of bpf program instead of inventing a file format.
The 1st and 2nd approaches had too many downsides compared to this 3rd one and were discarded
after months of work.

To make it work the following new concepts are introduced:
1. syscall bpf program type
A kind of bpf program that can do sys_bpf and sys_close syscalls.
It can only execute in user context.

2. FD array or FD index.
Traditionally BPF instructions are patched with FDs.
This means that maps have to be created first and the instructions modified afterwards,
which breaks signature verification if the program is signed.
Instead of patching each instruction with an FD, patch it with an index into an array of FDs.
That keeps the program signature stable if it uses maps.

3. loader program that is generated as "strace of libbpf".
When libbpf is loading bpf_file.o it does a bunch of sys_bpf() syscalls to
load BTF, create maps, populate maps and finally load programs.
Instead of actually doing the syscalls, generate a trace of what libbpf
would have done and represent it as the "loader program".
The "loader program" consists of a single map and a single bpf program that
does those syscalls.
Executing such a "loader program" via the bpf_prog_test_run() command will
replay the sequence of syscalls that libbpf would have done, which will result
in the same maps created and programs loaded as specified in the elf file.
The "loader program" removes the libelf dependency and the majority of the
libbpf dependency from the program loading process.

4. light skeleton
Instead of embedding the whole elf file into the skeleton and using libbpf
to parse it later, generate a loader program and embed it into a "light skeleton".
Such a skeleton can load the same set of elf files, but it doesn't need
libbpf and libelf to do that. It only needs a few sys_bpf wrappers.
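
To make concept 2 concrete, here is a toy userspace sketch of why FD patching
changes the bytes that would be signed while an FD index does not. Everything
in it is an assumption made for illustration: struct toy_insn is not the real
struct bpf_insn, and the FNV-1a digest merely stands in for a real signature
check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy instruction: only the immediate matters for this illustration. */
struct toy_insn {
	uint8_t opcode;
	int32_t imm;	/* holds either a map FD or an index into fd_array */
};

/* FNV-1a over opcode and immediate, standing in for a signature digest. */
static uint64_t toy_digest(const struct toy_insn *insns, size_t cnt)
{
	uint64_t h = 0xcbf29ce484222325ULL;

	for (size_t i = 0; i < cnt; i++) {
		uint32_t imm = (uint32_t)insns[i].imm;

		h = (h ^ insns[i].opcode) * 0x100000001b3ULL;
		for (int b = 0; b < 4; b++)
			h = (h ^ ((imm >> (8 * b)) & 0xff)) * 0x100000001b3ULL;
	}
	return h;
}

/* Old style: overwrite the immediate with the FD obtained at run time. */
static void toy_patch_fd(struct toy_insn *insn, int map_fd)
{
	insn->imm = map_fd;
}

/* New style: the immediate stays a fixed index; the FD is looked up aside. */
static int toy_resolve_fd(const struct toy_insn *insn, const int *fd_array)
{
	return fd_array[insn->imm];
}
```

With the index scheme, two runs that receive different FDs from the kernel
still hash to the same digest, because only the side array differs. With
toy_patch_fd() the digest depends on whichever FD the kernel handed out.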

Future steps:
- support CO-RE in the kernel. This patch set is already too big,
so that critical feature is left for the next step.
- generate light skeleton in golang to allow such users to use BTF and
all other features provided by libbpf
- generate light skeleton for kernel, so that bpf programs can be embedded
in the kernel module. The UMD usage in bpf_preload will be replaced with
such skeleton, so bpf_preload would become a standard kernel module
without user space dependency.
- finally do the signing of the loader program.

The patches are work in progress with a few rough edges.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

parents fa7b83bf 1a532eb2

include/linux/bpf.h

+15 −4

		@@ -22,6 +22,7 @@
		#include <linux/sched/mm.h>
		#include <linux/slab.h>
		#include <linux/percpu-refcount.h>
		+#include <linux/bpfptr.h>

		struct bpf_verifier_env;
		struct bpf_verifier_log;
		@@ -1428,7 +1429,7 @@ struct bpf_iter__bpf_map_elem {
		int bpf_iter_reg_target(const struct bpf_iter_reg *reg_info);
		void bpf_iter_unreg_target(const struct bpf_iter_reg *reg_info);
		bool bpf_iter_prog_supported(struct bpf_prog *prog);
		-int bpf_iter_link_attach(const union bpf_attr *attr, struct bpf_prog *prog);
		+int bpf_iter_link_attach(const union bpf_attr *attr, bpfptr_t uattr, struct bpf_prog *prog);
		int bpf_iter_new_fd(struct bpf_link *link);
		bool bpf_link_is_iter(struct bpf_link *link);
		struct bpf_prog *bpf_iter_get_info(struct bpf_iter_meta *meta, bool in_stop);
		@@ -1459,7 +1460,7 @@ int bpf_fd_htab_map_update_elem(struct bpf_map *map, struct file *map_file,
		int bpf_fd_htab_map_lookup_elem(struct bpf_map *map, void *key, u32 *value);

		int bpf_get_file_flag(int flags);
		-int bpf_check_uarg_tail_zero(void __user *uaddr, size_t expected_size,
		+int bpf_check_uarg_tail_zero(bpfptr_t uaddr, size_t expected_size,
					     size_t actual_size);

		/* memcpy that is used with 8-byte aligned pointers, power-of-8 size and
		@@ -1479,8 +1480,7 @@ static inline void bpf_long_memcpy(void *dst, const void *src, u32 size)
		}

		/* verify correctness of eBPF program */
		-int bpf_check(struct bpf_prog **fp, union bpf_attr *attr,
		-	      union bpf_attr __user *uattr);
		+int bpf_check(struct bpf_prog **fp, union bpf_attr *attr, bpfptr_t uattr);

		#ifndef CONFIG_BPF_JIT_ALWAYS_ON
		void bpf_patch_call_args(struct bpf_insn *insn, u32 stack_depth);
		@@ -1826,6 +1826,9 @@ static inline bool bpf_map_is_dev_bound(struct bpf_map *map)

		struct bpf_map *bpf_map_offload_map_alloc(union bpf_attr *attr);
		void bpf_map_offload_map_free(struct bpf_map *map);
		+int bpf_prog_test_run_syscall(struct bpf_prog *prog,
		+			      const union bpf_attr *kattr,
		+			      union bpf_attr __user *uattr);
		#else
		static inline int bpf_prog_offload_init(struct bpf_prog *prog,
		union bpf_attr *attr)
		@@ -1851,6 +1854,13 @@ static inline struct bpf_map *bpf_map_offload_map_alloc(union bpf_attr *attr)
		static inline void bpf_map_offload_map_free(struct bpf_map *map)
		{
		}
		+
		+static inline int bpf_prog_test_run_syscall(struct bpf_prog *prog,
		+					    const union bpf_attr *kattr,
		+					    union bpf_attr __user *uattr)
		+{
		+	return -ENOTSUPP;
		+}
		#endif /* CONFIG_NET && CONFIG_BPF_SYSCALL */

		#if defined(CONFIG_INET) && defined(CONFIG_BPF_SYSCALL)
		@@ -1964,6 +1974,7 @@ extern const struct bpf_func_proto bpf_get_socket_ptr_cookie_proto;
		extern const struct bpf_func_proto bpf_task_storage_get_proto;
		extern const struct bpf_func_proto bpf_task_storage_delete_proto;
		extern const struct bpf_func_proto bpf_for_each_map_elem_proto;
		+extern const struct bpf_func_proto bpf_btf_find_by_name_kind_proto;

		const struct bpf_func_proto *bpf_tracing_func_proto(
		enum bpf_func_id func_id, const struct bpf_prog *prog);

include/linux/bpf_types.h

+2 −0

		@@ -77,6 +77,8 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_LSM, lsm,
			      void *, void *)
		#endif /* CONFIG_BPF_LSM */
		#endif
		+BPF_PROG_TYPE(BPF_PROG_TYPE_SYSCALL, bpf_syscall,
		+	      void *, void *)

		BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY, array_map_ops)
		BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_ARRAY, percpu_array_map_ops)

include/linux/bpf_verifier.h

+1 −0

		@@ -450,6 +450,7 @@ struct bpf_verifier_env {
			u32 peak_states;
			/* longest register parentage chain walked for liveness marking */
			u32 longest_mark_read_walk;
		+	bpfptr_t fd_array;
		};

		__printf(2, 0) void bpf_verifier_vlog(struct bpf_verifier_log *log,

include/linux/bpfptr.h

0 → 100644

+75 −0

		+/* SPDX-License-Identifier: GPL-2.0-only */
		+/* A pointer that can point to either kernel or userspace memory. */
		+#ifndef _LINUX_BPFPTR_H
		+#define _LINUX_BPFPTR_H
		+
		+#include <linux/sockptr.h>
		+
		+typedef sockptr_t bpfptr_t;
		+
		+static inline bool bpfptr_is_kernel(bpfptr_t bpfptr)
		+{
		+	return bpfptr.is_kernel;
		+}
		+
		+static inline bpfptr_t KERNEL_BPFPTR(void *p)
		+{
		+	return (bpfptr_t) { .kernel = p, .is_kernel = true };
		+}
		+
		+static inline bpfptr_t USER_BPFPTR(void __user *p)
		+{
		+	return (bpfptr_t) { .user = p };
		+}
		+
		+static inline bpfptr_t make_bpfptr(u64 addr, bool is_kernel)
		+{
		+	if (is_kernel)
		+		return KERNEL_BPFPTR((void*) (uintptr_t) addr);
		+	else
		+		return USER_BPFPTR(u64_to_user_ptr(addr));
		+}
		+
		+static inline bool bpfptr_is_null(bpfptr_t bpfptr)
		+{
		+	if (bpfptr_is_kernel(bpfptr))
		+		return !bpfptr.kernel;
		+	return !bpfptr.user;
		+}
		+
		+static inline void bpfptr_add(bpfptr_t *bpfptr, size_t val)
		+{
		+	if (bpfptr_is_kernel(*bpfptr))
		+		bpfptr->kernel += val;
		+	else
		+		bpfptr->user += val;
		+}
		+
		+static inline int copy_from_bpfptr_offset(void *dst, bpfptr_t src,
		+					  size_t offset, size_t size)
		+{
		+	return copy_from_sockptr_offset(dst, (sockptr_t) src, offset, size);
		+}
		+
		+static inline int copy_from_bpfptr(void *dst, bpfptr_t src, size_t size)
		+{
		+	return copy_from_bpfptr_offset(dst, src, 0, size);
		+}
		+
		+static inline int copy_to_bpfptr_offset(bpfptr_t dst, size_t offset,
		+					const void *src, size_t size)
		+{
		+	return copy_to_sockptr_offset((sockptr_t) dst, offset, src, size);
		+}
		+
		+static inline void *memdup_bpfptr(bpfptr_t src, size_t len)
		+{
		+	return memdup_sockptr((sockptr_t) src, len);
		+}
		+
		+static inline long strncpy_from_bpfptr(char *dst, bpfptr_t src, size_t count)
		+{
		+	return strncpy_from_sockptr(dst, (sockptr_t) src, count);
		+}
		+
		+#endif /* _LINUX_BPFPTR_H */
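
The idea behind bpfptr_t, one handle that records whether it refers to kernel
or user memory so callers can share a single copy path, can be sketched in
plain userspace C. This is only an analogy under stated assumptions: the
toy_* names are hypothetical, both branches use memcpy() because there is no
user/kernel split in a normal process, and the real kernel helpers go through
copy_from_user() for the user case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for bpfptr_t: a pointer plus an origin tag. */
typedef struct {
	union {
		void *kernel;
		void *user;	/* would be `void __user *` in the kernel */
	};
	bool is_kernel;
} toy_bpfptr_t;

static inline toy_bpfptr_t toy_kernel_ptr(void *p)
{
	return (toy_bpfptr_t) { .kernel = p, .is_kernel = true };
}

static inline toy_bpfptr_t toy_user_ptr(void *p)
{
	return (toy_bpfptr_t) { .user = p };
}

/* Callers use one function; the tag selects the copy primitive. */
static inline int toy_copy_from_offset(void *dst, toy_bpfptr_t src,
				       size_t offset, size_t size)
{
	if (src.is_kernel) {
		memcpy(dst, (char *)src.kernel + offset, size);
		return 0;
	}
	/* Assumption: in the kernel this branch could fault and fail. */
	memcpy(dst, (char *)src.user + offset, size);
	return 0;
}
```

A function that takes toy_bpfptr_t does not need separate kernel and user
variants, which mirrors why the prototypes in this diff switch from
`union bpf_attr __user *uattr` to `bpfptr_t uattr`.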

include/linux/btf.h

+1 −1

		@@ -21,7 +21,7 @@ extern const struct file_operations btf_fops;

		void btf_get(struct btf *btf);
		void btf_put(struct btf *btf);
		-int btf_new_fd(const union bpf_attr *attr);
		+int btf_new_fd(const union bpf_attr *attr, bpfptr_t uattr);
		struct btf *btf_get_by_fd(int fd);
		int btf_get_info_by_fd(const struct btf *btf,
		const union bpf_attr *attr,