Commit a033907e authored by Alexei Starovoitov

Merge branch 'Enable RCU semantics for task kptrs'

David Vernet says:

====================

In commit 22df776a ("tasks: Extract rcu_users out of union"), the
'refcount_t rcu_users' field was extracted out of a union with the
'struct rcu_head rcu' field. This allows us to use the field for
refcounting struct task_struct with RCU protection, as the RCU callback
no longer flips rcu_users to be nonzero after the callback is scheduled.

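For reference, commit 22df776a amounts to roughly the following change to
struct task_struct (a simplified, diff-style sketch; surrounding fields and
alignment are elided):

	struct task_struct {
		/* ... */
-		union {
-			refcount_t rcu_users;
-			struct rcu_head rcu;
-		};
+		refcount_t rcu_users;
+		struct rcu_head rcu;
		/* ... */
	};
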
This patch set leverages this to do a few things:

1. Marks struct task_struct as RCU safe in the verifier, allowing
   referenced kptr tasks stored in maps to be accessed in an RCU
   read region without acquiring a reference (with just a NULL check).
2. Makes bpf_task_acquire() a KF_ACQUIRE | KF_RCU | KF_RET_NULL kfunc.
3. Removes bpf_task_kptr_get() and bpf_task_acquire_not_zero(), as
   they're now redundant with the above two changes.
4. Updates selftests and documentation accordingly.
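
Taken together, this means a BPF program can now do something like the
following. This is only an illustrative sketch (not taken from this series);
assume 'v' points to a map value that contains a
'struct task_struct __kptr *task' field:

	struct task_struct *t;

	bpf_rcu_read_lock();
	t = v->task;
	if (t) {
		/* Point 1: within the RCU read region, 't' can be read
		 * after a NULL check without acquiring a reference.
		 */
		bpf_printk("stored task has pid %d", t->pid);

		/* Point 2: if the task must be used beyond the RCU read
		 * region, acquire a reference; bpf_task_acquire() may now
		 * return NULL, so the result must be checked.
		 */
		t = bpf_task_acquire(t);
		if (t)
			bpf_task_release(t);
	}
	bpf_rcu_read_unlock();
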
---
Changelog:
v1: https://lore.kernel.org/all/20230331005733.406202-1-void@manifault.com/


v1 -> v2:
- Remove testcases validating nested trust inheritance. The first
  version used 'struct task_struct __rcu *parent', but because that
  field has the __rcu tag it functions differently on gcc and llvm and
  causes gcc selftests to fail. Alexei is reworking nested trust
  anyways, so let's leave it off for now (Alexei).
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
parents 85850058 db9d479a
Documentation/bpf/kfuncs.rst  +43 −6
@@ -471,7 +471,7 @@ struct_ops callback arg. For example:
		struct task_struct *acquired;

		acquired = bpf_task_acquire(task);
-
+		if (acquired)
			/*
			 * In a typical program you'd do something like store
			 * the task in a map, and the map will automatically
@@ -481,6 +481,43 @@ struct_ops callback arg. For example:
		return 0;
	}


References acquired on ``struct task_struct *`` objects are RCU protected.
Therefore, when in an RCU read region, you can obtain a pointer to a task
embedded in a map value without having to acquire a reference:

.. code-block:: c

	#define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))
	private(TASK) static struct task_struct *global;

	/**
	 * A trivial example showing how to access a task stored
	 * in a map using RCU.
	 */
	SEC("tp_btf/task_newtask")
	int BPF_PROG(task_rcu_read_example, struct task_struct *task, u64 clone_flags)
	{
		struct task_struct *local_copy;

		bpf_rcu_read_lock();
		local_copy = global;
		if (local_copy)
			/*
			 * We could also pass local_copy to kfuncs or helper functions here,
			 * as we're guaranteed that local_copy will be valid until we exit
			 * the RCU read region below.
			 */
			bpf_printk("Global task %s is valid", local_copy->comm);
		else
			bpf_printk("No global task found");
		bpf_rcu_read_unlock();

		/* At this point we can no longer reference local_copy. */

		return 0;
	}

----

A BPF program can also look up a task from a pid. This can be useful if the
kernel/bpf/helpers.c  +6 −72
@@ -18,6 +18,7 @@
#include <linux/pid_namespace.h>
#include <linux/poison.h>
#include <linux/proc_ns.h>
+#include <linux/sched/task.h>
#include <linux/security.h>
#include <linux/btf_ids.h>
#include <linux/bpf_mem_alloc.h>
@@ -2013,73 +2014,8 @@ __bpf_kfunc struct bpf_rb_node *bpf_rbtree_first(struct bpf_rb_root *root)
 */
__bpf_kfunc struct task_struct *bpf_task_acquire(struct task_struct *p)
{
-	return get_task_struct(p);
+	if (refcount_inc_not_zero(&p->rcu_users))
+		return p;
+	return NULL;
}

-/**
- * bpf_task_acquire_not_zero - Acquire a reference to a rcu task object. A task
- * acquired by this kfunc which is not stored in a map as a kptr, must be
- * released by calling bpf_task_release().
- * @p: The task on which a reference is being acquired.
- */
-__bpf_kfunc struct task_struct *bpf_task_acquire_not_zero(struct task_struct *p)
-{
-	/* For the time being this function returns NULL, as it's not currently
-	 * possible to safely acquire a reference to a task with RCU protection
-	 * using get_task_struct() and put_task_struct(). This is due to the
-	 * slightly odd mechanics of p->rcu_users, and how task RCU protection
-	 * works.
-	 *
-	 * A struct task_struct is refcounted by two different refcount_t
-	 * fields:
-	 *
-	 * 1. p->usage:     The "true" refcount field which tracks a task's
-	 *		    lifetime. The task is freed as soon as this
-	 *		    refcount drops to 0.
-	 *
-	 * 2. p->rcu_users: An "RCU users" refcount field which is statically
-	 *		    initialized to 2, and is co-located in a union with
-	 *		    a struct rcu_head field (p->rcu). p->rcu_users
-	 *		    essentially encapsulates a single p->usage
-	 *		    refcount, and when p->rcu_users goes to 0, an RCU
-	 *		    callback is scheduled on the struct rcu_head which
-	 *		    decrements the p->usage refcount.
-	 *
-	 * There are two important implications to this task refcounting logic
-	 * described above. The first is that
-	 * refcount_inc_not_zero(&p->rcu_users) cannot be used anywhere, as
-	 * after the refcount goes to 0, the RCU callback being scheduled will
-	 * cause the memory backing the refcount to again be nonzero due to the
-	 * fields sharing a union. The other is that we can't rely on RCU to
-	 * guarantee that a task is valid in a BPF program. This is because a
-	 * task could have already transitioned to being in the TASK_DEAD
-	 * state, had its rcu_users refcount go to 0, and its rcu callback
-	 * invoked in which it drops its single p->usage reference. At this
-	 * point the task will be freed as soon as the last p->usage reference
-	 * goes to 0, without waiting for another RCU gp to elapse. The only
-	 * way that a BPF program can guarantee that a task is valid in this
-	 * scenario is to hold a p->usage refcount itself.
-	 *
-	 * Until we're able to resolve this issue, either by pulling
-	 * p->rcu_users and p->rcu out of the union, or by getting rid of
-	 * p->usage and just using p->rcu_users for refcounting, we'll just
-	 * return NULL here.
-	 */
-	return NULL;
-}
-
-/**
- * bpf_task_kptr_get - Acquire a reference on a struct task_struct kptr. A task
- * kptr acquired by this kfunc which is not subsequently stored in a map, must
- * be released by calling bpf_task_release().
- * @pp: A pointer to a task kptr on which a reference is being acquired.
- */
-__bpf_kfunc struct task_struct *bpf_task_kptr_get(struct task_struct **pp)
-{
-	/* We must return NULL here until we have clarity on how to properly
-	 * leverage RCU for ensuring a task's lifetime. See the comment above
-	 * in bpf_task_acquire_not_zero() for more details.
-	 */
-	return NULL;
-}

@@ -2089,7 +2025,7 @@ __bpf_kfunc struct task_struct *bpf_task_kptr_get(struct task_struct **pp)
 */
__bpf_kfunc void bpf_task_release(struct task_struct *p)
{
-	put_task_struct(p);
+	put_task_struct_rcu_user(p);
}

#ifdef CONFIG_CGROUPS
@@ -2199,7 +2135,7 @@ __bpf_kfunc struct task_struct *bpf_task_from_pid(s32 pid)
	rcu_read_lock();
	p = find_task_by_pid_ns(pid, &init_pid_ns);
	if (p)
-		bpf_task_acquire(p);
+		p = bpf_task_acquire(p);
	rcu_read_unlock();

	return p;
@@ -2371,9 +2307,7 @@ BTF_ID_FLAGS(func, bpf_list_push_front)
BTF_ID_FLAGS(func, bpf_list_push_back)
BTF_ID_FLAGS(func, bpf_list_pop_front, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_list_pop_back, KF_ACQUIRE | KF_RET_NULL)
-BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
-BTF_ID_FLAGS(func, bpf_task_acquire_not_zero, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
-BTF_ID_FLAGS(func, bpf_task_kptr_get, KF_ACQUIRE | KF_KPTR_GET | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_task_release, KF_RELEASE)
BTF_ID_FLAGS(func, bpf_rbtree_remove, KF_ACQUIRE)
BTF_ID_FLAGS(func, bpf_rbtree_add)
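
For context, bpf_task_release() now drops its reference through
put_task_struct_rcu_user(). A simplified sketch of that helper (it lives in
kernel/exit.c; abridged here) shows why holding an rcu_users reference, or
sitting inside an RCU read section, is enough to keep the task from being
freed:

	void put_task_struct_rcu_user(struct task_struct *task)
	{
		/* Dropping the last rcu_users reference does not free the
		 * task immediately. Instead, an RCU callback is scheduled
		 * which drops the p->usage reference encapsulated by
		 * rcu_users, so the final free can only happen after a
		 * grace period has elapsed.
		 */
		if (refcount_dec_and_test(&task->rcu_users))
			call_rcu(&task->rcu, delayed_put_task_struct);
	}
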
kernel/bpf/verifier.c  +1 −0
@@ -4600,6 +4600,7 @@ BTF_SET_START(rcu_protected_types)
BTF_ID(struct, prog_test_ref_kfunc)
BTF_ID(struct, cgroup)
BTF_ID(struct, bpf_cpumask)
+BTF_ID(struct, task_struct)
BTF_SET_END(rcu_protected_types)
static bool rcu_protected_object(const struct btf *btf, u32 btf_id)
tools/testing/selftests/bpf/prog_tests/task_kfunc.c  +2 −1
@@ -73,11 +73,12 @@ static const char * const success_tests[] = {
	"test_task_acquire_release_current",
	"test_task_acquire_leave_in_map",
	"test_task_xchg_release",
	"test_task_get_release",
	"test_task_map_acquire_release",
	"test_task_current_acquire_release",
	"test_task_from_pid_arg",
	"test_task_from_pid_current",
	"test_task_from_pid_invalid",
	"task_kfunc_acquire_trusted_walked",
};

void test_task_kfunc(void)
tools/testing/selftests/bpf/progs/rcu_read_lock.c  +2 −7
@@ -23,7 +23,7 @@ struct bpf_key *bpf_lookup_user_key(__u32 serial, __u64 flags) __ksym;
void bpf_key_put(struct bpf_key *key) __ksym;
void bpf_rcu_read_lock(void) __ksym;
void bpf_rcu_read_unlock(void) __ksym;
-struct task_struct *bpf_task_acquire_not_zero(struct task_struct *p) __ksym;
+struct task_struct *bpf_task_acquire(struct task_struct *p) __ksym;
void bpf_task_release(struct task_struct *p) __ksym;

SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
@@ -159,13 +159,8 @@ int task_acquire(void *ctx)
		goto out;

	/* acquire a reference which can be used outside rcu read lock region */
-	gparent = bpf_task_acquire_not_zero(gparent);
+	gparent = bpf_task_acquire(gparent);
	if (!gparent)
-		/* Until we resolve the issues with using task->rcu_users, we
-		 * expect bpf_task_acquire_not_zero() to return a NULL task.
-		 * See the comment at the definition of
-		 * bpf_task_acquire_not_zero() for more details.
-		 */
		goto out;

	(void)bpf_task_storage_get(&map_a, gparent, 0, 0);