kernel/sched/core.c +79 −2

@@ -2076,7 +2076,75 @@ void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
 }
 
 /*
- * This function is wildly self concurrent, consider at least 3 times.
+ * This function is wildly self concurrent; here be dragons.
+ *
+ *
+ * When given a valid mask, __set_cpus_allowed_ptr() must block until the
+ * designated task is enqueued on an allowed CPU. If that task is currently
+ * running, we have to kick it out using the CPU stopper.
+ *
+ * Migrate-Disable comes along and tramples all over our nice sandcastle.
+ * Consider:
+ *
+ *     Initial conditions: P0->cpus_mask = [0, 1]
+ *
+ *     P0@CPU0                  P1
+ *
+ *     migrate_disable();
+ *     <preempted>
+ *                              set_cpus_allowed_ptr(P0, [1]);
+ *
+ * P1 *cannot* return from this set_cpus_allowed_ptr() call until P0 executes
+ * its outermost migrate_enable() (i.e. it exits its Migrate-Disable region).
+ * This means we need the following scheme:
+ *
+ *     P0@CPU0                  P1
+ *
+ *     migrate_disable();
+ *     <preempted>
+ *                              set_cpus_allowed_ptr(P0, [1]);
+ *                                <blocks>
+ *     <resumes>
+ *     migrate_enable();
+ *       __set_cpus_allowed_ptr();
+ *         <wakes local stopper>
+ *                           `--> <woken on migration completion>
+ *
+ * Now the fun stuff: there may be several P1-like tasks, i.e. multiple
+ * concurrent set_cpus_allowed_ptr(P0, [*]) calls. CPU affinity changes of any
+ * task p are serialized by p->pi_lock, which we can leverage: the one that
+ * should come into effect at the end of the Migrate-Disable region is the last
+ * one. This means we only need to track a single cpumask (i.e. p->cpus_mask),
+ * but we still need to properly signal those waiting tasks at the appropriate
+ * moment.
+ *
+ * This is implemented using struct set_affinity_pending. The first
+ * __set_cpus_allowed_ptr() caller within a given Migrate-Disable region will
+ * setup an instance of that struct and install it on the targeted task_struct.
+ * Any and all further callers will reuse that instance. Those then wait for
+ * a completion signaled at the tail of the CPU stopper callback (1), triggered
+ * on the end of the Migrate-Disable region (i.e. outermost migrate_enable()).
+ *
+ *
+ * (1) In the cases covered above. There is one more where the completion is
+ * signaled within affine_move_task() itself: when a subsequent affinity request
+ * cancels the need for an active migration. Consider:
+ *
+ *     Initial conditions: P0->cpus_mask = [0, 1]
+ *
+ *     P0@CPU0            P1                             P2
+ *
+ *     migrate_disable();
+ *     <preempted>
+ *                        set_cpus_allowed_ptr(P0, [1]);
+ *                          <blocks>
+ *                                                       set_cpus_allowed_ptr(P0, [0, 1]);
+ *                                                         <signal completion>
+ *                          <awakes>
+ *
+ * Note that the above is safe vs a concurrent migrate_enable(), as any
+ * pending affinity completion is preceded by an uninstallation of
+ * p->migration_pending done with p->pi_lock held.
  */
 static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flags *rf,
 			    int dest_cpu, unsigned int flags)
@@ -2120,6 +2188,7 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flags *rf,
 	if (!(flags & SCA_MIGRATE_ENABLE)) {
 		/* serialized by p->pi_lock */
 		if (!p->migration_pending) {
+			/* Install the request */
 			refcount_set(&my_pending.refs, 1);
 			init_completion(&my_pending.done);
 			p->migration_pending = &my_pending;
@@ -2165,7 +2234,11 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flags *rf,
 	}
 
 	if (task_running(rq, p) || p->state == TASK_WAKING) {
-
+		/*
+		 * Lessen races (and headaches) by delegating
+		 * is_migration_disabled(p) checks to the stopper, which will
+		 * run on the same CPU as said p.
+		 */
 		task_rq_unlock(rq, p, rf);
 		stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
@@ -2190,6 +2263,10 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flags *rf,
 	if (refcount_dec_and_test(&pending->refs))
 		wake_up_var(&pending->refs);
 
+	/*
+	 * Block the original owner of &pending until all subsequent callers
+	 * have seen the completion and decremented the refcount
+	 */
 	wait_var_event(&my_pending.refs, !refcount_read(&my_pending.refs));
 
 	return 0;
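
The comment above leans on kernel primitives (refcount_t, struct completion, wait_var_event()) that can obscure the shape of the protocol. Below is a minimal userspace model of the set_affinity_pending lifecycle -- not the kernel code -- in which a pthread mutex stands in for p->pi_lock and a flag/condvar pair stands in for refcount_t and struct completion. All names here (struct pending, set_affinity(), stopper_complete()) are illustrative stand-ins, not kernel APIs. It shows the first caller installing the shared request, a later caller reusing it, the "stopper" completing it, and the owner blocking until every reuser has dropped its reference.

/*
 * Userspace model of the set_affinity_pending lifecycle -- NOT kernel code.
 * One mutex plays the role of p->pi_lock; int flags + a condvar play the
 * roles of refcount_t and struct completion. Names are illustrative only.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

struct pending {			/* models struct set_affinity_pending */
	int refs;			/* models refcount_t refs */
	int done;			/* models struct completion done */
	pthread_cond_t cv;
};

static pthread_mutex_t pi_lock = PTHREAD_MUTEX_INITIALIZER; /* models p->pi_lock */
static struct pending *installed;	/* models p->migration_pending */

/* A P1-like affinity caller: install on first use, otherwise reuse. */
static void *set_affinity(void *unused)
{
	struct pending my_pending = { .refs = 1 };
	struct pending *pending;

	(void)unused;
	pthread_cond_init(&my_pending.cv, NULL);

	pthread_mutex_lock(&pi_lock);
	if (!installed)
		installed = &my_pending;	/* first caller: install the request */
	else
		installed->refs++;		/* later callers: reuse that instance */
	pending = installed;

	while (!pending->done)			/* wait_for_completion(&pending->done) */
		pthread_cond_wait(&pending->cv, &pi_lock);

	if (--pending->refs == 0)		/* refcount_dec_and_test() + wake_up_var() */
		pthread_cond_broadcast(&pending->cv);

	/* The owner blocks until every reuser has dropped its reference. */
	if (pending == &my_pending)
		while (my_pending.refs)		/* wait_var_event(&my_pending.refs, ...) */
			pthread_cond_wait(&my_pending.cv, &pi_lock);
	pthread_mutex_unlock(&pi_lock);
	return NULL;
}

/* Models the stopper tail / outermost migrate_enable() signalling completion. */
static void stopper_complete(void)
{
	pthread_mutex_lock(&pi_lock);
	if (installed) {
		installed->done = 1;		/* complete_all(&pending->done) */
		pthread_cond_broadcast(&installed->cv);
		installed = NULL;		/* uninstall under "pi_lock" */
	}
	pthread_mutex_unlock(&pi_lock);
}

int main(void)
{
	pthread_t t[2];
	int i, ready;

	for (i = 0; i < 2; i++)
		pthread_create(&t[i], NULL, set_affinity, NULL);

	do {					/* demo only: wait for both callers to attach */
		usleep(1000);
		pthread_mutex_lock(&pi_lock);
		ready = installed && installed->refs == 2;
		pthread_mutex_unlock(&pi_lock);
	} while (!ready);

	stopper_complete();

	for (i = 0; i < 2; i++)
		pthread_join(t[i], NULL);
	puts("all waiters released");
	return 0;
}

Build with cc -pthread; the program prints "all waiters released" once the owner's final wait clears, i.e. once both the installing caller and the reusing caller have seen the completion, mirroring the wait_var_event() at the end of the last hunk above.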