Commit 890f2420 authored by Linus Torvalds's avatar Linus Torvalds
Browse files
Pull RCU updates from Paul McKenney:

 - Documentation updates.

   This is the first in a series from an ongoing review of the RCU
   documentation. "Why are people thinking -that- about RCU? Oh. Because
   that is an entirely reasonable interpretation of its documentation."

 - Miscellaneous fixes.

 - Improved memory allocation and heuristics.

 - Improve rcu_nocbs diagnostic output.

 - Add full-sized polled RCU grace period state values.

   These are the same size as an rcu_head structure, which is double
   that of the traditional unsigned long state values that may still be
   obtained from et_state_synchronize_rcu(). The added size avoids
   missing overlapping grace periods. This benefit is that call_rcu()
   can be replaced by polling, which can be attractive in situations
   where RCU-protected data is aged out of memory.

   Early in the series, the size of this state value is three unsigned
   longs. Later in the series, the fastpaths in synchronize_rcu() and
   synchronize_rcu_expedited() are reworked to permit the full state to
   be represented by only two unsigned longs. This reworking slows these
   two functions down in SMP kernels running either on single-CPU
   systems or on systems with all but one CPU offlined, but this should
   not be a significant problem. And if it somehow becomes a problem in
   some yet-as-unforeseen situations, three-value state values can be
   provided for only those situations.

   Finally, a pair of functions named same_state_synchronize_rcu() and
   same_state_synchronize_rcu_full() allow grace-period state values to
   be compared for equality. This permits users to maintain lists of
   data structures having the same state value, removing the need for
   per-data-structure grace-period state values, thus decreasing memory
   footprint.

 - Polled SRCU grace-period updates, including adding tests to
   rcutorture and reducing the incidence of Tiny SRCU grace-period-state
   counter wrap.

 - Improve Tasks RCU diagnostics and quiescent-state detection.

* tag 'rcu.2022.09.30a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (55 commits)
  rcutorture: Use the barrier operation specified by cur_ops
  rcu-tasks: Make RCU Tasks Trace check for userspace execution
  rcu-tasks: Ensure RCU Tasks Trace loops have quiescent states
  rcu-tasks: Convert RCU_LOCKDEP_WARN() to WARN_ONCE()
  srcu: Make Tiny SRCU use full-sized grace-period counters
  srcu: Make Tiny SRCU poll_state_synchronize_srcu() more precise
  srcu: Add GP and maximum requested GP to Tiny SRCU rcutorture output
  rcutorture: Make "srcud" option also test polled grace-period API
  rcutorture: Limit read-side polling-API testing
  rcu: Add functions to compare grace-period state values
  rcutorture: Expand rcu_torture_write_types() first "if" statement
  rcutorture: Use 1-suffixed variable in rcu_torture_write_types() check
  rcu: Make synchronize_rcu() fastpath update only boot-CPU counters
  rcutorture: Adjust rcu_poll_need_2gp() for rcu_gp_oldstate field removal
  rcu: Remove ->rgos_polled field from rcu_gp_oldstate structure
  rcu: Make synchronize_rcu_expedited() fast path update .expedited_sequence
  rcu: Remove expedited grace-period fast-path forward-progress helper
  rcu: Make synchronize_rcu() fast path update ->gp_seq counters
  rcu-tasks: Remove grace-period fast-path rcu-tasks helper
  rcu: Set rcu_data structures' initial ->gpwrap value to true
  ...
parents b8fb65e1 5c0ec490
Loading
Loading
Loading
Loading
+12 −3
Original line number Diff line number Diff line
@@ -66,8 +66,13 @@ over a rather long period of time, but improvements are always welcome!
	As a rough rule of thumb, any dereference of an RCU-protected
	pointer must be covered by rcu_read_lock(), rcu_read_lock_bh(),
	rcu_read_lock_sched(), or by the appropriate update-side lock.
	Disabling of preemption can serve as rcu_read_lock_sched(), but
	is less readable and prevents lockdep from detecting locking issues.
	Explicit disabling of preemption (preempt_disable(), for example)
	can serve as rcu_read_lock_sched(), but is less readable and
	prevents lockdep from detecting locking issues.

	Please not that you *cannot* rely on code known to be built
	only in non-preemptible kernels.  Such code can and will break,
	especially in kernels built with CONFIG_PREEMPT_COUNT=y.

	Letting RCU-protected pointers "leak" out of an RCU read-side
	critical section is every bit as bad as letting them leak out
@@ -185,6 +190,9 @@ over a rather long period of time, but improvements are always welcome!

5.	If call_rcu() or call_srcu() is used, the callback function will
	be called from softirq context.  In particular, it cannot block.
	If you need the callback to block, run that code in a workqueue
	handler scheduled from the callback.  The queue_rcu_work()
	function does this for you in the case of call_rcu().

6.	Since synchronize_rcu() can block, it cannot be called
	from any sort of irq context.  The same rule applies
@@ -297,7 +305,8 @@ over a rather long period of time, but improvements are always welcome!
		the machine.

	d.	Periodically invoke synchronize_rcu(), permitting a limited
		number of updates per grace period.
		number of updates per grace period.  Better yet, periodically
		invoke rcu_barrier() to wait for all outstanding callbacks.

	The same cautions apply to call_srcu() and kfree_rcu().

+10 −4
Original line number Diff line number Diff line
@@ -128,10 +128,16 @@ Follow these rules to keep your RCU code working properly:
		This sort of comparison occurs frequently when scanning
		RCU-protected circular linked lists.

		Note that if checks for being within an RCU read-side
		critical section are not required and the pointer is never
		dereferenced, rcu_access_pointer() should be used in place
		of rcu_dereference().
		Note that if the pointer comparison is done outside
		of an RCU read-side critical section, and the pointer
		is never dereferenced, rcu_access_pointer() should be
		used in place of rcu_dereference().  In most cases,
		it is best to avoid accidental dereferences by testing
		the rcu_access_pointer() return value directly, without
		assigning it to a variable.

		Within an RCU read-side critical section, there is little
		reason to use rcu_access_pointer().

	-	The comparison is against a pointer that references memory
		that was initialized "a long time ago."  The reason
+30 −17
Original line number Diff line number Diff line
@@ -6,13 +6,15 @@ What is RCU? -- "Read, Copy, Update"
Please note that the "What is RCU?" LWN series is an excellent place
to start learning about RCU:

| 1.	What is RCU, Fundamentally?  http://lwn.net/Articles/262464/
| 2.	What is RCU? Part 2: Usage   http://lwn.net/Articles/263130/
| 3.	RCU part 3: the RCU API      http://lwn.net/Articles/264090/
| 4.	The RCU API, 2010 Edition    http://lwn.net/Articles/418853/
| 	2010 Big API Table           http://lwn.net/Articles/419086/
| 5.	The RCU API, 2014 Edition    http://lwn.net/Articles/609904/
|	2014 Big API Table           http://lwn.net/Articles/609973/
| 1.	What is RCU, Fundamentally?  https://lwn.net/Articles/262464/
| 2.	What is RCU? Part 2: Usage   https://lwn.net/Articles/263130/
| 3.	RCU part 3: the RCU API      https://lwn.net/Articles/264090/
| 4.	The RCU API, 2010 Edition    https://lwn.net/Articles/418853/
| 	2010 Big API Table           https://lwn.net/Articles/419086/
| 5.	The RCU API, 2014 Edition    https://lwn.net/Articles/609904/
|	2014 Big API Table           https://lwn.net/Articles/609973/
| 6.	The RCU API, 2019 Edition    https://lwn.net/Articles/777036/
|	2019 Big API Table           https://lwn.net/Articles/777165/


What is RCU?
@@ -915,13 +917,18 @@ which an RCU reference is held include:
The understanding that RCU provides a reference that only prevents a
change of type is particularly visible with objects allocated from a
slab cache marked ``SLAB_TYPESAFE_BY_RCU``.  RCU operations may yield a
reference to an object from such a cache that has been concurrently
freed and the memory reallocated to a completely different object,
though of the same type.  In this case RCU doesn't even protect the
identity of the object from changing, only its type.  So the object
found may not be the one expected, but it will be one where it is safe
to take a reference or spinlock and then confirm that the identity
matches the expectations.
reference to an object from such a cache that has been concurrently freed
and the memory reallocated to a completely different object, though of
the same type.  In this case RCU doesn't even protect the identity of the
object from changing, only its type.  So the object found may not be the
one expected, but it will be one where it is safe to take a reference
(and then potentially acquiring a spinlock), allowing subsequent code
to check whether the identity matches expectations.  It is tempting
to simply acquire the spinlock without first taking the reference, but
unfortunately any spinlock in a ``SLAB_TYPESAFE_BY_RCU`` object must be
initialized after each and every call to kmem_cache_alloc(), which renders
reference-free spinlock acquisition completely unsafe.  Therefore, when
using ``SLAB_TYPESAFE_BY_RCU``, make proper use of a reference counter.

With traditional reference counting -- such as that implemented by the
kref library in Linux -- there is typically code that runs when the last
@@ -1057,14 +1064,20 @@ SRCU: Initialization/cleanup::
	init_srcu_struct
	cleanup_srcu_struct

All: lockdep-checked RCU-protected pointer access::
All: lockdep-checked RCU utility APIs::

	rcu_access_pointer
	rcu_dereference_raw
	RCU_LOCKDEP_WARN
	rcu_sleep_check
	RCU_NONIDLE

All: Unchecked RCU-protected pointer access::

	rcu_dereference_raw

All: Unchecked RCU-protected pointer access with dereferencing prohibited::

	rcu_access_pointer

See the comment headers in the source code (or the docbook generated
from them) for more information.

+37 −5
Original line number Diff line number Diff line
@@ -42,7 +42,31 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func);
void rcu_barrier_tasks(void);
void rcu_barrier_tasks_rude(void);
void synchronize_rcu(void);

struct rcu_gp_oldstate;
unsigned long get_completed_synchronize_rcu(void);
void get_completed_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp);

// Maximum number of unsigned long values corresponding to
// not-yet-completed RCU grace periods.
#define NUM_ACTIVE_RCU_POLL_OLDSTATE 2

/**
 * same_state_synchronize_rcu - Are two old-state values identical?
 * @oldstate1: First old-state value.
 * @oldstate2: Second old-state value.
 *
 * The two old-state values must have been obtained from either
 * get_state_synchronize_rcu(), start_poll_synchronize_rcu(), or
 * get_completed_synchronize_rcu().  Returns @true if the two values are
 * identical and @false otherwise.  This allows structures whose lifetimes
 * are tracked by old-state values to push these values to a list header,
 * allowing those structures to be slightly smaller.
 */
static inline bool same_state_synchronize_rcu(unsigned long oldstate1, unsigned long oldstate2)
{
	return oldstate1 == oldstate2;
}

#ifdef CONFIG_PREEMPT_RCU

@@ -496,13 +520,21 @@ do { \
 * against NULL.  Although rcu_access_pointer() may also be used in cases
 * where update-side locks prevent the value of the pointer from changing,
 * you should instead use rcu_dereference_protected() for this use case.
 * Within an RCU read-side critical section, there is little reason to
 * use rcu_access_pointer().
 *
 * It is usually best to test the rcu_access_pointer() return value
 * directly in order to avoid accidental dereferences being introduced
 * by later inattentive changes.  In other words, assigning the
 * rcu_access_pointer() return value to a local variable results in an
 * accident waiting to happen.
 *
 * It is also permissible to use rcu_access_pointer() when read-side
 * access to the pointer was removed at least one grace period ago, as
 * is the case in the context of the RCU callback that is freeing up
 * the data, or after a synchronize_rcu() returns.  This can be useful
 * when tearing down multi-linked structures after a grace period
 * has elapsed.
 * access to the pointer was removed at least one grace period ago, as is
 * the case in the context of the RCU callback that is freeing up the data,
 * or after a synchronize_rcu() returns.  This can be useful when tearing
 * down multi-linked structures after a grace period has elapsed.  However,
 * rcu_dereference_protected() is normally preferred for this use case.
 */
#define rcu_access_pointer(p) __rcu_access_pointer((p), __UNIQUE_ID(rcu), __rcu)

+50 −0
Original line number Diff line number Diff line
@@ -14,25 +14,75 @@

#include <asm/param.h> /* for HZ */

struct rcu_gp_oldstate {
	unsigned long rgos_norm;
};

// Maximum number of rcu_gp_oldstate values corresponding to
// not-yet-completed RCU grace periods.
#define NUM_ACTIVE_RCU_POLL_FULL_OLDSTATE 2

/*
 * Are the two oldstate values the same?  See the Tree RCU version for
 * docbook header.
 */
static inline bool same_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp1,
						   struct rcu_gp_oldstate *rgosp2)
{
	return rgosp1->rgos_norm == rgosp2->rgos_norm;
}

unsigned long get_state_synchronize_rcu(void);

static inline void get_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp)
{
	rgosp->rgos_norm = get_state_synchronize_rcu();
}

unsigned long start_poll_synchronize_rcu(void);

static inline void start_poll_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp)
{
	rgosp->rgos_norm = start_poll_synchronize_rcu();
}

bool poll_state_synchronize_rcu(unsigned long oldstate);

static inline bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp)
{
	return poll_state_synchronize_rcu(rgosp->rgos_norm);
}

static inline void cond_synchronize_rcu(unsigned long oldstate)
{
	might_sleep();
}

static inline void cond_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp)
{
	cond_synchronize_rcu(rgosp->rgos_norm);
}

static inline unsigned long start_poll_synchronize_rcu_expedited(void)
{
	return start_poll_synchronize_rcu();
}

static inline void start_poll_synchronize_rcu_expedited_full(struct rcu_gp_oldstate *rgosp)
{
	rgosp->rgos_norm = start_poll_synchronize_rcu_expedited();
}

static inline void cond_synchronize_rcu_expedited(unsigned long oldstate)
{
	cond_synchronize_rcu(oldstate);
}

static inline void cond_synchronize_rcu_expedited_full(struct rcu_gp_oldstate *rgosp)
{
	cond_synchronize_rcu_expedited(rgosp->rgos_norm);
}

extern void rcu_barrier(void);

static inline void synchronize_rcu_expedited(void)
Loading