Commit 4bfe186d authored by Ingo Molnar

Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu

Pull RCU updates from Paul E. McKenney:

  - Documentation updates.

  - Changes permitting use of call_rcu() and friends very early in
    boot, for example, before rcu_init() is invoked.

  - Miscellaneous fixes.

  - Add in-kernel API to enable and disable expediting of normal RCU
    grace periods.

  - Improve RCU's handling of (hotplug-) outgoing CPUs.

    Note: ARM support is lagging a bit here, and these improved
    diagnostics might generate (harmless) splats.

  - NO_HZ_FULL_SYSIDLE fixes.

  - Tiny RCU updates to make it more tiny.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
parents 3c435c1e 42528795
+23 −22
@@ -201,11 +201,11 @@ These routines add 1 and subtract 1, respectively, from the given
 atomic_t and return the new counter value after the operation is
 performed.

-Unlike the above routines, it is required that explicit memory
-barriers are performed before and after the operation.  It must be
-done such that all memory operations before and after the atomic
-operation calls are strongly ordered with respect to the atomic
-operation itself.
+Unlike the above routines, it is required that these primitives
+include explicit memory barriers that are performed before and after
+the operation.  It must be done such that all memory operations before
+and after the atomic operation calls are strongly ordered with respect
+to the atomic operation itself.

 For example, it should behave as if a smp_mb() call existed both
 before and after the atomic operation.
@@ -233,21 +233,21 @@ These two routines increment and decrement by 1, respectively, the
 given atomic counter.  They return a boolean indicating whether the
 resulting counter value was zero or not.

-It requires explicit memory barrier semantics around the operation as
-above.
+Again, these primitives provide explicit memory barrier semantics around
+the atomic operation.

 	int atomic_sub_and_test(int i, atomic_t *v);

 This is identical to atomic_dec_and_test() except that an explicit
-decrement is given instead of the implicit "1".  It requires explicit
-memory barrier semantics around the operation.
+decrement is given instead of the implicit "1".  This primitive must
+provide explicit memory barrier semantics around the operation.

 	int atomic_add_negative(int i, atomic_t *v);

-The given increment is added to the given atomic counter value.  A
-boolean is return which indicates whether the resulting counter value
-is negative.  It requires explicit memory barrier semantics around the
-operation.
+The given increment is added to the given atomic counter value.  A boolean
+is return which indicates whether the resulting counter value is negative.
+This primitive must provide explicit memory barrier semantics around
+the operation.

 Then:

@@ -257,7 +257,7 @@ This performs an atomic exchange operation on the atomic variable v, setting
 the given new value.  It returns the old value that the atomic variable v had
 just before the operation.

-atomic_xchg requires explicit memory barriers around the operation.
+atomic_xchg must provide explicit memory barriers around the operation.

 	int atomic_cmpxchg(atomic_t *v, int old, int new);

@@ -266,7 +266,7 @@ with the given old and new values. Like all atomic_xxx operations,
 atomic_cmpxchg will only satisfy its atomicity semantics as long as all
 other accesses of *v are performed through atomic_xxx operations.

-atomic_cmpxchg requires explicit memory barriers around the operation.
+atomic_cmpxchg must provide explicit memory barriers around the operation.

 The semantics for atomic_cmpxchg are the same as those defined for 'cas'
 below.
@@ -279,8 +279,8 @@ If the atomic value v is not equal to u, this function adds a to v, and
 returns non zero. If v is equal to u then it returns zero. This is done as
 an atomic operation.

-atomic_add_unless requires explicit memory barriers around the operation
-unless it fails (returns 0).
+atomic_add_unless must provide explicit memory barriers around the
+operation unless it fails (returns 0).

 atomic_inc_not_zero, equivalent to atomic_add_unless(v, 1, 0)

@@ -460,9 +460,9 @@ the return value into an int. There are other places where things
 like this occur as well.

 These routines, like the atomic_t counter operations returning values,
-require explicit memory barrier semantics around their execution.  All
-memory operations before the atomic bit operation call must be made
-visible globally before the atomic bit operation is made visible.
+must provide explicit memory barrier semantics around their execution.
+All memory operations before the atomic bit operation call must be
+made visible globally before the atomic bit operation is made visible.
 Likewise, the atomic bit operation must be visible globally before any
 subsequent memory operation is made visible.  For example:

@@ -536,8 +536,9 @@ except that two underscores are prefixed to the interface name.
 These non-atomic variants also do not require any special memory
 barrier semantics.

-The routines xchg() and cmpxchg() need the same exact memory barriers
-as the atomic and bit operations returning values.
+The routines xchg() and cmpxchg() must provide the same exact
+memory-barrier semantics as the atomic and bit operations returning
+values.

 Spinlocks and rwlocks have memory barrier expectations as well.
 The rule to follow is simple:
+15 −5
@@ -2968,6 +2968,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			Set maximum number of finished RCU callbacks to
 			process in one batch.

+	rcutree.gp_init_delay=	[KNL]
+			Set the number of jiffies to delay each step of
+			RCU grace-period initialization.  This only has
+			effect when CONFIG_RCU_TORTURE_TEST_SLOW_INIT is
+			set.
+
 	rcutree.rcu_fanout_leaf= [KNL]
 			Increase the number of CPUs assigned to each
 			leaf rcu_node structure.  Useful for very large
@@ -2991,11 +2997,15 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			value is one, and maximum value is HZ.

 	rcutree.kthread_prio= 	 [KNL,BOOT]
-			Set the SCHED_FIFO priority of the RCU
-			per-CPU kthreads (rcuc/N). This value is also
-			used for the priority of the RCU boost threads
-			(rcub/N). Valid values are 1-99 and the default
-			is 1 (the least-favored priority).
+			Set the SCHED_FIFO priority of the RCU per-CPU
+			kthreads (rcuc/N). This value is also used for
+			the priority of the RCU boost threads (rcub/N)
+			and for the RCU grace-period kthreads (rcu_bh,
+			rcu_preempt, and rcu_sched). If RCU_BOOST is
+			set, valid values are 1-99 and the default is 1
+			(the least-favored priority).  Otherwise, when
+			RCU_BOOST is not set, valid values are 0-99 and
+			the default is zero (non-realtime operation).

 	rcutree.rcu_nocb_leader_stride= [KNL]
 			Set the number of NOCB kthread groups, which
+21 −13
@@ -190,20 +190,24 @@ To reduce its OS jitter, do any of the following:
 		on each CPU, including cs_dbs_timer() and od_dbs_timer().
 		WARNING:  Please check your CPU specifications to
 		make sure that this is safe on your particular system.
-	d.	It is not possible to entirely get rid of OS jitter
-		from vmstat_update() on CONFIG_SMP=y systems, but you
-		can decrease its frequency by writing a large value
-		to /proc/sys/vm/stat_interval.	The default value is
-		HZ, for an interval of one second.  Of course, larger
-		values will make your virtual-memory statistics update
-		more slowly.  Of course, you can also run your workload
-		at a real-time priority, thus preempting vmstat_update(),
+	d.	As of v3.18, Christoph Lameter's on-demand vmstat workers
+		commit prevents OS jitter due to vmstat_update() on
+		CONFIG_SMP=y systems.  Before v3.18, is not possible
+		to entirely get rid of the OS jitter, but you can
+		decrease its frequency by writing a large value to
+		/proc/sys/vm/stat_interval.  The default value is HZ,
+		for an interval of one second.	Of course, larger values
+		will make your virtual-memory statistics update more
+		slowly.  Of course, you can also run your workload at
+		a real-time priority, thus preempting vmstat_update(),
 		but if your workload is CPU-bound, this is a bad idea.
 		However, there is an RFC patch from Christoph Lameter
 		(based on an earlier one from Gilad Ben-Yossef) that
 		reduces or even eliminates vmstat overhead for some
 		workloads at https://lkml.org/lkml/2013/9/4/379.
-	e.	If running on high-end powerpc servers, build with
+	e.	Boot with "elevator=noop" to avoid workqueue use by
+		the block layer.
+	f.	If running on high-end powerpc servers, build with
 		CONFIG_PPC_RTAS_DAEMON=n.  This prevents the RTAS
 		daemon from running on each CPU every second or so.
 		(This will require editing Kconfig files and will defeat
@@ -211,12 +215,12 @@ To reduce its OS jitter, do any of the following:
 		due to the rtas_event_scan() function.
 		WARNING:  Please check your CPU specifications to
 		make sure that this is safe on your particular system.
-	f.	If running on Cell Processor, build your kernel with
+	g.	If running on Cell Processor, build your kernel with
 		CBE_CPUFREQ_SPU_GOVERNOR=n to avoid OS jitter from
 		spu_gov_work().
 		WARNING:  Please check your CPU specifications to
 		make sure that this is safe on your particular system.
-	g.	If running on PowerMAC, build your kernel with
+	h.	If running on PowerMAC, build your kernel with
 		CONFIG_PMAC_RACKMETER=n to disable the CPU-meter,
 		avoiding OS jitter from rackmeter_do_timer().

@@ -258,8 +262,12 @@ Purpose: Detect software lockups on each CPU.
 To reduce its OS jitter, do at least one of the following:
 1.	Build with CONFIG_LOCKUP_DETECTOR=n, which will prevent these
 	kthreads from being created in the first place.
-2.	Echo a zero to /proc/sys/kernel/watchdog to disable the
+2.	Boot with "nosoftlockup=0", which will also prevent these kthreads
+	from being created.  Other related watchdog and softlockup boot
+	parameters may be found in Documentation/kernel-parameters.txt
+	and Documentation/watchdog/watchdog-parameters.txt.
+3.	Echo a zero to /proc/sys/kernel/watchdog to disable the
 	watchdog timer.
-3.	Echo a large number of /proc/sys/kernel/watchdog_thresh in
+4.	Echo a large number of /proc/sys/kernel/watchdog_thresh in
 	order to reduce the frequency of OS jitter due to the watchdog
 	timer down to a level that is acceptable for your workload.
+29 −13
@@ -592,9 +592,9 @@ See also the subsection on "Cache Coherency" for a more thorough example.
 CONTROL DEPENDENCIES
 --------------------

-A control dependency requires a full read memory barrier, not simply a data
-dependency barrier to make it work correctly.  Consider the following bit of
-code:
+A load-load control dependency requires a full read memory barrier, not
+simply a data dependency barrier to make it work correctly.  Consider the
+following bit of code:

 	q = ACCESS_ONCE(a);
 	if (q) {
@@ -615,14 +615,15 @@ case what's actually required is:
 	}

 However, stores are not speculated.  This means that ordering -is- provided
-in the following example:
+for load-store control dependencies, as in the following example:

 	q = ACCESS_ONCE(a);
 	if (q) {
 		ACCESS_ONCE(b) = p;
 	}

-Please note that ACCESS_ONCE() is not optional!  Without the
+Control dependencies pair normally with other types of barriers.
+That said, please note that ACCESS_ONCE() is not optional!  Without the
 ACCESS_ONCE(), might combine the load from 'a' with other loads from
 'a', and the store to 'b' with other stores to 'b', with possible highly
 counterintuitive effects on ordering.
@@ -813,6 +814,8 @@ In summary:
       barrier() can help to preserve your control dependency.  Please
       see the Compiler Barrier section for more information.

+  (*) Control dependencies pair normally with other types of barriers.
+
   (*) Control dependencies do -not- provide transitivity.  If you
       need transitivity, use smp_mb().

@@ -823,14 +826,14 @@ SMP BARRIER PAIRING
 When dealing with CPU-CPU interactions, certain types of memory barrier should
 always be paired.  A lack of appropriate pairing is almost certainly an error.

-General barriers pair with each other, though they also pair with
-most other types of barriers, albeit without transitivity.  An acquire
-barrier pairs with a release barrier, but both may also pair with other
-barriers, including of course general barriers.  A write barrier pairs
-with a data dependency barrier, an acquire barrier, a release barrier,
-a read barrier, or a general barrier.  Similarly a read barrier or a
-data dependency barrier pairs with a write barrier, an acquire barrier,
-a release barrier, or a general barrier:
+General barriers pair with each other, though they also pair with most
+other types of barriers, albeit without transitivity.  An acquire barrier
+pairs with a release barrier, but both may also pair with other barriers,
+including of course general barriers.  A write barrier pairs with a data
+dependency barrier, a control dependency, an acquire barrier, a release
+barrier, a read barrier, or a general barrier.  Similarly a read barrier,
+control dependency, or a data dependency barrier pairs with a write
+barrier, an acquire barrier, a release barrier, or a general barrier:

 	CPU 1		      CPU 2
 	===============	      ===============
@@ -850,6 +853,19 @@ Or:
 			      <data dependency barrier>
 			      y = *x;

+Or even:
+
+	CPU 1		      CPU 2
+	===============	      ===============================
+	r1 = ACCESS_ONCE(y);
+	<general barrier>
+	ACCESS_ONCE(x) = 1;   if (r2 = ACCESS_ONCE(x)) {
+			         <implicit control dependency>
+			         ACCESS_ONCE(y) = 1;
+			      }
+
+	assert(r1 == 0 || r2 == 0);
+
 Basically, the read barrier always has to be there, even though it can be of
 the "weaker" type.

+3 −7
@@ -158,13 +158,9 @@ not come for free:
 	to the need to inform kernel subsystems (such as RCU) about
 	the change in mode.

-3.	POSIX CPU timers on adaptive-tick CPUs may miss their deadlines
-	(perhaps indefinitely) because they currently rely on
-	scheduling-tick interrupts.  This will likely be fixed in
-	one of two ways: (1) Prevent CPUs with POSIX CPU timers from
-	entering adaptive-tick mode, or (2) Use hrtimers or other
-	adaptive-ticks-immune mechanism to cause the POSIX CPU timer to
-	fire properly.
+3.	POSIX CPU timers prevent CPUs from entering adaptive-tick mode.
+	Real-time applications needing to take actions based on CPU time
+	consumption need to use other means of doing so.

 4.	If there are more perf events pending than the hardware can
 	accommodate, they are normally round-robined so as to collect