Merge branch 'for-mingo' of... (e525a862) · Commits · EulixOS / Software / Kernel

Documentation/memory-barriers.txt

+116 −25

Original line number	Diff line number	Diff line
		@@ -232,7 +232,7 @@ And there are a number of things that _must_ or _must_not_ be assumed:
		with memory references that are not protected by READ_ONCE() and
		WRITE_ONCE(). Without them, the compiler is within its rights to
		do all sorts of "creative" transformations, which are covered in
		the Compiler Barrier section.
		the COMPILER BARRIER section.

		(*) It _must_not_ be assumed that independent loads and stores will be issued
		in the order given. This means that for:
		@@ -555,6 +555,30 @@ between the address load and the data load:
		This enforces the occurrence of one of the two implications, and prevents the
		third possibility from arising.

		A data-dependency barrier must also order against dependent writes:

		CPU 1 CPU 2
		=============== ===============
		{ A == 1, B == 2, C = 3, P == &A, Q == &C }
		B = 4;
		<write barrier>
		WRITE_ONCE(P, &B);
		Q = READ_ONCE(P);
		<data dependency barrier>
		*Q = 5;

		The data-dependency barrier must order the read into Q with the store
		into *Q. This prohibits this outcome:

		(Q == B) && (B == 4)

		Please note that this pattern should be rare. After all, the whole point
		of dependency ordering is to -prevent- writes to the data structure, along
		with the expensive cache misses associated with those writes. This pattern
		can be used to record rare error conditions and the like, and the ordering
		prevents such records from being lost.


		[!] Note that this extremely counterintuitive situation arises most easily on
		machines with split caches, so that, for example, one cache bank processes
		even-numbered cache lines and the other bank processes odd-numbered cache
		@@ -565,21 +589,6 @@ odd-numbered bank is idle, one can see the new value of the pointer P (&B),
		but the old value of the variable B (2).


		Another example of where data dependency barriers might be required is where a
		number is read from memory and then used to calculate the index for an array
		access:

		CPU 1 CPU 2
		=============== ===============
		{ M[0] == 1, M[1] == 2, M[3] = 3, P == 0, Q == 3 }
		M[1] = 4;
		<write barrier>
		WRITE_ONCE(P, 1);
		Q = READ_ONCE(P);
		<data dependency barrier>
		D = M[Q];


		The data dependency barrier is very important to the RCU system,
		for example. See rcu_assign_pointer() and rcu_dereference() in
		include/linux/rcupdate.h. This permits the current target of an RCU'd
		@@ -800,9 +809,13 @@ In summary:
		use smp_rmb(), smp_wmb(), or, in the case of prior stores and
		later loads, smp_mb().

		(*) If both legs of the "if" statement begin with identical stores
		to the same variable, a barrier() statement is required at the
		beginning of each leg of the "if" statement.
		(*) If both legs of the "if" statement begin with identical stores to
		the same variable, then those stores must be ordered, either by
		preceding both of them with smp_mb() or by using smp_store_release()
		to carry out the stores. Please note that it is -not- sufficient
		to use barrier() at beginning of each leg of the "if" statement,
		as optimizing compilers do not necessarily respect barrier()
		in this case.

		(*) Control dependencies require at least one run-time conditional
		between the prior load and the subsequent store, and this
		@@ -814,7 +827,7 @@ In summary:
		(*) Control dependencies require that the compiler avoid reordering the
		dependency into nonexistence. Careful use of READ_ONCE() or
		atomic{,64}_read() can help to preserve your control dependency.
		Please see the Compiler Barrier section for more information.
		Please see the COMPILER BARRIER section for more information.

		(*) Control dependencies pair normally with other types of barriers.

		@@ -1257,7 +1270,7 @@ TRANSITIVITY

		Transitivity is a deeply intuitive notion about ordering that is not
		always provided by real computer systems. The following example
		demonstrates transitivity (also called "cumulativity"):
		demonstrates transitivity:

		CPU 1 CPU 2 CPU 3
		======================= ======================= =======================
		@@ -1305,8 +1318,86 @@ or a level of cache, CPU 2 might have early access to CPU 1's writes.
		General barriers are therefore required to ensure that all CPUs agree
		on the combined order of CPU 1's and CPU 2's accesses.

		To reiterate, if your code requires transitivity, use general barriers
		throughout.
		General barriers provide "global transitivity", so that all CPUs will
		agree on the order of operations. In contrast, a chain of release-acquire
		pairs provides only "local transitivity", so that only those CPUs on
		the chain are guaranteed to agree on the combined order of the accesses.
		For example, switching to C code in deference to Herman Hollerith:

		int u, v, x, y, z;

		void cpu0(void)
		{
		r0 = smp_load_acquire(&x);
		WRITE_ONCE(u, 1);
		smp_store_release(&y, 1);
		}

		void cpu1(void)
		{
		r1 = smp_load_acquire(&y);
		r4 = READ_ONCE(v);
		r5 = READ_ONCE(u);
		smp_store_release(&z, 1);
		}

		void cpu2(void)
		{
		r2 = smp_load_acquire(&z);
		smp_store_release(&x, 1);
		}

		void cpu3(void)
		{
		WRITE_ONCE(v, 1);
		smp_mb();
		r3 = READ_ONCE(u);
		}

		Because cpu0(), cpu1(), and cpu2() participate in a local transitive
		chain of smp_store_release()/smp_load_acquire() pairs, the following
		outcome is prohibited:

		r0 == 1 && r1 == 1 && r2 == 1

		Furthermore, because of the release-acquire relationship between cpu0()
		and cpu1(), cpu1() must see cpu0()'s writes, so that the following
		outcome is prohibited:

		r1 == 1 && r5 == 0

		However, the transitivity of release-acquire is local to the participating
		CPUs and does not apply to cpu3(). Therefore, the following outcome
		is possible:

		r0 == 0 && r1 == 1 && r2 == 1 && r3 == 0 && r4 == 0

		As an aside, the following outcome is also possible:

		r0 == 0 && r1 == 1 && r2 == 1 && r3 == 0 && r4 == 0 && r5 == 1

		Although cpu0(), cpu1(), and cpu2() will see their respective reads and
		writes in order, CPUs not involved in the release-acquire chain might
		well disagree on the order. This disagreement stems from the fact that
		the weak memory-barrier instructions used to implement smp_load_acquire()
		and smp_store_release() are not required to order prior stores against
		subsequent loads in all cases. This means that cpu3() can see cpu0()'s
		store to u as happening -after- cpu1()'s load from v, even though
		both cpu0() and cpu1() agree that these two operations occurred in the
		intended order.

		However, please keep in mind that smp_load_acquire() is not magic.
		In particular, it simply reads from its argument with ordering. It does
		-not- ensure that any particular value will be read. Therefore, the
		following outcome is possible:

		r0 == 0 && r1 == 0 && r2 == 0 && r5 == 0

		Note that this outcome can happen even on a mythical sequentially
		consistent system where nothing is ever reordered.

		To reiterate, if your code requires global transitivity, use general
		barriers throughout.


		========================
		@@ -1459,7 +1550,7 @@ of optimizations:
		the following:

		a = 0;
		/* Code that does not store to variable a. */
		... Code that does not store to variable a ...
		a = 0;

		The compiler sees that the value of variable 'a' is already zero, so
		@@ -1471,7 +1562,7 @@ of optimizations:
		wrong guess:

		WRITE_ONCE(a, 0);
		/* Code that does not store to variable a. */
		... Code that does not store to variable a ...
		WRITE_ONCE(a, 0);

		(*) The compiler is within its rights to reorder memory accesses unless