docs/devel/atomics.txt +30 −27
@@ -122,20 +122,30 @@ In general, if the algorithm you are writing includes both writes
 and reads on the same side, it is generally simpler to use sequentially
 consistent primitives.
 
-When using this model, variables are accessed with atomic_read() and
-atomic_set(), and restrictions to the ordering of accesses is enforced
+When using this model, variables are accessed with:
+
+- atomic_read() and atomic_set(); these prevent the compiler from
+  optimizing accesses out of existence and creating unsolicited
+  accesses, but do not otherwise impose any ordering on loads and
+  stores: both the compiler and the processor are free to reorder
+  them.
+
+- atomic_load_acquire(), which guarantees the LOAD to appear to
+  happen, with respect to the other components of the system,
+  before all the LOAD or STORE operations specified afterwards.
+  Operations coming before atomic_load_acquire() can still be
+  reordered after it.
+
+- atomic_store_release(), which guarantees the STORE to appear to
+  happen, with respect to the other components of the system,
+  after all the LOAD or STORE operations specified before.
+  Operations coming after atomic_store_release() can still be
+  reordered before it.
+
+Restrictions to the ordering of accesses can also be specified
 using the memory barrier macros: smp_rmb(), smp_wmb(), smp_mb(),
 smp_mb_acquire(), smp_mb_release(), smp_read_barrier_depends().
 
-atomic_read() and atomic_set() prevents the compiler from using
-optimizations that might otherwise optimize accesses out of existence
-on the one hand, or that might create unsolicited accesses on the other.
-In general this should not have any effect, because the same compiler
-barriers are already implied by memory barriers.  However, it is useful
-to do so, because it tells readers which variables are shared with
-other threads, and which are local to the current thread or protected
-by other, more mundane means.
-
 Memory barriers control the order of references to shared memory.
 They come in six kinds:
@@ -232,7 +242,7 @@ make atomic_mb_set() the more expensive operation.
 
 There are two common cases in which atomic_mb_read and atomic_mb_set
 generate too many memory barriers, and thus it can be useful to manually
-place barriers instead:
+place barriers, or use atomic_load_acquire/atomic_store_release instead:
 
 - when a data structure has one thread that is always a writer
   and one thread that is always a reader, manual placement of
@@ -243,18 +253,15 @@ place barriers instead:
     thread 1                                thread 1
     -------------------------               ------------------------
     (other writes)
-                                            smp_mb_release()
-    atomic_mb_set(&a, x)                    atomic_set(&a, x)
-                                            smp_wmb()
-    atomic_mb_set(&b, y)                    atomic_set(&b, y)
+    atomic_mb_set(&a, x)                    atomic_store_release(&a, x)
+    atomic_mb_set(&b, y)                    atomic_store_release(&b, y)
 
                                        =>
     thread 2                                thread 2
     -------------------------               ------------------------
-    y = atomic_mb_read(&b)                  y = atomic_read(&b)
-                                            smp_rmb()
-    x = atomic_mb_read(&a)                  x = atomic_read(&a)
-                                            smp_mb_acquire()
+    y = atomic_mb_read(&b)                  y = atomic_load_acquire(&b)
+    x = atomic_mb_read(&a)                  x = atomic_load_acquire(&a)
     (other reads)
 
 Note that the barrier between the stores in thread 1, and between
 the loads in thread 2, has been optimized here to a write or a
@@ -276,7 +283,6 @@ place barriers instead:
                                                  smp_mb_acquire();
 
     Similarly, atomic_mb_set() can be transformed as follows:
-    smp_mb():
 
                                                  smp_mb_release();
       for (i = 0; i < 10; i++)            =>     for (i = 0; i < 10; i++)
@@ -284,6 +290,8 @@ place barriers instead:
                                                  smp_mb();
 
+    The other thread can still use atomic_mb_read()/atomic_mb_set().
+
 The two tricks can be combined.
 In this case, splitting a loop in
 two lets you hoist the barriers out of the loops _and_ eliminate the
 expensive smp_mb():
 
@@ -296,8 +304,6 @@ expensive smp_mb():
                                                    atomic_set(&a[i], false);
                                                  smp_mb();
 
-The other thread can still use atomic_mb_read()/atomic_mb_set()
-
 Memory barrier pairing
 ----------------------
 
@@ -386,10 +392,7 @@ and memory barriers, and the equivalents in QEMU:
   note that smp_store_mb() is a little weaker than atomic_mb_set().
   atomic_mb_read() compiles to the same instructions as Linux's
   smp_load_acquire(), but this should be treated as an implementation
-  detail.  QEMU does have atomic_load_acquire() and atomic_store_release()
-  macros, but for now they are only used within atomic.h.  This may
-  change in the future.
+  detail.
 
 SOURCES
 =======
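For readers who want to try the writer/reader pairing from the thread 1 / thread 2 table outside of QEMU, the following is a minimal sketch using C11 <stdatomic.h> instead of QEMU's macros. It assumes that memory_order_release and memory_order_acquire are a fair stand-in for atomic_store_release() and atomic_load_acquire(); that mapping is an assumption made for illustration, not something the patch states. The names payload, ready, publish() and try_consume() are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state: plain data published through a flag. */
static int payload;          /* ordinary variable, written before the flag */
static atomic_bool ready;    /* static storage, so it starts out false */

/*
 * Writer side (assumed analogue of atomic_store_release(&ready, true)).
 * The release store keeps the earlier write to payload from being
 * reordered after the store to ready.
 */
static void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/*
 * Reader side (assumed analogue of atomic_load_acquire(&ready)).
 * The acquire load keeps the later read of payload from being
 * reordered before the load of ready.
 */
static bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire)) {
        return false;        /* nothing published yet */
    }
    *out = payload;
    return true;
}
```

In a real program publish() and try_consume() would run on different threads. The release store pairs with the acquire load, so a reader that sees ready == true also sees the payload written before it, which is the same pairing the table expresses with one store-release per variable and one load-acquire per variable.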