Commit 44e09568 authored by Ingo Molnar

x86/mm: Clean up the pmd_read_atomic() comments



Fix spelling, consistent parenthesis and grammar - and also clarify
the language where needed.

Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
parent a2f7a0bf
+23 −21
@@ -36,39 +36,41 @@ static inline void native_set_pte(pte_t *ptep, pte_t pte)

 #define pmd_read_atomic pmd_read_atomic
 /*
- * pte_offset_map_lock on 32bit PAE kernels was reading the pmd_t with
- * a "*pmdp" dereference done by gcc. Problem is, in certain places
- * where pte_offset_map_lock is called, concurrent page faults are
+ * pte_offset_map_lock() on 32-bit PAE kernels was reading the pmd_t with
+ * a "*pmdp" dereference done by GCC. Problem is, in certain places
+ * where pte_offset_map_lock() is called, concurrent page faults are
  * allowed, if the mmap_sem is hold for reading. An example is mincore
  * vs page faults vs MADV_DONTNEED. On the page fault side
- * pmd_populate rightfully does a set_64bit, but if we're reading the
+ * pmd_populate() rightfully does a set_64bit(), but if we're reading the
  * pmd_t with a "*pmdp" on the mincore side, a SMP race can happen
- * because gcc will not read the 64bit of the pmd atomically. To fix
- * this all places running pte_offset_map_lock() while holding the
+ * because GCC will not read the 64-bit value of the pmd atomically.
+ *
+ * To fix this all places running pte_offset_map_lock() while holding the
  * mmap_sem in read mode, shall read the pmdp pointer using this
- * function to know if the pmd is null nor not, and in turn to know if
+ * function to know if the pmd is null or not, and in turn to know if
  * they can run pte_offset_map_lock() or pmd_trans_huge() or other pmd
  * operations.
  *
- * Without THP if the mmap_sem is hold for reading, the pmd can only
- * transition from null to not null while pmd_read_atomic runs. So
+ * Without THP if the mmap_sem is held for reading, the pmd can only
+ * transition from null to not null while pmd_read_atomic() runs. So
  * we can always return atomic pmd values with this function.
  *
- * With THP if the mmap_sem is hold for reading, the pmd can become
+ * With THP if the mmap_sem is held for reading, the pmd can become
  * trans_huge or none or point to a pte (and in turn become "stable")
- * at any time under pmd_read_atomic. We could read it really
- * atomically here with a atomic64_read for the THP enabled case (and
+ * at any time under pmd_read_atomic(). We could read it truly
+ * atomically here with an atomic64_read() for the THP enabled case (and
  * it would be a whole lot simpler), but to avoid using cmpxchg8b we
  * only return an atomic pmdval if the low part of the pmdval is later
- * found stable (i.e. pointing to a pte). And we're returning a none
- * pmdval if the low part of the pmd is none. In some cases the high
- * and low part of the pmdval returned may not be consistent if THP is
- * enabled (the low part may point to previously mapped hugepage,
- * while the high part may point to a more recently mapped hugepage),
- * but pmd_none_or_trans_huge_or_clear_bad() only needs the low part
- * of the pmd to be read atomically to decide if the pmd is unstable
- * or not, with the only exception of when the low part of the pmd is
- * zero in which case we return a none pmd.
+ * found to be stable (i.e. pointing to a pte). We are also returning a
+ * 'none' (zero) pmdval if the low part of the pmd is zero.
+ *
+ * In some cases the high and low part of the pmdval returned may not be
+ * consistent if THP is enabled (the low part may point to previously
+ * mapped hugepage, while the high part may point to a more recently
+ * mapped hugepage), but pmd_none_or_trans_huge_or_clear_bad() only
+ * needs the low part of the pmd to be read atomically to decide if the
+ * pmd is unstable or not, with the only exception when the low part
+ * of the pmd is zero, in which case we return a 'none' pmd.
  */
 static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
 {
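
The hunk ends before the function body, so the read scheme the comment describes is not visible here. The following is a minimal userspace sketch of that scheme, not the kernel's code: read the low 32 bits first, return a 'none' (zero) value if they are zero, and otherwise read the high 32 bits after a barrier. The names struct demo_pmd and demo_pmd_read() are hypothetical, and C11 atomics with an acquire fence are assumed as a stand-in for whatever read barrier the kernel implementation uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit PAE pmd entry, split into 32-bit halves. */
struct demo_pmd {
	_Atomic uint32_t low;	/* low part: zero means 'none' */
	_Atomic uint32_t high;	/* high part: upper address bits */
};

/*
 * Illustrative only: return the 64-bit value assembled from two 32-bit
 * loads, low half first. If the low half is zero, the high half is never
 * read and the result is a 'none' (zero) value.
 */
static inline uint64_t demo_pmd_read(struct demo_pmd *pmdp)
{
	uint64_t ret;

	assert(pmdp != NULL);

	ret = atomic_load_explicit(&pmdp->low, memory_order_relaxed);
	if (ret) {
		/* Order the high-half load after the low-half load. */
		atomic_thread_fence(memory_order_acquire);
		ret |= (uint64_t)atomic_load_explicit(&pmdp->high,
						      memory_order_relaxed) << 32;
	}

	return ret;
}
```

As the comment notes, the two halves may still come from different updates when THP is enabled. The sketch only guarantees that each 32-bit half is read in one piece and that a zero low part yields a zero result.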