Commit ec57e94a authored by Peter Xu's avatar Peter Xu Committed by Wupeng Ma
Browse files

mm/hugetlb: fix pgtable lock on pmd sharing

mainline inclusion
from mainline-v6.5-rc1
commit 349d1670
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IAQT9Q

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=349d1670008d3dab99a11b015bef51ad3f26fb4f

--------------------------------

Huge pmd sharing operates on PUD not PMD, huge_pte_lock() is not suitable
in this case because it should only work for last level pte changes, while
pmd sharing is always one level higher.

Meanwhile, here we're locking over the spte pgtable lock which is even not
a lock for current mm but someone else's.

It seems even racy on operating on the lock, as after put_page() of the
spte pgtable page logically the page can be released, so at least the
spin_unlock() needs to be done after the put_page().

No report I am aware, I'm not even sure whether it'll just work on taking
the spte pmd lock, because while we're holding i_mmap read lock it probably
means the vma interval tree is frozen, all pte allocators over this pud
entry could always find the specific svma and spte page, so maybe they'll
serialize on this spte page lock?  Even so, doesn't seem to be expected.
It just seems to be an accident of cb900f41.

Fix it with the proper pud lock (which is the mm's page_table_lock).

Link: https://lkml.kernel.org/r/20230612160420.809818-1-peterx@redhat.com


Fixes: cb900f41 ("mm, hugetlb: convert hugetlbfs to use split pmd lock")
Signed-off-by: default avatarPeter Xu <peterx@redhat.com>
Reviewed-by: default avatarMike Kravetz <mike.kravetz@oracle.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: default avatarMa Wupeng <mawupeng1@huawei.com>
parent 5a1d9701
Loading
Loading
Loading
Loading
+2 −3
Original line number Diff line number Diff line
@@ -5907,7 +5907,6 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
	unsigned long saddr;
	pte_t *spte = NULL;
	pte_t *pte;
	spinlock_t *ptl;

	if (!vma_shareable(vma, addr))
		return (pte_t *)pmd_alloc(mm, pud, addr);
@@ -5931,7 +5930,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
	if (!spte)
		goto out;

	ptl = huge_pte_lock(hstate_vma(vma), mm, spte);
	spin_lock(&mm->page_table_lock);
	if (pud_none(*pud)) {
		pud_populate(mm, pud,
				(pmd_t *)((unsigned long)spte & PAGE_MASK));
@@ -5939,7 +5938,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
	} else {
		put_page(virt_to_page(spte));
	}
	spin_unlock(ptl);
	spin_unlock(&mm->page_table_lock);
out:
	pte = (pte_t *)pmd_alloc(mm, pud, addr);
	return pte;