Commit 2c1da66a authored by Miaohe Lin, committed by Liu Shixin

fork: defer linking file vma until vma is fully initialized

mainline inclusion
from mainline-v6.9-rc5
commit 35e351780fa9d8240dd6f7e4f245f9ea37e96c19
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9NYY7
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=35e351780fa9d8240dd6f7e4f245f9ea37e96c19

--------------------------------

Thorvald reported a WARNING [1].  The root cause is the following race:

 CPU 1					CPU 2
 fork					hugetlbfs_fallocate
  dup_mmap				 hugetlbfs_punch_hole
   i_mmap_lock_write(mapping);
   vma_interval_tree_insert_after -- Child vma is visible through i_mmap tree.
   i_mmap_unlock_write(mapping);
   hugetlb_dup_vma_private -- Clear vma_lock outside i_mmap_rwsem!
					 i_mmap_lock_write(mapping);
   					 hugetlb_vmdelete_list
					  vma_interval_tree_foreach
					   hugetlb_vma_trylock_write -- Vma_lock is cleared.
   tmp->vm_ops->open -- Alloc new vma_lock outside i_mmap_rwsem!
					   hugetlb_vma_unlock_write -- Vma_lock is assigned!!!
					 i_mmap_unlock_write(mapping);

hugetlb_dup_vma_private() and hugetlb_vm_op_open() are called outside the
i_mmap_rwsem lock, while the vma lock can be in use at the same time.  Fix
this by deferring linking the file vma until the vma is fully initialized.
Those vmas should be initialized first before they can be used.
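
The interleaving above can be replayed deterministically in user space.  The
sketch below is only a simplified, single-threaded model and not kernel code:
every toy_* name is hypothetical, the "tree" holds a single vma, and the lock
is a plain counter standing in for the hugetlb vma_lock.  It assumes, as in
the diagram, that the trylock trivially succeeds when no lock is attached and
that the unlock acts on whatever lock is attached at that moment.

```c
/* User-space model only; all toy_* names are hypothetical, not kernel API. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_lock { int held; };          /* 1 = held, 0 = free, <0 = imbalance */

struct toy_vma {
	struct toy_lock *private_lock;  /* stands in for the hugetlb vma_lock */
};

/* Stand-in for the i_mmap tree: at most one published vma. */
struct toy_tree { struct toy_vma *visible; };

/* Trylock: with no lock attached there is nothing to take, report success. */
static bool toy_trylock(struct toy_vma *vma)
{
	if (!vma->private_lock)
		return true;
	if (vma->private_lock->held)
		return false;
	vma->private_lock->held = 1;
	return true;
}

/* Unlock: releases whatever lock is attached right now. */
static void toy_unlock(struct toy_vma *vma)
{
	if (vma->private_lock)
		vma->private_lock->held--;
}

/* Old order: link, then clear and re-create the lock while visible. */
static int replay_buggy_order(void)
{
	struct toy_lock parent_lock = { 0 }, child_lock = { 0 };
	struct toy_vma child = { .private_lock = &parent_lock };
	struct toy_tree tree = { NULL };
	bool got;

	tree.visible = &child;             /* fork: vma becomes visible      */
	child.private_lock = NULL;         /* fork: private state cleared    */
	got = toy_trylock(tree.visible);   /* punch hole: sees no lock       */
	child.private_lock = &child_lock;  /* fork: ->open() attaches a lock */
	if (got)
		toy_unlock(tree.visible);  /* punch hole: unlocks a lock it never took */
	return child_lock.held;
}

/* New order: finish initialization first, link last. */
static int replay_fixed_order(void)
{
	struct toy_lock parent_lock = { 0 }, child_lock = { 0 };
	struct toy_vma child = { .private_lock = &parent_lock };
	struct toy_tree tree = { NULL };
	bool got;

	child.private_lock = NULL;         /* fork: private state cleared    */
	child.private_lock = &child_lock;  /* fork: ->open() attaches a lock */
	tree.visible = &child;             /* fork: only now visible         */
	got = toy_trylock(tree.visible);   /* punch hole: takes the lock     */
	if (got)
		toy_unlock(tree.visible);  /* punch hole: releases the same lock */
	return child_lock.held;
}
```

In this model the old order leaves the lock counter negative (an unlock with
no matching lock), while the new order leaves it balanced at zero, because the
other path can never observe the vma in a half-initialized state.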

Link: https://lkml.kernel.org/r/20240410091441.3539905-1-linmiaohe@huawei.com


Fixes: 8d9bfb26 ("hugetlb: add vma based lock for pmd sharing")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reported-by: Thorvald Natvig <thorvald@google.com>
Closes: https://lore.kernel.org/linux-mm/20240129161735.6gmjsswx62o4pbja@revolver/T/ [1]
Reviewed-by: Jane Chu <jane.chu@oracle.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Tycho Andersen <tandersen@netflix.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ The stable commit cec11fa2eb51 conflicts due to commit d37e5614, so
  backport the mainline version instead. ]
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
parent a0f7add6
+17 −16
@@ -745,6 +745,23 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 		} else if (anon_vma_fork(tmp, mpnt))
 			goto fail_nomem_anon_vma_fork;
 		vm_flags_clear(tmp, VM_LOCKED_MASK);
+		/*
+		 * Copy/update hugetlb private vma information.
+		 */
+		if (is_vm_hugetlb_page(tmp))
+			hugetlb_dup_vma_private(tmp);
+
+		/*
+		 * Link the vma into the MT. After using __mt_dup(), memory
+		 * allocation is not necessary here, so it cannot fail.
+		 */
+		vma_iter_bulk_store(&vmi, tmp);
+
+		mm->map_count++;
+
+		if (tmp->vm_ops && tmp->vm_ops->open)
+			tmp->vm_ops->open(tmp);
+
 		file = tmp->vm_file;
 		if (file) {
 			struct address_space *mapping = file->f_mapping;
@@ -761,25 +778,9 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 			i_mmap_unlock_write(mapping);
 		}

-		/*
-		 * Copy/update hugetlb private vma information.
-		 */
-		if (is_vm_hugetlb_page(tmp))
-			hugetlb_dup_vma_private(tmp);
-
-		/*
-		 * Link the vma into the MT. After using __mt_dup(), memory
-		 * allocation is not necessary here, so it cannot fail.
-		 */
-		vma_iter_bulk_store(&vmi, tmp);
-
-		mm->map_count++;
 		if (!(tmp->vm_flags & VM_WIPEONFORK))
 			retval = copy_page_range(tmp, mpnt);

-		if (tmp->vm_ops && tmp->vm_ops->open)
-			tmp->vm_ops->open(tmp);
-
 		if (retval) {
 			mpnt = vma_next(&vmi);
 			goto loop_out;