Commit 49ed1f1e authored by Nicholas Piggin's avatar Nicholas Piggin Committed by Jialin Zhang
Browse files

mm/vmalloc: huge vmalloc backing pages should be split rather than compound

mainline inclusion
from mainline-v5.18-rc4
commit 3b8000ae
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6LD0S
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3b8000ae185cb068adbda5f966a3835053c85fd4

--------------------------------

Huge vmalloc higher-order backing pages were allocated with __GFP_COMP
in order to allow the sub-pages to be refcounted by callers such as
"remap_vmalloc_page [sic]" (remap_vmalloc_range).

However a similar problem exists for other struct page fields callers
use, for example fb_deferred_io_fault() takes a vmalloc'ed page and
not only refcounts it but uses ->lru, ->mapping, ->index.

This is not compatible with compound sub-pages, and can cause bad page
state issues like

  BUG: Bad page state in process swapper/0  pfn:00743
  page:(____ptrval____) refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x743
  flags: 0x7ffff000000000(node=0|zone=0|lastcpupid=0x7ffff)
  raw: 007ffff000000000 c00c00000001d0c8 c00c00000001d0c8 0000000000000000
  raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
  page dumped because: corrupted mapping in tail page
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.18.0-rc3-00082-gfc6fff4a7ce1-dirty #2810
  Call Trace:
    dump_stack_lvl+0x74/0xa8 (unreliable)
    bad_page+0x12c/0x170
    free_tail_pages_check+0xe8/0x190
    free_pcp_prepare+0x31c/0x4e0
    free_unref_page+0x40/0x1b0
    __vunmap+0x1d8/0x420
    ...

The correct approach is to use split high-order pages for the huge
vmalloc backing. These allow callers to treat them in exactly the same
way as individually-allocated order-0 pages.

Link: https://lore.kernel.org/all/14444103-d51b-0fb3-ee63-c3f182f0b546@molgen.mpg.de/


Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Song Liu <songliubraving@fb.com>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>

conflicts:
	mm/vmalloc.c

Signed-off-by: default avatarZhangPeng <zhangpeng362@huawei.com>
Reviewed-by: default avatarKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: default avatarJialin Zhang <zhangjialin11@huawei.com>
parent e0198072
Loading
Loading
Loading
Loading
+17 −5
Original line number Diff line number Diff line
@@ -2641,14 +2641,17 @@ static void __vunmap(const void *addr, int deallocate_pages)
	vm_remove_mappings(area, deallocate_pages);

	if (deallocate_pages) {
		unsigned int page_order = vm_area_page_order(area);
		int i;

		for (i = 0; i < area->nr_pages; i += 1U << page_order) {
		for (i = 0; i < area->nr_pages; i++) {
			struct page *page = area->pages[i];

			BUG_ON(!page);
			__free_pages(page, page_order);
			/*
			 * High-order allocs for huge vmallocs are split, so
			 * can be freed as an array of order-0 allocations
			 */
			__free_pages(page, 0);
		}
		atomic_long_sub(area->nr_pages, &nr_vmalloc_pages);

@@ -2930,8 +2933,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
		struct page *page;
		int p;

		/* Compound pages required for remap_vmalloc_page */
		page = alloc_pages_node(node, gfp_mask | __GFP_COMP, page_order);
		page = alloc_pages_node(node, gfp_mask, page_order);
		if (unlikely(!page)) {
			/* Successfully allocated i pages, free them in __vfree() */
			area->nr_pages = i;
@@ -2943,6 +2945,16 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
			goto fail;
		}

		/*
		 * Higher order allocations must be able to be treated as
		 * indepdenent small pages by callers (as they can with
		 * small-page vmallocs). Some drivers do their own refcounting
		 * on vmalloc_to_page() pages, some use page->mapping,
		 * page->lru, etc.
		 */
		if (page_order)
			split_page(page, page_order);

		for (p = 0; p < (1U << page_order); p++)
			area->pages[i + p] = page + p;