Commit b8042a6b (unverified), authored by openeuler-ci-bot, committed by Gitee

!344 mm: fix false-positive OVERCOMMIT_GUESS failures

Merge Pull Request from: @liu-qiang-chinatelecom 
 
    mm: fix false-positive OVERCOMMIT_GUESS failures

    With the default overcommit==guess we occasionally run into mmap
    rejections despite plenty of memory that would get dropped under
    pressure but just isn't accounted reclaimable. One example of this is
    dying cgroups pinned by some page cache. A previous case was auxiliary
    path name memory associated with dentries; we have since annotated
    those allocations to avoid overcommit failures (see d79f7aa4 ("mm:
    treat indirectly reclaimable memory as free in overcommit logic")).
    
    But trying to classify all allocated memory reliably as reclaimable
    and unreclaimable is a bit of a fool's errand. There could be a myriad
    of dependencies that constantly change with kernel versions.
    
    The effort becomes even more questionable when considering how
    this estimate of available memory is used: it's not compared to the
    system-wide allocated virtual memory in any way. It's not even
    compared to the allocating process's address space. It's compared to
    the single allocation request at hand!
    
    So we have an elaborate left-hand side of the equation that tries to
    assess the exact breathing room the system has available down to a
    page - and then compare it to an isolated allocation request with no
    additional context. We could fail an allocation of N bytes, but for
    two allocations of N/2 bytes we'd do this elaborate dance twice in a
    row and then still let N bytes of virtual memory through. This doesn't
    make a whole lot of sense.
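
The splitting argument above can be illustrated with a small userspace model. This is only a sketch: `old_guess_allows` and `old_guess_allows_split` are hypothetical helpers, `free_pages` stands in for the kernel's estimate of available memory, and the reserve adjustments of the real code are left out for brevity.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace model of the old OVERCOMMIT_GUESS test.
 * free_pages is an illustrative input, not a kernel API.
 */
static bool old_guess_allows(long free_pages, long request_pages)
{
	/* The old code admitted a request only if free > pages. */
	return free_pages > request_pages;
}

/*
 * Each request is judged in isolation: the estimate is never charged
 * for what earlier requests were already granted.
 */
static bool old_guess_allows_split(long free_pages, long request_pages,
				   int parts)
{
	long chunk = request_pages / parts;

	for (int i = 0; i < parts; i++) {
		if (!old_guess_allows(free_pages, chunk))
			return false;
	}
	return true;
}
```

With an estimate of 1000 pages, `old_guess_allows(1000, 1500)` rejects the single request, while `old_guess_allows_split(1000, 1500, 2)` admits the same total as two 750-page requests.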
    
    Let's take a step back and look at the actual goal of the
    heuristic. From the documentation:
    
       Heuristic overcommit handling. Obvious overcommits of address
       space are refused. Used for a typical system. It ensures a
       seriously wild allocation fails while allowing overcommit to
       reduce swap usage.  root is allowed to allocate slightly more
       memory in this mode. This is the default.
    
    If all we want to do is catch clearly bogus allocation requests
    irrespective of the general virtual memory situation, the physical
    memory counterpart doesn't need to be that complicated, either.
    
    When in GUESS mode, catch wild allocations by comparing their request
    size to the total amount of RAM and swap in the system.
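
A minimal userspace sketch of the new rule, assuming the two totals are passed in explicitly. `guess_mode_allows` is a hypothetical name; the parameters mirror the kernel's `totalram_pages` and `total_swap_pages` variables used in the patch.

```c
#include <stdbool.h>

/*
 * Simplified model of the new GUESS-mode check. The totals are
 * parameters here so the sketch can run outside the kernel.
 */
static bool guess_mode_allows(long request_pages, long totalram_pages,
			      long total_swap_pages)
{
	/* Refuse only requests larger than all RAM plus all swap. */
	return request_pages <= totalram_pages + total_swap_pages;
}
```

For example, assuming 4 KiB pages, a machine with 16 GiB of RAM (4194304 pages) and 4 GiB of swap (1048576 pages) would refuse only single requests above 5242880 pages, that is 20 GiB.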
 
Link: https://gitee.com/openeuler/kernel/pulls/344

 

Reviewed-by: Laibin Qiu <qiulaibin@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
parents ecf818a5 5c3b7b13
+5 −47
@@ -666,7 +666,7 @@ EXPORT_SYMBOL_GPL(vm_memory_committed);
  */
 int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
 {
-	long free, allowed, reserve;
+	long allowed;

 	VM_WARN_ONCE(percpu_counter_read(&vm_committed_as) <
 			-(s64)vm_committed_as_batch * num_online_cpus(),
@@ -681,52 +681,9 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
 		return 0;

 	if (sysctl_overcommit_memory == OVERCOMMIT_GUESS) {
-		free = global_zone_page_state(NR_FREE_PAGES);
-		free += global_node_page_state(NR_FILE_PAGES);
-
-		/*
-		 * shmem pages shouldn't be counted as free in this
-		 * case, they can't be purged, only swapped out, and
-		 * that won't affect the overall amount of available
-		 * memory in the system.
-		 */
-		free -= global_node_page_state(NR_SHMEM);
-
-		free += get_nr_swap_pages();
-
-		/*
-		 * Any slabs which are created with the
-		 * SLAB_RECLAIM_ACCOUNT flag claim to have contents
-		 * which are reclaimable, under pressure.  The dentry
-		 * cache and most inode caches should fall into this
-		 */
-		free += global_node_page_state(NR_SLAB_RECLAIMABLE);
-
-		/*
-		 * Part of the kernel memory, which can be released
-		 * under memory pressure.
-		 */
-		free += global_node_page_state(
-			NR_INDIRECTLY_RECLAIMABLE_BYTES) >> PAGE_SHIFT;
-
-		/*
-		 * Leave reserved pages. The pages are not for anonymous pages.
-		 */
-		if (free <= totalreserve_pages)
+		if (pages > totalram_pages + total_swap_pages)
 			goto error;
-		else
-			free -= totalreserve_pages;
-
-		/*
-		 * Reserve some for root
-		 */
-		if (!cap_sys_admin)
-			free -= sysctl_admin_reserve_kbytes >> (PAGE_SHIFT - 10);
-
-		if (free > pages)
-			return 0;
-
-		goto error;
+		return 0;
 	}

 	allowed = vm_commit_limit();
@@ -740,7 +697,8 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
 	 * Don't let a single process grow so big a user can't recover
 	 */
 	if (mm) {
-		reserve = sysctl_user_reserve_kbytes >> (PAGE_SHIFT - 10);
+		long reserve = sysctl_user_reserve_kbytes >> (PAGE_SHIFT - 10);
+
 		allowed -= min_t(long, mm->total_vm / 32, reserve);
 	}
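
The arithmetic in the last hunk (the `>> (PAGE_SHIFT - 10)` conversion from kilobytes to pages, then the `min_t` against 1/32 of the process's address space) can be sketched in userspace as follows. `DEMO_PAGE_SHIFT`, `kbytes_to_pages` and `user_reserve_pages` are hypothetical names, and the sketch assumes 4 KiB pages.

```c
#define DEMO_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* One page holds 2^(PAGE_SHIFT - 10) KiB, so shift right by that much. */
static long kbytes_to_pages(long kbytes)
{
	return kbytes >> (DEMO_PAGE_SHIFT - 10);
}

/*
 * Pages held back from the commit limit for one process: the smaller
 * of total_vm / 32 and the configured user reserve, as in the
 * min_t() expression of the patch.
 */
static long user_reserve_pages(long total_vm_pages, long reserve_kbytes)
{
	long reserve = kbytes_to_pages(reserve_kbytes);
	long share = total_vm_pages / 32;

	return share < reserve ? share : reserve;
}
```

With a reserve of 131072 kB (128 MiB, chosen for illustration), `kbytes_to_pages` yields 32768 pages. A process with 64000 pages of address space gives up only 2000 pages, while one with 3200000 pages is capped at the 32768-page reserve.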