Skip to content
Commit 944d9fec authored by Luiz Capitulino's avatar Luiz Capitulino Committed by Linus Torvalds
Browse files

hugetlb: add support for gigantic page allocation at runtime



HugeTLB is limited to allocating hugepages whose size are less than
MAX_ORDER order.  This is so because HugeTLB allocates hugepages via the
buddy allocator.  Gigantic pages (that is, pages whose size is greater
than MAX_ORDER order) have to be allocated at boottime.

However, boottime allocation has at least two serious problems.  First,
it doesn't support NUMA and second, gigantic pages allocated at boottime
can't be freed.

This commit solves both issues by adding support for allocating gigantic
pages during runtime.  It works just like regular sized hugepages,
meaning that the interface in sysfs is the same, it supports NUMA, and
gigantic pages can be freed.

For example, on x86_64 gigantic pages are 1GB big. To allocate two 1G
gigantic pages on node 1, one can do:

 # echo 2 > \
   /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages

And to free them all:

 # echo 0 > \
   /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages

The one problem with gigantic page allocation at runtime is that it
can't be serviced by the buddy allocator.  To overcome that problem,
this commit scans all zones from a node looking for a large enough
contiguous region.  When one is found, it's allocated by using CMA, that
is, we call alloc_contig_range() to do the actual allocation.  For
example, on x86_64 we scan all zones looking for a 1GB contiguous
region.  When one is found, it's allocated by alloc_contig_range().

One expected issue with that approach is that such gigantic contiguous
regions tend to vanish as runtime goes by.  The best way to avoid this
for now is to make gigantic page allocations very early during system
boot, say from a init script.  Other possible optimization include using
compaction, which is supported by CMA but is not explicitly used by this
commit.

It's also important to note the following:

 1. Gigantic pages allocated at boottime by the hugepages= command-line
    option can be freed at runtime just fine

 2. This commit adds support for gigantic pages only to x86_64. The
    reason is that I don't have access to nor experience with other archs.
    The code is arch indepedent though, so it should be simple to add
    support to different archs

 3. I didn't add support for hugepage overcommit, that is allocating
    a gigantic page on demand when
   /proc/sys/vm/nr_overcommit_hugepages > 0. The reason is that I don't
   think it's reasonable to do the hard and long work required for
   allocating a gigantic page at fault time. But it should be simple
   to add this if wanted

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: default avatarLuiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: default avatarDavidlohr Bueso <davidlohr@hp.com>
Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: default avatarZhang Yanfei <zhangyanfei@cn.fujitsu.com>
Reviewed-by: default avatarYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
parent 1cac6f2c
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment