Commit 0ce20dd8 authored by Alexander Potapenko, committed by Linus Torvalds

mm: add Kernel Electric-Fence infrastructure

Patch series "KFENCE: A low-overhead sampling-based memory safety error detector", v7.

This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
low-overhead sampling-based memory safety error detector of heap
use-after-free, invalid-free, and out-of-bounds access errors.  This
series enables KFENCE for the x86 and arm64 architectures, and adds
KFENCE hooks to the SLAB and SLUB allocators.

KFENCE is designed to be enabled in production kernels, and has near
zero performance overhead. Compared to KASAN, KFENCE trades precision for
performance. The main motivation behind KFENCE's design is that, with
enough total uptime, KFENCE will detect bugs in code paths not typically
exercised by non-production test workloads. One way to quickly accumulate
enough total uptime is to deploy the tool across a large fleet of
machines.

KFENCE objects each reside on a dedicated page, at either the left or the
right page boundary. The pages to the left and right of the object page
are "guard pages", whose attributes are changed to a protected state so
that any attempted access to them causes a page fault. KFENCE intercepts
such page faults and handles them gracefully by reporting a memory access
error.
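As a rough illustration of the left/right placement, the following user-space C sketch computes where an object of a given size would start within its dedicated page. It assumes 4 KiB pages and assumes that a right-aligned object's start is rounded down to the cache alignment; SKETCH_PAGE_SIZE, object_offset_in_page() and slack_before_page_end() are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Hypothetical helper: offset of an object of @size bytes within its
 * dedicated page. @align must be a power of two.
 *   left != 0: the object starts at the page's left boundary.
 *   left == 0: the object ends as close to the right boundary as @align
 *              allows (its start is rounded down to @align).
 */
static size_t object_offset_in_page(size_t size, size_t align, int left)
{
	assert(size <= SKETCH_PAGE_SIZE && align && !(align & (align - 1)));
	if (left)
		return 0;
	return (SKETCH_PAGE_SIZE - size) & ~(align - 1);
}

/* Bytes between the end of the object and the end of its page. */
static size_t slack_before_page_end(size_t size, size_t align, int left)
{
	return SKETCH_PAGE_SIZE - object_offset_in_page(size, align, left) - size;
}
```

For a 100-byte object with 8-byte alignment, the right-aligned placement starts at offset 3992 and leaves 4 bytes before the page boundary. An overflow into those bytes does not touch the guard page, so page protection alone cannot catch it.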

Guarded allocations are set up based on a sample interval (configurable via
the kfence.sample_interval boot parameter). After the sample interval
expires, the next allocation through the main allocator (SLAB or SLUB)
returns a guarded allocation from the KFENCE object pool. At this point the
timer is reset, and the next guarded allocation is set up after the
interval expires again.
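The sampling policy described above can be modelled with a small single-threaded C simulation. It assumes the gate first opens one interval after boot (t = 0) and that, as the KFENCE_SAMPLE_INTERVAL Kconfig help states, an interval of 0 disables sampling. simulate_sampling() is a hypothetical name; rather than a loop over timestamps, the kernel drives this with the allocation gate timer set up by kfence_init().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of the sampling policy: the gate opens once
 * interval_ms has elapsed (first expiry at t = interval_ms), the first
 * allocation at or after that point is guarded, and the interval then
 * restarts from that allocation. interval_ms == 0 disables sampling.
 *
 * alloc_ms: allocation timestamps in ascending order.
 * guarded:  output, guarded[i] = 1 if allocation i would use KFENCE.
 * Returns the number of guarded allocations.
 */
static size_t simulate_sampling(const unsigned long *alloc_ms, size_t n,
				unsigned long interval_ms, int *guarded)
{
	unsigned long gate_opens_at = interval_ms;
	size_t count = 0;

	assert(n == 0 || (alloc_ms && guarded));
	for (size_t i = 0; i < n; i++) {
		guarded[i] = interval_ms != 0 && alloc_ms[i] >= gate_opens_at;
		if (guarded[i]) {
			count++;
			/* Timer reset: the next gate opens one interval later. */
			gate_opens_at = alloc_ms[i] + interval_ms;
		}
	}
	return count;
}
```

With a 100 ms interval and allocations at 0, 50, 100, 150, 260, 300 and 370 ms, the allocations at 100, 260 and 370 ms are guarded: each guarded allocation pushes the next opportunity a full interval further out.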

To enable/disable a KFENCE allocation through the main allocator's
fast-path without overhead, KFENCE relies on static branches via the
static keys infrastructure. The static branch is toggled to redirect the
allocation to KFENCE.
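Static branches patch the kernel's instruction stream and cannot be reproduced portably in user space. The fallback used when CONFIG_KFENCE_STATIC_KEYS is disabled, an atomic gate that kfence_alloc() checks with atomic_read(), can however be approximated with C11 atomics. This is a sketch under assumptions: all names are hypothetical, and the claim step (only the first caller to see the gate open gets a KFENCE object) is an assumed design, not code taken from this patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical user-space analogue of kfence_allocation_gate:
 * 0 means "open" (next allocation may be sampled), non-zero "closed".
 */
static atomic_int sketch_gate = 1; /* closed until the first interval expires */

/* Timer side: reopen the gate when the sample interval expires. */
static void sketch_timer_expired(void)
{
	assert(atomic_load(&sketch_gate) != 0); /* only reopened after closing */
	atomic_store(&sketch_gate, 0);
}

/* Assumed claim step: only the caller that moves the gate off 0 wins. */
static bool sketch_try_claim(void)
{
	return atomic_fetch_add(&sketch_gate, 1) == 0;
}

/* Allocation fast path: a single relaxed load in the common, closed case. */
static bool sketch_should_sample(void)
{
	if (atomic_load_explicit(&sketch_gate, memory_order_relaxed) != 0)
		return false;
	return sketch_try_claim();
}
```

Compared with a static branch, this fallback costs a load and a conditional branch on every allocation, which is the trade-off the KFENCE_STATIC_KEYS Kconfig help describes.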

The KFENCE memory pool is of fixed size, and if the pool is exhausted no
further KFENCE allocations occur. The default config is conservative
with only 255 objects, resulting in a pool size of 2 MiB (with 4 KiB
pages).
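The 2 MiB figure follows from the KFENCE_POOL_SIZE definition in include/linux/kfence.h, which reserves two pages per object plus two extra pages at the start (the first of which acts as an extended guard page). The C sketch below reproduces that arithmetic at compile time. Assuming object i occupies page 2 * (i + 1) with its right-hand guard page immediately after, it also maps a pool offset back to an object index; pool_offset_to_index() is hypothetical.

```c
#include <assert.h>

/* Assumptions: 4 KiB pages and the default of 255 objects. */
#define SKETCH_PAGE_SIZE   4096UL
#define SKETCH_NUM_OBJECTS 255UL

/* Same formula as KFENCE_POOL_SIZE in include/linux/kfence.h. */
#define SKETCH_POOL_SIZE ((SKETCH_NUM_OBJECTS + 1) * 2 * SKETCH_PAGE_SIZE)

/* (255 + 1) * 2 * 4096 = 2097152 bytes = 2 MiB */
static_assert(SKETCH_POOL_SIZE == 2UL * 1024 * 1024, "default pool is 2 MiB");

/*
 * Hypothetical: map a byte offset into the pool to an object index.
 * Pages 0 and 1 hold no object; object i lives on page 2 * (i + 1), and
 * page 2 * (i + 1) + 1 is its right-hand guard page, which this sketch
 * attributes to object i (a fault there could equally come from an
 * out-of-bounds access just below object i + 1). Returns -1 if no
 * object applies.
 */
static long pool_offset_to_index(unsigned long offset)
{
	if (offset >= SKETCH_POOL_SIZE)
		return -1;
	/* Yields -1 for the two leading pages. */
	return (long)(offset / (2 * SKETCH_PAGE_SIZE)) - 1;
}
```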

We have verified by running synthetic benchmarks (sysbench I/O,
hackbench) and production server-workload benchmarks that a kernel with
KFENCE (using sample intervals 100-500ms) is performance-neutral
compared to a non-KFENCE baseline kernel.

KFENCE is inspired by GWP-ASan [1], a userspace tool with similar
properties. The name "KFENCE" is a homage to the Electric Fence Malloc
Debugger [2].

For more details, see Documentation/dev-tools/kfence.rst added in the
series -- also viewable here:

	https://raw.githubusercontent.com/google/kasan/kfence/Documentation/dev-tools/kfence.rst

[1] http://llvm.org/docs/GwpAsan.html
[2] https://linux.die.net/man/3/efence

This patch (of 9):

This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
low-overhead sampling-based memory safety error detector of heap
use-after-free, invalid-free, and out-of-bounds access errors.

KFENCE is designed to be enabled in production kernels, and has near
zero performance overhead. Compared to KASAN, KFENCE trades precision for
performance. The main motivation behind KFENCE's design is that, with
enough total uptime, KFENCE will detect bugs in code paths not typically
exercised by non-production test workloads. One way to quickly accumulate
enough total uptime is to deploy the tool across a large fleet of
machines.

KFENCE objects each reside on a dedicated page, at either the left or the
right page boundary. The pages to the left and right of the object page
are "guard pages", whose attributes are changed to a protected state so
that any attempted access to them causes a page fault. KFENCE intercepts
such page faults and handles them gracefully by reporting a memory access
error. To detect out-of-bounds writes to memory within the object's page
itself, KFENCE also uses pattern-based redzones. The following figure
illustrates the page layout:

  ---+-----------+-----------+-----------+-----------+-----------+---
     | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
     | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
     | x GUARD x | J : RED-  | x GUARD x | RED-  : J | x GUARD x |
     | xxxxxxxxx | E :  ZONE | xxxxxxxxx |  ZONE : E | xxxxxxxxx |
     | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
     | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
  ---+-----------+-----------+-----------+-----------+-----------+---
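Guard pages only catch accesses that leave the object's page. The pattern-based redzone check for bytes within the page can be sketched as follows: every byte of the page outside the object is filled with a known value at allocation time and verified later, for example on free. The fill byte 0xaa and the function names are assumptions for illustration; the kernel's actual pattern and check points may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096
#define SKETCH_CANARY    0xaa /* hypothetical fill byte */

/* At allocation: fill every page byte outside [offset, offset + size). */
static void redzone_fill(unsigned char *page, size_t offset, size_t size)
{
	assert(offset + size <= SKETCH_PAGE_SIZE);
	memset(page, SKETCH_CANARY, offset);
	memset(page + offset + size, SKETCH_CANARY,
	       SKETCH_PAGE_SIZE - offset - size);
}

/* Later (e.g. on free): report whether every redzone byte is untouched. */
static bool redzone_intact(const unsigned char *page, size_t offset, size_t size)
{
	for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++) {
		if (i >= offset && i < offset + size)
			continue; /* bytes of the object itself */
		if (page[i] != SKETCH_CANARY)
			return false;
	}
	return true;
}
```

A byte written just past a right-aligned object, but before the page boundary, lands in the redzone: it raises no page fault and is only detected by a check like this one.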

Guarded allocations are set up based on a sample interval (configurable via
the kfence.sample_interval boot parameter). After the sample interval
expires, the next allocation through the main allocator (SLAB or SLUB)
returns a guarded allocation from the KFENCE object pool. At this point the
timer is reset, and the next guarded allocation is set up after the
interval expires again.

To enable/disable a KFENCE allocation through the main allocator's
fast-path without overhead, KFENCE relies on static branches via the
static keys infrastructure. The static branch is toggled to redirect the
allocation to KFENCE. To date, we have verified by running synthetic
benchmarks (sysbench I/O, hackbench) that a kernel compiled with KFENCE
is performance-neutral compared to the non-KFENCE baseline.

For more details, see Documentation/dev-tools/kfence.rst (added later in
the series).

[elver@google.com: fix parameter description for kfence_object_start()]
  Link: https://lkml.kernel.org/r/20201106092149.GA2851373@elver.google.com
[elver@google.com: avoid stalling work queue task without allocations]
  Link: https://lkml.kernel.org/r/CADYN=9J0DQhizAGB0-jz4HOBBh+05kMBXb4c0cXMS7Qi5NAJiw@mail.gmail.com
  Link: https://lkml.kernel.org/r/20201110135320.3309507-1-elver@google.com
[elver@google.com: fix potential deadlock due to wake_up()]
  Link: https://lkml.kernel.org/r/000000000000c0645805b7f982e4@google.com
  Link: https://lkml.kernel.org/r/20210104130749.1768991-1-elver@google.com
[elver@google.com: add option to use KFENCE without static keys]
  Link: https://lkml.kernel.org/r/20210111091544.3287013-1-elver@google.com
[elver@google.com: add missing copyright and description headers]
  Link: https://lkml.kernel.org/r/20210118092159.145934-1-elver@google.com

Link: https://lkml.kernel.org/r/20201103175841.3495947-2-elver@google.com


Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: SeongJae Park <sjpark@amazon.de>
Co-developed-by: Marco Elver <elver@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Joern Engel <joern@purestorage.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
parent 87005394

include/linux/kfence.h (new file, mode 0 → 100644, +216 −0)

/* SPDX-License-Identifier: GPL-2.0 */
/*
 * Kernel Electric-Fence (KFENCE). Public interface for allocator and fault
 * handler integration. For more info see Documentation/dev-tools/kfence.rst.
 *
 * Copyright (C) 2020, Google LLC.
 */

#ifndef _LINUX_KFENCE_H
#define _LINUX_KFENCE_H

#include <linux/mm.h>
#include <linux/types.h>

#ifdef CONFIG_KFENCE

/*
 * We allocate an even number of pages, as it simplifies calculations to map
 * address to metadata indices; effectively, the very first page serves as an
 * extended guard page, but otherwise has no special purpose.
 */
#define KFENCE_POOL_SIZE ((CONFIG_KFENCE_NUM_OBJECTS + 1) * 2 * PAGE_SIZE)
extern char *__kfence_pool;

#ifdef CONFIG_KFENCE_STATIC_KEYS
#include <linux/static_key.h>
DECLARE_STATIC_KEY_FALSE(kfence_allocation_key);
#else
#include <linux/atomic.h>
extern atomic_t kfence_allocation_gate;
#endif

/**
 * is_kfence_address() - check if an address belongs to KFENCE pool
 * @addr: address to check
 *
 * Return: true or false depending on whether the address is within the KFENCE
 * object range.
 *
 * KFENCE objects live in a separate page range and are not to be intermixed
 * with regular heap objects (e.g. KFENCE objects must never be added to the
 * allocator freelists). Failing to do so will result in heap corruption;
 * therefore, is_kfence_address() must be used to check whether
 * an object requires specific handling.
 *
 * Note: This function may be used in fast-paths, and is performance critical.
 * Future changes should take this into account; for instance, we want to avoid
 * introducing another load and therefore need to keep KFENCE_POOL_SIZE a
 * constant (until immediate patching support is added to the kernel).
 */
static __always_inline bool is_kfence_address(const void *addr)
{
	/*
	 * The non-NULL check is required in case the __kfence_pool pointer was
	 * never initialized; keep it in the slow-path after the range-check.
	 */
	return unlikely((unsigned long)((char *)addr - __kfence_pool) < KFENCE_POOL_SIZE && addr);
}

/**
 * kfence_alloc_pool() - allocate the KFENCE pool via memblock
 */
void __init kfence_alloc_pool(void);

/**
 * kfence_init() - perform KFENCE initialization at boot time
 *
 * Requires that kfence_alloc_pool() was called before. This sets up the
 * allocation gate timer, and requires that workqueues are available.
 */
void __init kfence_init(void);

/**
 * kfence_shutdown_cache() - handle shutdown_cache() for KFENCE objects
 * @s: cache being shut down
 *
 * Before shutting down a cache, one must ensure there are no remaining objects
 * allocated from it. Because KFENCE objects are not referenced from the cache
 * directly, we need to check them here.
 *
 * Note that shutdown_cache() is internal to SL*B, and kmem_cache_destroy() does
 * not return if allocated objects still exist: it prints an error message and
 * simply aborts destruction of a cache, leaking memory.
 *
 * If the only such objects are KFENCE objects, we will not leak the entire
 * cache, but instead try to provide more useful debug info by making allocated
 * objects "zombie allocations". Objects may then still be used or freed (which
 * is handled gracefully), but usage will result in showing KFENCE error reports
 * which include stack traces to the user of the object, the original allocation
 * site, and caller to shutdown_cache().
 */
void kfence_shutdown_cache(struct kmem_cache *s);

/*
 * Allocate a KFENCE object. Allocators must not call this function directly,
 * use kfence_alloc() instead.
 */
void *__kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags);

/**
 * kfence_alloc() - allocate a KFENCE object with a low probability
 * @s:     struct kmem_cache with object requirements
 * @size:  exact size of the object to allocate (can be less than @s->size
 *         e.g. for kmalloc caches)
 * @flags: GFP flags
 *
 * Return:
 * * NULL     - must proceed with allocating as usual,
 * * non-NULL - pointer to a KFENCE object.
 *
 * kfence_alloc() should be inserted into the heap allocation fast path,
 * allowing it to transparently return KFENCE-allocated objects with a low
 * probability using a static branch (the probability is controlled by the
 * kfence.sample_interval boot parameter).
 */
static __always_inline void *kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags)
{
#ifdef CONFIG_KFENCE_STATIC_KEYS
	if (static_branch_unlikely(&kfence_allocation_key))
#else
	if (unlikely(!atomic_read(&kfence_allocation_gate)))
#endif
		return __kfence_alloc(s, size, flags);
	return NULL;
}

/**
 * kfence_ksize() - get actual amount of memory allocated for a KFENCE object
 * @addr: pointer to a heap object
 *
 * Return:
 * * 0     - not a KFENCE object, must call __ksize() instead,
 * * non-0 - this many bytes can be accessed without causing a memory error.
 *
 * kfence_ksize() returns the number of bytes requested for a KFENCE object at
 * allocation time. This number may be less than the object size of the
 * corresponding struct kmem_cache.
 */
size_t kfence_ksize(const void *addr);

/**
 * kfence_object_start() - find the beginning of a KFENCE object
 * @addr: address within a KFENCE-allocated object
 *
 * Return: address of the beginning of the object.
 *
 * SL[AU]B-allocated objects are laid out within a page one by one, so it is
 * easy to calculate the beginning of an object given a pointer inside it and
 * the object size. The same is not true for KFENCE, which places a single
 * object at either end of the page. This helper function is used to find the
 * beginning of a KFENCE-allocated object.
 */
void *kfence_object_start(const void *addr);

/**
 * __kfence_free() - release a KFENCE heap object to KFENCE pool
 * @addr: object to be freed
 *
 * Requires: is_kfence_address(addr)
 *
 * Release a KFENCE object and mark it as freed.
 */
void __kfence_free(void *addr);

/**
 * kfence_free() - try to release an arbitrary heap object to KFENCE pool
 * @addr: object to be freed
 *
 * Return:
 * * false - object doesn't belong to KFENCE pool and was ignored,
 * * true  - object was released to KFENCE pool.
 *
 * Release a KFENCE object and mark it as freed. May be called on any object,
 * even non-KFENCE objects, to simplify integration of the hooks into the
 * allocator's free codepath. The allocator must check the return value to
 * determine if it was a KFENCE object or not.
 */
static __always_inline __must_check bool kfence_free(void *addr)
{
	if (!is_kfence_address(addr))
		return false;
	__kfence_free(addr);
	return true;
}

/**
 * kfence_handle_page_fault() - perform page fault handling for KFENCE pages
 * @addr: faulting address
 *
 * Return:
 * * false - address outside KFENCE pool,
 * * true  - page fault handled by KFENCE, no additional handling required.
 *
 * A page fault inside KFENCE pool indicates a memory error, such as an
 * out-of-bounds access, a use-after-free or an invalid memory access. In these
 * cases KFENCE prints an error message and marks the offending page as
 * present, so that the kernel can proceed.
 */
bool __must_check kfence_handle_page_fault(unsigned long addr);

#else /* CONFIG_KFENCE */

static inline bool is_kfence_address(const void *addr) { return false; }
static inline void kfence_alloc_pool(void) { }
static inline void kfence_init(void) { }
static inline void kfence_shutdown_cache(struct kmem_cache *s) { }
static inline void *kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags) { return NULL; }
static inline size_t kfence_ksize(const void *addr) { return 0; }
static inline void *kfence_object_start(const void *addr) { return NULL; }
static inline void __kfence_free(void *addr) { }
static inline bool __must_check kfence_free(void *addr) { return false; }
static inline bool __must_check kfence_handle_page_fault(unsigned long addr) { return false; }

#endif

#endif /* _LINUX_KFENCE_H */
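Two idioms from the header above can be exercised in user space. is_kfence_address() uses a single unsigned comparison: for an address below the pool the subtraction wraps around to a very large value, so one comparison rejects addresses on both sides of the range, and the trailing non-NULL check covers a pool pointer that was never initialized. kfence_free() is meant to run first in the allocator's free path. The sketch below mirrors both with hypothetical stand-ins (sketch_pool, sketch_allocator_free() and friends); it subtracts uintptr_t values rather than pointers, because pointer subtraction across unrelated objects is undefined in standard C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: default pool size, (255 + 1) * 2 * 4096 bytes. */
#define SKETCH_POOL_SIZE ((255UL + 1) * 2 * 4096UL)

/* Hypothetical stand-in for __kfence_pool; NULL until initialized. */
static char *sketch_pool;
/* Counts objects routed to the pool (stand-in for __kfence_free()). */
static unsigned long sketch_pool_frees;

/* Same shape as is_kfence_address(): one unsigned range check + NULL check. */
static bool sketch_is_pool_address(const void *addr)
{
	return (uintptr_t)addr - (uintptr_t)sketch_pool < SKETCH_POOL_SIZE && addr;
}

/* Same contract as kfence_free(): true if the pool took ownership. */
static bool sketch_kfence_free(void *addr)
{
	if (!sketch_is_pool_address(addr))
		return false;
	sketch_pool_frees++;
	return true;
}

/* Hypothetical allocator free path: the hook runs before anything else. */
static void sketch_allocator_free(void *obj)
{
	if (sketch_kfence_free(obj))
		return; /* pool object: must never reach the allocator freelists */
	free(obj);  /* stand-in for the regular SLAB/SLUB free path */
}
```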
+3 −0
@@ -40,6 +40,7 @@
#include <linux/security.h>
#include <linux/smp.h>
#include <linux/profile.h>
#include <linux/kfence.h>
#include <linux/rcupdate.h>
#include <linux/moduleparam.h>
#include <linux/kallsyms.h>
@@ -824,6 +825,7 @@ static void __init mm_init(void)
	 */
	page_ext_init_flatmem();
	init_mem_debugging_and_hardening();
	kfence_alloc_pool();
	report_meminit();
	mem_init();
	/* page_owner must be initialized after buddy is ready */
@@ -955,6 +957,7 @@ asmlinkage __visible void __init __no_sanitize_address start_kernel(void)
	hrtimers_init();
	softirq_init();
	timekeeping_init();
	kfence_init();

	/*
	 * For best initial stack canary entropy, prepare it after:
+1 −0
@@ -938,6 +938,7 @@ config DEBUG_STACKOVERFLOW
	  If in doubt, say "N".

source "lib/Kconfig.kasan"
source "lib/Kconfig.kfence"

endmenu # "Memory Debugging"

lib/Kconfig.kfence (new file, mode 0 → 100644, +67 −0)

# SPDX-License-Identifier: GPL-2.0-only

config HAVE_ARCH_KFENCE
	bool

menuconfig KFENCE
	bool "KFENCE: low-overhead sampling-based memory safety error detector"
	depends on HAVE_ARCH_KFENCE && !KASAN && (SLAB || SLUB)
	select STACKTRACE
	help
	  KFENCE is a low-overhead sampling-based detector of heap out-of-bounds
	  access, use-after-free, and invalid-free errors. KFENCE is designed
	  to have negligible cost to permit enabling it in production
	  environments.

	  Note that KFENCE is not a substitute for explicit testing with tools
	  such as KASAN. KFENCE can detect a subset of bugs that KASAN can
	  detect, albeit with a very different performance profile. If you can
	  afford to use KASAN, continue using KASAN, for example in test
	  environments. If your kernel targets production use, and cannot
	  enable KASAN due to its cost, consider using KFENCE.

if KFENCE

config KFENCE_STATIC_KEYS
	bool "Use static keys to set up allocations"
	default y
	depends on JUMP_LABEL # To ensure performance, require jump labels
	help
	  Use static keys (static branches) to set up KFENCE allocations. Using
	  static keys is normally recommended, because it avoids a dynamic
	  branch in the allocator's fast path. However, with very low sample
	  intervals, or on systems that do not support jump labels, a dynamic
	  branch may still be an acceptable performance trade-off.

config KFENCE_SAMPLE_INTERVAL
	int "Default sample interval in milliseconds"
	default 100
	help
	  The KFENCE sample interval determines the frequency with which heap
	  allocations will be guarded by KFENCE. May be overridden via boot
	  parameter "kfence.sample_interval".

	  Set this to 0 to disable KFENCE by default, in which case only
	  setting "kfence.sample_interval" to a non-zero value enables KFENCE.

config KFENCE_NUM_OBJECTS
	int "Number of guarded objects available"
	range 1 65535
	default 255
	help
	  The number of guarded objects available. For each KFENCE object, 2
	  pages are required: one containing the object and one guard page.
	  Guard pages are shared between neighboring objects, so each object
	  page is still flanked by two guard pages.

config KFENCE_STRESS_TEST_FAULTS
	int "Stress testing of fault handling and error reporting" if EXPERT
	default 0
	help
	  The inverse probability with which to randomly protect KFENCE object
	  pages, resulting in spurious use-after-frees. The main purpose of
	  this option is to stress test KFENCE with concurrent error reports
	  and allocations/frees. A value of 0 disables stress testing logic.

	  Only for KFENCE testing; set to 0 if you are not a KFENCE developer.

endif # KFENCE
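KFENCE_STRESS_TEST_FAULTS is described as an inverse probability. A minimal sketch of what that means, assuming a value n > 0 makes an event fire with probability 1/n and 0 disables it: rand() stands in for the kernel's PRNG, and sketch_inject_fault() and count_injections() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical: decide whether to inject a spurious fault, given an
 * inverse probability n. n == 0 disables injection; otherwise the
 * event fires with probability 1/n (rand()'s modulo bias is negligible
 * for small n).
 */
static bool sketch_inject_fault(unsigned int n)
{
	if (n == 0)
		return false;
	return (unsigned int)rand() % n == 0;
}

/* Count injected faults over @trials decisions. */
static unsigned long count_injections(unsigned int n, unsigned long trials)
{
	unsigned long hits = 0;

	assert(trials > 0);
	for (unsigned long i = 0; i < trials; i++)
		hits += sketch_inject_fault(n);
	return hits;
}
```

With n = 4, roughly a quarter of decisions inject a fault; with n = 1 every decision does, which is why the help text reserves this option for KFENCE developers.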
+1 −0
@@ -81,6 +81,7 @@ obj-$(CONFIG_PAGE_POISONING) += page_poison.o
obj-$(CONFIG_SLAB) += slab.o
obj-$(CONFIG_SLUB) += slub.o
obj-$(CONFIG_KASAN)	+= kasan/
obj-$(CONFIG_KFENCE) += kfence/
obj-$(CONFIG_FAILSLAB) += failslab.o
obj-$(CONFIG_MEMORY_HOTPLUG) += memory_hotplug.o
obj-$(CONFIG_MEMTEST)		+= memtest.o