  Aug 29, 2020
    • Merge branch 'bpf-sleepable' · 10496f26
      Daniel Borkmann authored
      
      
      Alexei Starovoitov says:
      
      ====================
      v2->v3:
      - switched to a minimal allowlist approach. Essentially that means that syscall
        entry, a few btrfs allow_error_inject functions, should_fail_bio(), and two LSM
        hooks, file_mprotect and bprm_committed_creds, are the only hooks that allow
        attaching of sleepable BPF programs. Once a comprehensive analysis of LSM hooks
        is done, this allowlist will be extended.
      - added patch 1 that fixes the prototypes of two mm functions to reliably work
        with error injection. It's also necessary for the resolve_btfids tool to
        recognize these two funcs, but that's secondary.
      
      v1->v2:
      - split fmod_ret fix into separate patch
      - added denylist
      
      v1:
      This patch set introduces the minimal viable support for sleepable BPF programs.
      In this patch set only fentry/fexit/fmod_ret and lsm progs can be sleepable.
      Only array maps and pre-allocated hash and LRU maps are allowed.
      
      Here is the 'perf report' difference of sleepable vs non-sleepable:

      sleepable with SRCU:
         3.86%  bench     [k] __srcu_read_unlock
         3.22%  bench     [k] __srcu_read_lock
         0.92%  bench     [k] bpf_prog_740d4210cdcd99a3_bench_trigger_fentry_sleep
         0.50%  bench     [k] bpf_trampoline_10297
         0.26%  bench     [k] __bpf_prog_exit_sleepable
         0.21%  bench     [k] __bpf_prog_enter_sleepable

      non-sleepable with RCU:
         0.88%  bench     [k] bpf_prog_740d4210cdcd99a3_bench_trigger_fentry
         0.84%  bench     [k] bpf_trampoline_10297
         0.13%  bench     [k] __bpf_prog_enter
         0.12%  bench     [k] __bpf_prog_exit

      sleepable with RCU_TRACE:
         0.79%  bench     [k] bpf_prog_740d4210cdcd99a3_bench_trigger_fentry_sleep
         0.72%  bench     [k] bpf_trampoline_10381
         0.31%  bench     [k] __bpf_prog_exit_sleepable
         0.29%  bench     [k] __bpf_prog_enter_sleepable
      
      Sleepable program invocation overhead is only marginally higher than
      non-sleepable due to rcu_trace. The srcu approach is much slower.
      ====================
      
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • selftests/bpf: Add sleepable tests · e68a1445
      Alexei Starovoitov authored
      
      
      Modify a few tests to sanity-check sleepable BPF functionality.
      
      Running 'bench trig-fentry-sleep' vs 'bench trig-fentry' and 'perf report':
      sleepable with SRCU:
         3.86%  bench     [k] __srcu_read_unlock
         3.22%  bench     [k] __srcu_read_lock
         0.92%  bench     [k] bpf_prog_740d4210cdcd99a3_bench_trigger_fentry_sleep
         0.50%  bench     [k] bpf_trampoline_10297
         0.26%  bench     [k] __bpf_prog_exit_sleepable
         0.21%  bench     [k] __bpf_prog_enter_sleepable
      
      sleepable with RCU_TRACE:
         0.79%  bench     [k] bpf_prog_740d4210cdcd99a3_bench_trigger_fentry_sleep
         0.72%  bench     [k] bpf_trampoline_10381
         0.31%  bench     [k] __bpf_prog_exit_sleepable
         0.29%  bench     [k] __bpf_prog_enter_sleepable
      
      non-sleepable with RCU:
         0.88%  bench     [k] bpf_prog_740d4210cdcd99a3_bench_trigger_fentry
         0.84%  bench     [k] bpf_trampoline_10297
         0.13%  bench     [k] __bpf_prog_enter
         0.12%  bench     [k] __bpf_prog_exit
      
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: KP Singh <kpsingh@google.com>
      Link: https://lore.kernel.org/bpf/20200827220114.69225-6-alexei.starovoitov@gmail.com
    • libbpf: Support sleepable progs · 2b288740
      Alexei Starovoitov authored
      
      
      Pass the request to load a program as sleepable via a ".s" suffix in the section
      name. If it happens in the future that all map types and helpers are allowed
      with the BPF_F_SLEEPABLE flag, "fmod_ret/" and "lsm/" can be aliased to
      "fmod_ret.s/" and "lsm.s/" to make all lsm and fmod_ret programs sleepable by
      default. The fentry and fexit programs will always need the sleepable vs
      non-sleepable distinction, since not all fentry/fexit progs will be attached to
      sleepable kernel functions.
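      As an illustration, with this change a program can be marked sleepable purely
      through its section name (a sketch, not taken verbatim from the patch; the
      attach target mirrors the bench trigger used in this series and the body is a
      placeholder):

```c
// Sketch: the ".s" suffix in SEC() asks libbpf to load the program
// with BPF_F_SLEEPABLE. Without the suffix the same program loads
// as a regular, non-sleepable fentry program.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

/* Syscall entry points are on the sleepable allowlist. */
SEC("fentry.s/__x64_sys_nanosleep")
int BPF_PROG(bench_trigger_fentry_sleep)
{
	return 0;
}
```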
      
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: KP Singh <kpsingh@google.com>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200827220114.69225-5-alexei.starovoitov@gmail.com
    • bpf: Add bpf_copy_from_user() helper. · 07be4c4a
      Alexei Starovoitov authored
      
      
      Sleepable BPF programs can now use copy_from_user() to access user memory.
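      For example, a sleepable LSM program could now inspect user memory directly
      (a hedged sketch, not from the patch itself; the hook is one of the two
      allowlisted LSM hooks and the buffer size is arbitrary):

```c
// Sketch: bpf_copy_from_user() may fault pages in, so it is only
// available to programs loaded with BPF_F_SLEEPABLE.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

SEC("lsm.s/file_mprotect")
int BPF_PROG(mprotect_audit, struct vm_area_struct *vma,
	     unsigned long reqprot, unsigned long prot, int ret)
{
	char first_bytes[16] = {};

	/* vma->vm_start is a user-space address in the calling task;
	 * a sleepable program is allowed to fault it in. */
	bpf_copy_from_user(first_bytes, sizeof(first_bytes),
			   (void *)vma->vm_start);
	return ret;
}
```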
      
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: KP Singh <kpsingh@google.com>
      Link: https://lore.kernel.org/bpf/20200827220114.69225-4-alexei.starovoitov@gmail.com
    • bpf: Introduce sleepable BPF programs · 1e6c62a8
      Alexei Starovoitov authored
      
      
      Introduce sleepable BPF programs that can request this property for themselves
      via the BPF_F_SLEEPABLE flag at program load time. In that case they will be
      able to use helpers like bpf_copy_from_user() that might sleep. At present only
      fentry/fexit/fmod_ret and lsm programs can request to be sleepable, and only
      when they are attached to kernel functions that are known to allow sleeping.
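      At the syscall level the request is just a program-load flag; a minimal sketch
      of the relevant bpf_attr fields (simplified: a real fentry/lsm load also has to
      resolve and set attach_btf_id via BTF, and kernels without this patch reject
      the flag):

```c
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Sketch: request sleepable semantics at program load time. */
static int load_sleepable(const struct bpf_insn *insns, unsigned int cnt)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.prog_type = BPF_PROG_TYPE_TRACING;
	attr.expected_attach_type = BPF_TRACE_FENTRY;
	attr.insns = (unsigned long)insns;
	attr.insn_cnt = cnt;
	attr.license = (unsigned long)"GPL";
	attr.prog_flags = BPF_F_SLEEPABLE;   /* the new flag */
	/* attr.attach_btf_id = ...;  resolved from BTF in practice */

	return syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
}
```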
      
      Non-sleepable programs rely on implicit rcu_read_lock() and migrate_disable()
      to protect the lifetime of the programs, the maps they use, and the per-cpu
      kernel structures used to pass info between BPF programs and the kernel.
      Sleepable programs cannot be enclosed in rcu_read_lock(). migrate_disable()
      maps to preempt_disable() in non-RT kernels, so the progs should not be
      enclosed in migrate_disable() either. Therefore rcu_read_lock_trace is used to
      protect the lifetime of sleepable progs.
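      The resulting enter/exit pair, visible as __bpf_prog_enter_sleepable and
      __bpf_prog_exit_sleepable in the perf profiles of this series, can be sketched
      roughly as (a simplified paraphrase of the patch, not verbatim):

```c
#include <linux/rcupdate_trace.h>

/* Simplified sketch of the bracketing the trampoline emits around a
 * sleepable program: tasks-trace RCU instead of rcu_read_lock(). */
void notrace __bpf_prog_enter_sleepable(void)
{
	rcu_read_lock_trace();
	might_fault();	/* sleeping/faulting is legal in this section */
}

void notrace __bpf_prog_exit_sleepable(void)
{
	rcu_read_unlock_trace();
}
```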
      
      There are many networking and tracing program types. In many cases the
      'struct bpf_prog *' pointer itself is RCU-protected within some other kernel
      data structure, and the kernel code uses rcu_dereference() to load that
      program pointer and call BPF_PROG_RUN() on it. None of these cases are touched.
      Instead, sleepable BPF programs are allowed with the BPF trampoline only. The
      program pointers are hard-coded into the generated assembly of the BPF
      trampoline, and synchronize_rcu_tasks_trace() is used to protect the lifetime
      of the program. The same trampoline can hold both sleepable and non-sleepable
      progs.
      
      When rcu_read_lock_trace is held it means that some sleepable BPF program is
      running from a BPF trampoline. Those programs can use BPF arrays and
      preallocated hash/LRU maps. These map types wait for programs to complete via
      synchronize_rcu_tasks_trace().
      
      Updates to a trampoline now have to do synchronize_rcu_tasks_trace() and
      synchronize_rcu_tasks() to wait for sleepable progs to finish and for the
      trampoline assembly to finish.
      
      This is the first step of introducing sleepable progs. Eventually dynamically
      allocated hash maps can be allowed and networking program types can become
      sleepable too.
      
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: KP Singh <kpsingh@google.com>
      Link: https://lore.kernel.org/bpf/20200827220114.69225-3-alexei.starovoitov@gmail.com
    • mm/error_inject: Fix allow_error_inject function signatures. · 76cd6173
      Alexei Starovoitov authored
      'static' and 'static noinline' function attributes make no guarantees that
      gcc/clang won't optimize them. The compiler may decide to inline a 'static'
      function, and in such a case ALLOW_ERROR_INJECTION becomes meaningless. The
      compiler could have inlined __add_to_page_cache_locked() in one callsite and
      not inlined it in another. In that case injecting errors into it would cause
      unpredictable behavior. It's worse with 'static noinline', which won't be
      inlined but can still be optimized: the compiler may decide to remove an
      argument or constant-propagate a value depending on the callsite.

      To avoid such issues make sure that these functions are global noinline.
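      The pattern of the fix, shown on a hypothetical injectable function named
      do_thing (the real patch applies it to __add_to_page_cache_locked() and
      should_fail_alloc_page()):

```c
#include <linux/error-injection.h>

/*
 * BAD: 'static noinline' stops inlining, but the compiler may still
 * specialize the function per callsite (drop an argument, constant-
 * propagate a value), so the injection point is unreliable:
 *
 *     static noinline int do_thing(int arg) { ... }
 *
 * GOOD: a global noinline definition keeps one stable symbol that
 * error injection (and the resolve_btfids tool) can depend on.
 */
noinline int do_thing(int arg)
{
	return 0;
}
ALLOW_ERROR_INJECTION(do_thing, ERRNO);
```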
      
      Fixes: af3b8544 ("mm/page_alloc.c: allow error injection")
      Fixes: cfcbfb13 ("mm/filemap.c: enable error injection at add_to_page_cache()")
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Link: https://lore.kernel.org/bpf/20200827220114.69225-2-alexei.starovoitov@gmail.com