  1. Dec 14, 2019
    • Merge branch 'bpf-dispatcher' · 02620d9e
      Alexei Starovoitov authored
      Björn Töpel says:
      
      ====================
      Overview
      ========
      
      This is the 6th iteration of the series that introduces the BPF
      dispatcher, which is a mechanism to avoid indirect calls.
      
      The BPF dispatcher is a multi-way branch code generator, targeted for
      BPF programs. E.g. when an XDP program is executed via the
      bpf_prog_run_xdp(), it is invoked via an indirect call. With
      retpolines enabled, the indirect call has a substantial performance
      impact. The dispatcher is a mechanism that transforms indirect calls to
      direct calls, and therefore avoids the retpoline. The dispatcher is
      generated using the BPF JIT, and relies on text poking provided by
      bpf_arch_text_poke().
      
      The dispatcher hijacks a trampoline function via the __fentry__ nop
      of the trampoline. One dispatcher instance currently supports up to 48
      dispatch points. This can be extended in the future.
      
      In this series, only one dispatcher instance is supported, and the
      only user is XDP. The dispatcher is updated when an XDP program is
      attached/detached to/from a netdev. An alternative would have been to
      update the dispatcher at program load time, but as there are usually
      more XDP programs loaded than attached, updating at attach/detach was
      picked.
      
      The XDP dispatcher is always enabled, if available, because it helps
      even when retpolines are disabled. Please refer to the "Performance"
      section below.
      
      The first patch refactors the image allocation from the BPF trampoline
      code. Patch two introduces the dispatcher, and patch three adds a
      dispatcher for XDP, and wires up the XDP control-/fast-path. Patch
      four adds the dispatcher to BPF_TEST_RUN. Patch five adds a simple
      selftest, and the last adds alignment to jump targets.
      
      I have rebased the series on commit 679152d3 ("libbpf: Fix printf
      compilation warnings on ppc64le arch").
      
      Generated code, x86-64
      ======================
      
      The dispatcher currently has a maximum of 48 entries, where one entry
      is a unique BPF program. Multiple users of a dispatcher instance using
      the same BPF program will share that entry.
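
      The entry sharing described above can be pictured as a small
      reference-counted table. This is a minimal sketch only, not the kernel
      code: disp_entry, disp_table, table_add() and table_del() are names
      made up for illustration, and only the limit of 48 entries is taken
      from the text.

```c
#include <stddef.h>

#define DISP_MAX 48 /* entries per dispatcher instance, per the text above */

/* Hypothetical types: one slot per unique program, shared by its users. */
struct disp_entry {
	const void *prog;   /* stands in for a BPF program pointer */
	unsigned int users; /* how many attachments share this slot */
};

struct disp_table {
	struct disp_entry entries[DISP_MAX];
	int num_progs;      /* number of occupied slots */
};

/* Take a reference on prog; returns 0 on success, -1 if the table is full. */
static int table_add(struct disp_table *t, const void *prog)
{
	struct disp_entry *free_slot = NULL;

	for (int i = 0; i < DISP_MAX; i++) {
		struct disp_entry *e = &t->entries[i];

		if (e->users && e->prog == prog) {
			e->users++;      /* same program: share the entry */
			return 0;
		}
		if (!e->users && !free_slot)
			free_slot = e;   /* remember the first empty slot */
	}
	if (!free_slot)
		return -1;
	free_slot->prog = prog;
	free_slot->users = 1;
	t->num_progs++;
	return 0;
}

/* Drop a reference on prog; returns 0 on success, -1 if prog is absent. */
static int table_del(struct disp_table *t, const void *prog)
{
	for (int i = 0; i < DISP_MAX; i++) {
		struct disp_entry *e = &t->entries[i];

		if (e->users && e->prog == prog) {
			if (--e->users == 0) {
				e->prog = NULL;
				t->num_progs--;
			}
			return 0;
		}
	}
	return -1;
}
```

      In this sketch, attaching the same program to two netdevs calls
      table_add() twice but occupies a single slot; the slot is freed only
      when the last user calls table_del().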
      
      The program/slot lookup is performed by a binary search, O(log
      n). Let's have a look at the generated code.
      
      The trampoline function has the following signature:
      
        unsigned int tramp(const void *ctx,
                           const struct bpf_insn *insnsi,
                           unsigned int (*bpf_func)(const void *,
                                                    const struct bpf_insn *))
      
      On Intel x86-64 this means that rdx will contain the bpf_func. To
      make it easier to read, I've let the BPF programs have the following
      range: 0xffffffffffffffff (-1) to 0xfffffffffffffff0
      (-16). 0xffffffff81c00f10 is the retpoline thunk, in this case
      __x86_indirect_thunk_rdx. If retpolines are disabled the thunk will be
      a regular indirect call.
      
      The minimal dispatcher will then look like this:
      
      ffffffffc0002000: cmp    rdx,0xffffffffffffffff
      ffffffffc0002007: je     0xffffffffffffffff ; -1
      ffffffffc000200d: jmp    0xffffffff81c00f10
      
      A 16 entry dispatcher looks like this:
      
      ffffffffc0020000: cmp    rdx,0xfffffffffffffff7 ; -9
      ffffffffc0020007: jg     0xffffffffc0020130
      ffffffffc002000d: cmp    rdx,0xfffffffffffffff3 ; -13
      ffffffffc0020014: jg     0xffffffffc00200a0
      ffffffffc002001a: cmp    rdx,0xfffffffffffffff1 ; -15
      ffffffffc0020021: jg     0xffffffffc0020060
      ffffffffc0020023: cmp    rdx,0xfffffffffffffff0 ; -16
      ffffffffc002002a: jg     0xffffffffc0020040
      ffffffffc002002c: cmp    rdx,0xfffffffffffffff0 ; -16
      ffffffffc0020033: je     0xfffffffffffffff0 ; -16
      ffffffffc0020039: jmp    0xffffffff81c00f10
      ffffffffc002003e: xchg   ax,ax
      ffffffffc0020040: cmp    rdx,0xfffffffffffffff1 ; -15
      ffffffffc0020047: je     0xfffffffffffffff1 ; -15
      ffffffffc002004d: jmp    0xffffffff81c00f10
      ffffffffc0020052: nop    DWORD PTR [rax+rax*1+0x0]
      ffffffffc002005a: nop    WORD PTR [rax+rax*1+0x0]
      ffffffffc0020060: cmp    rdx,0xfffffffffffffff2 ; -14
      ffffffffc0020067: jg     0xffffffffc0020080
      ffffffffc0020069: cmp    rdx,0xfffffffffffffff2 ; -14
      ffffffffc0020070: je     0xfffffffffffffff2 ; -14
      ffffffffc0020076: jmp    0xffffffff81c00f10
      ffffffffc002007b: nop    DWORD PTR [rax+rax*1+0x0]
      ffffffffc0020080: cmp    rdx,0xfffffffffffffff3 ; -13
      ffffffffc0020087: je     0xfffffffffffffff3 ; -13
      ffffffffc002008d: jmp    0xffffffff81c00f10
      ffffffffc0020092: nop    DWORD PTR [rax+rax*1+0x0]
      ffffffffc002009a: nop    WORD PTR [rax+rax*1+0x0]
      ffffffffc00200a0: cmp    rdx,0xfffffffffffffff5 ; -11
      ffffffffc00200a7: jg     0xffffffffc00200f0
      ffffffffc00200a9: cmp    rdx,0xfffffffffffffff4 ; -12
      ffffffffc00200b0: jg     0xffffffffc00200d0
      ffffffffc00200b2: cmp    rdx,0xfffffffffffffff4 ; -12
      ffffffffc00200b9: je     0xfffffffffffffff4 ; -12
      ffffffffc00200bf: jmp    0xffffffff81c00f10
      ffffffffc00200c4: nop    DWORD PTR [rax+rax*1+0x0]
      ffffffffc00200cc: nop    DWORD PTR [rax+0x0]
      ffffffffc00200d0: cmp    rdx,0xfffffffffffffff5 ; -11
      ffffffffc00200d7: je     0xfffffffffffffff5 ; -11
      ffffffffc00200dd: jmp    0xffffffff81c00f10
      ffffffffc00200e2: nop    DWORD PTR [rax+rax*1+0x0]
      ffffffffc00200ea: nop    WORD PTR [rax+rax*1+0x0]
      ffffffffc00200f0: cmp    rdx,0xfffffffffffffff6 ; -10
      ffffffffc00200f7: jg     0xffffffffc0020110
      ffffffffc00200f9: cmp    rdx,0xfffffffffffffff6 ; -10
      ffffffffc0020100: je     0xfffffffffffffff6 ; -10
      ffffffffc0020106: jmp    0xffffffff81c00f10
      ffffffffc002010b: nop    DWORD PTR [rax+rax*1+0x0]
      ffffffffc0020110: cmp    rdx,0xfffffffffffffff7 ; -9
      ffffffffc0020117: je     0xfffffffffffffff7 ; -9
      ffffffffc002011d: jmp    0xffffffff81c00f10
      ffffffffc0020122: nop    DWORD PTR [rax+rax*1+0x0]
      ffffffffc002012a: nop    WORD PTR [rax+rax*1+0x0]
      ffffffffc0020130: cmp    rdx,0xfffffffffffffffb ; -5
      ffffffffc0020137: jg     0xffffffffc00201d0
      ffffffffc002013d: cmp    rdx,0xfffffffffffffff9 ; -7
      ffffffffc0020144: jg     0xffffffffc0020190
      ffffffffc0020146: cmp    rdx,0xfffffffffffffff8 ; -8
      ffffffffc002014d: jg     0xffffffffc0020170
      ffffffffc002014f: cmp    rdx,0xfffffffffffffff8 ; -8
      ffffffffc0020156: je     0xfffffffffffffff8 ; -8
      ffffffffc002015c: jmp    0xffffffff81c00f10
      ffffffffc0020161: nop    DWORD PTR [rax+rax*1+0x0]
      ffffffffc0020169: nop    DWORD PTR [rax+0x0]
      ffffffffc0020170: cmp    rdx,0xfffffffffffffff9 ; -7
      ffffffffc0020177: je     0xfffffffffffffff9 ; -7
      ffffffffc002017d: jmp    0xffffffff81c00f10
      ffffffffc0020182: nop    DWORD PTR [rax+rax*1+0x0]
      ffffffffc002018a: nop    WORD PTR [rax+rax*1+0x0]
      ffffffffc0020190: cmp    rdx,0xfffffffffffffffa ; -6
      ffffffffc0020197: jg     0xffffffffc00201b0
      ffffffffc0020199: cmp    rdx,0xfffffffffffffffa ; -6
      ffffffffc00201a0: je     0xfffffffffffffffa ; -6
      ffffffffc00201a6: jmp    0xffffffff81c00f10
      ffffffffc00201ab: nop    DWORD PTR [rax+rax*1+0x0]
      ffffffffc00201b0: cmp    rdx,0xfffffffffffffffb ; -5
      ffffffffc00201b7: je     0xfffffffffffffffb ; -5
      ffffffffc00201bd: jmp    0xffffffff81c00f10
      ffffffffc00201c2: nop    DWORD PTR [rax+rax*1+0x0]
      ffffffffc00201ca: nop    WORD PTR [rax+rax*1+0x0]
      ffffffffc00201d0: cmp    rdx,0xfffffffffffffffd ; -3
      ffffffffc00201d7: jg     0xffffffffc0020220
      ffffffffc00201d9: cmp    rdx,0xfffffffffffffffc ; -4
      ffffffffc00201e0: jg     0xffffffffc0020200
      ffffffffc00201e2: cmp    rdx,0xfffffffffffffffc ; -4
      ffffffffc00201e9: je     0xfffffffffffffffc ; -4
      ffffffffc00201ef: jmp    0xffffffff81c00f10
      ffffffffc00201f4: nop    DWORD PTR [rax+rax*1+0x0]
      ffffffffc00201fc: nop    DWORD PTR [rax+0x0]
      ffffffffc0020200: cmp    rdx,0xfffffffffffffffd ; -3
      ffffffffc0020207: je     0xfffffffffffffffd ; -3
      ffffffffc002020d: jmp    0xffffffff81c00f10
      ffffffffc0020212: nop    DWORD PTR [rax+rax*1+0x0]
      ffffffffc002021a: nop    WORD PTR [rax+rax*1+0x0]
      ffffffffc0020220: cmp    rdx,0xfffffffffffffffe ; -2
      ffffffffc0020227: jg     0xffffffffc0020240
      ffffffffc0020229: cmp    rdx,0xfffffffffffffffe ; -2
      ffffffffc0020230: je     0xfffffffffffffffe ; -2
      ffffffffc0020236: jmp    0xffffffff81c00f10
      ffffffffc002023b: nop    DWORD PTR [rax+rax*1+0x0]
      ffffffffc0020240: cmp    rdx,0xffffffffffffffff ; -1
      ffffffffc0020247: je     0xffffffffffffffff ; -1
      ffffffffc002024d: jmp    0xffffffff81c00f10
      
      The nops are there to align jump targets to 16 B.
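
      The control flow of the listing above can be modelled in plain C. This
      is a sketch for reading the disassembly, not the JIT code itself:
      dispatcher_lookup() is a hypothetical name, the program addresses are
      treated as signed 64-bit values (matching the signed jg), and a return
      value of -1 stands for the final "jmp" to the retpoline thunk.

```c
#include <stdint.h>

/*
 * Model of the generated multi-way branch over the sorted slots
 * progs[a..b] (inclusive, a <= b). Returns the slot index that would be
 * reached by a direct jump, or -1 for the indirect-call fallback.
 */
static int dispatcher_lookup(const int64_t *progs, int a, int b, int64_t func)
{
	while (a < b) {
		int pivot = a + (b - a) / 2;

		if (func > progs[pivot]) /* cmp rdx, imm ; jg <upper half> */
			a = pivot + 1;
		else                     /* fall through to the lower half */
			b = pivot;
	}
	/* leaf: cmp rdx, imm ; je <prog> ; jmp <thunk> */
	return progs[a] == func ? a : -1;
}
```

      With the 16 entries -16..-1 this performs four range compares followed
      by one equality compare, which is the same number of cmp instructions
      as on any path through the 16 entry dispatcher shown above.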
      
      Performance
      ===========
      
      The tests were performed using the xdp_rxq_info sample program with
      the following command-line:
      
      1. XDP_DRV:
        # xdp_rxq_info --dev eth0 --action XDP_DROP
      2. XDP_SKB:
        # xdp_rxq_info --dev eth0 -S --action XDP_DROP
      3. xdp-perf, from selftests/bpf:
        # test_progs -v -t xdp_perf
      
      Run with mitigations=auto
      -------------------------
      
      Baseline:
      1. 21.7 Mpps (21736190)
      2. 3.8 Mpps   (3837582)
      3. 15 ns
      
      Dispatcher:
      1. 30.2 Mpps (30176320)
      2. 4.0 Mpps   (4015579)
      3. 5 ns
      
      Dispatcher (full; walk all entries, and fallback):
      1. 22.0 Mpps (21986704)
      2. 3.8 Mpps   (3831298)
      3. 17 ns
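
      To relate the packet rates to the per-call numbers of xdp-perf, the
      rates can be converted to time per packet. The helpers below are only
      illustrative arithmetic (the function names are made up here); the
      inputs are the XDP_DRV rates listed above.

```c
/* Nanoseconds spent per packet at a given packets-per-second rate. */
static double ns_per_packet(double pps)
{
	return 1e9 / pps;
}

/* Relative throughput gain of 'disp' over 'base', in percent. */
static double gain_percent(double base, double disp)
{
	return (disp - base) / base * 100.0;
}
```

      For the XDP_DRV case (21736190 vs 30176320 packets per second) this
      works out to roughly 13 ns less per packet, which is in the same range
      as the 15 ns vs 5 ns difference reported by xdp-perf.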
      
      Run with mitigations=off
      ------------------------
      
      Baseline:
      1. 29.9 Mpps (29875135)
      2. 4.1 Mpps   (4100179)
      3. 4 ns
      
      Dispatcher:
      1. 30.4 Mpps (30439241)
      2. 4.1 Mpps   (4109350)
      3. 4 ns
      
      Dispatcher (full; walk all entries, and fallback):
      1. 28.9 Mpps (28903269)
      2. 4.1 Mpps   (4080078)
      3. 5 ns
      
      xdp-perf runs, aligned vs non-aligned jump targets
      --------------------------------------------------
      
      In this test dispatchers of different sizes, with and without jump
      target alignment, were exercised. As outlined above the function
      lookup is performed via binary search. This means that depending on
      the pointer value of the function, it can reside in the upper or lower
      part of the search table. The performed tests were:
      
      1. aligned, mitigations=auto, function entry < other entries
      2. aligned, mitigations=auto, function entry > other entries
      3. non-aligned, mitigations=auto, function entry < other entries
      4. non-aligned, mitigations=auto, function entry > other entries
      5. aligned, mitigations=off, function entry < other entries
      6. aligned, mitigations=off, function entry > other entries
      7. non-aligned, mitigations=off, function entry < other entries
      8. non-aligned, mitigations=off, function entry > other entries
      
      The micro benchmarks showed that alignment of jump targets has some
      positive impact.
      
      A reply to this cover letter will contain complete data for all runs.
      
      Multiple xdp-perf baseline with mitigations=auto
      ------------------------------------------------
      
       Performance counter stats for './test_progs -v -t xdp_perf' (1024 runs):
      
                   16.69 msec task-clock                #    0.984 CPUs utilized            ( +-  0.08% )
                       2      context-switches          #    0.123 K/sec                    ( +-  1.11% )
                       0      cpu-migrations            #    0.000 K/sec                    ( +- 70.68% )
                      97      page-faults               #    0.006 M/sec                    ( +-  0.05% )
              49,254,635      cycles                    #    2.951 GHz                      ( +-  0.09% )  (12.28%)
              42,138,558      instructions              #    0.86  insn per cycle           ( +-  0.02% )  (36.15%)
               7,315,291      branches                  #  438.300 M/sec                    ( +-  0.01% )  (59.43%)
               1,011,201      branch-misses             #   13.82% of all branches          ( +-  0.01% )  (83.31%)
              15,440,788      L1-dcache-loads           #  925.143 M/sec                    ( +-  0.00% )  (99.40%)
                  39,067      L1-dcache-load-misses     #    0.25% of all L1-dcache hits    ( +-  0.04% )
                   6,531      LLC-loads                 #    0.391 M/sec                    ( +-  0.05% )
                     442      LLC-load-misses           #    6.76% of all LL-cache hits     ( +-  0.77% )
         <not supported>      L1-icache-loads
                  57,964      L1-icache-load-misses                                         ( +-  0.06% )
              15,442,496      dTLB-loads                #  925.246 M/sec                    ( +-  0.00% )
                     514      dTLB-load-misses          #    0.00% of all dTLB cache hits   ( +-  0.73% )  (40.57%)
                     130      iTLB-loads                #    0.008 M/sec                    ( +-  2.75% )  (16.69%)
           <not counted>      iTLB-load-misses                                              ( +-  8.71% )  (0.60%)
         <not supported>      L1-dcache-prefetches
         <not supported>      L1-dcache-prefetch-misses
      
               0.0169558 +- 0.0000127 seconds time elapsed  ( +-  0.07% )
      
      Multiple xdp-perf dispatcher with mitigations=auto
      --------------------------------------------------
      
      Note that this includes generating the dispatcher.
      
       Performance counter stats for './test_progs -v -t xdp_perf' (1024 runs):
      
                    4.80 msec task-clock                #    0.953 CPUs utilized            ( +-  0.06% )
                       1      context-switches          #    0.258 K/sec                    ( +-  1.57% )
                       0      cpu-migrations            #    0.000 K/sec
                      97      page-faults               #    0.020 M/sec                    ( +-  0.05% )
              14,185,861      cycles                    #    2.955 GHz                      ( +-  0.17% )  (50.49%)
              45,691,935      instructions              #    3.22  insn per cycle           ( +-  0.01% )  (99.19%)
               8,346,008      branches                  # 1738.709 M/sec                    ( +-  0.00% )
                  13,046      branch-misses             #    0.16% of all branches          ( +-  0.10% )
              15,443,735      L1-dcache-loads           # 3217.365 M/sec                    ( +-  0.00% )
                  39,585      L1-dcache-load-misses     #    0.26% of all L1-dcache hits    ( +-  0.05% )
                   7,138      LLC-loads                 #    1.487 M/sec                    ( +-  0.06% )
                     671      LLC-load-misses           #    9.40% of all LL-cache hits     ( +-  0.73% )
         <not supported>      L1-icache-loads
                  56,213      L1-icache-load-misses                                         ( +-  0.08% )
              15,443,735      dTLB-loads                # 3217.365 M/sec                    ( +-  0.00% )
           <not counted>      dTLB-load-misses                                              (0.00%)
           <not counted>      iTLB-loads                                                    (0.00%)
           <not counted>      iTLB-load-misses                                              (0.00%)
         <not supported>      L1-dcache-prefetches
         <not supported>      L1-dcache-prefetch-misses
      
              0.00503705 +- 0.00000546 seconds time elapsed  ( +-  0.11% )
      
      Revisions
      =========
      
      v4->v5: [1]
        * Fixed s/xdp_ctx/ctx/ typo (Toke)
        * Marked dispatcher trampoline with noinline attribute (Alexei)
      
      v3->v4: [2]
        * Moved away from doing dispatcher lookup based on the trampoline
          function, to a model where the dispatcher instance is explicitly
          passed to the bpf_dispatcher_change_prog() (Alexei)
      
      v2->v3: [3]
        * Removed xdp_call, and instead make the dispatcher available to all
          XDP users via bpf_prog_run_xdp() and dev_xdp_install(). (Toke)
        * Always enable the dispatcher, if available (Alexei)
        * Reuse BPF trampoline image allocator (Alexei)
        * Make sure the dispatcher is exercised in selftests (Alexei)
        * Only allow one dispatcher, and wire it to XDP
      
      v1->v2: [4]
        * Fixed i386 build warning (kbuild robot)
        * Made bpf_dispatcher_lookup() static (kbuild robot)
        * Make sure xdp_call.h is only enabled for builtins
        * Add xdp_call() to ixgbe, mlx4, and mlx5
      
      RFC->v1: [5]
        * Improved error handling (Edward and Andrii)
        * Explicit cleanup (Andrii)
        * Use 32B with sext cmp (Alexei)
        * Align jump targets to 16B (Alexei)
        * 4 to 16 entries (Toke)
        * Added stats to xdp_call_run()
      
      [1] https://lore.kernel.org/bpf/20191211123017.13212-1-bjorn.topel@gmail.com/
      [2] https://lore.kernel.org/bpf/20191209135522.16576-1-bjorn.topel@gmail.com/
      [3] https://lore.kernel.org/bpf/20191123071226.6501-1-bjorn.topel@gmail.com/
      [4] https://lore.kernel.org/bpf/20191119160757.27714-1-bjorn.topel@gmail.com/
      [5] https://lore.kernel.org/bpf/20191113204737.31623-1-bjorn.topel@gmail.com/
      
      
      ====================
      
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      02620d9e
    • bpf, x86: Align dispatcher branch targets to 16B · 116eb788
      Björn Töpel authored
      

      From the Intel 64 and IA-32 Architectures Optimization Reference Manual,
      3.4.1.4 Code Alignment, Assembly/Compiler Coding Rule 11: All branch
      targets should be 16-byte aligned.
      
      This commit aligns branch targets according to the Intel manual.
      
      The nops used to align branch targets make the dispatcher larger, and
      therefore the number of supported dispatch points/programs is
      decreased from 64 to 48.
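
      The amount of padding needed in front of a branch target is simple
      address arithmetic. The helper below is a sketch with a made-up name
      (pad_to_16), not the JIT's code; it only computes how many nop bytes
      are needed to reach the next 16-byte boundary.

```c
#include <stdint.h>

/* Number of padding bytes needed so that addr + pad is 16-byte aligned. */
static unsigned int pad_to_16(uint64_t addr)
{
	return (unsigned int)((16 - (addr & 15)) & 15);
}
```

      Applied to the dispatcher listing in the merge message above: the
      code ending at 0xffffffffc002003e needs 2 bytes (the "xchg ax,ax"),
      while the code ending at 0xffffffffc0020052 needs 14 bytes (an 8-byte
      plus a 6-byte nop) to reach 0xffffffffc0020060.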
      
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191213175112.30208-7-bjorn.topel@gmail.com
      116eb788
    • selftests: bpf: Add xdp_perf test · e754f5a6
      Björn Töpel authored
      
      
      The xdp_perf is a dummy XDP test, only used to measure the cost of
      jumping into a naive XDP program one million times.
      
      To build and run the program:
        $ cd tools/testing/selftests/bpf
        $ make
        $ ./test_progs -v -t xdp_perf
      
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191213175112.30208-6-bjorn.topel@gmail.com
      e754f5a6
    • bpf: Start using the BPF dispatcher in BPF_TEST_RUN · f23c4b39
      Björn Töpel authored
      
      
      In order to properly exercise the BPF dispatcher, this commit adds BPF
      dispatcher usage to BPF_TEST_RUN when executing XDP programs.
      
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191213175112.30208-5-bjorn.topel@gmail.com
      f23c4b39
    • bpf, xdp: Start using the BPF dispatcher for XDP · 7e6897f9
      Björn Töpel authored
      
      
      This commit adds a BPF dispatcher for XDP. The dispatcher is updated
      from the XDP control-path, dev_xdp_install(), and used when an XDP
      program is run via bpf_prog_run_xdp().
      
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191213175112.30208-4-bjorn.topel@gmail.com
      7e6897f9
    • bpf: Introduce BPF dispatcher · 75ccbef6
      Björn Töpel authored
      
      
      The BPF dispatcher is a multi-way branch code generator, mainly
      targeted for XDP programs. When an XDP program is executed via the
      bpf_prog_run_xdp(), it is invoked via an indirect call. The indirect
      call has a substantial performance impact, when retpolines are
      enabled. The dispatcher transforms indirect calls to direct calls, and
      therefore avoids the retpoline. The dispatcher is generated using the
      BPF JIT, and relies on text poking provided by bpf_arch_text_poke().
      
      The dispatcher hijacks a trampoline function via the __fentry__ nop
      of the trampoline. One dispatcher instance currently supports up to 64
      dispatch points. A user creates a dispatcher with its corresponding
      trampoline with the DEFINE_BPF_DISPATCHER macro.
      
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191213175112.30208-3-bjorn.topel@gmail.com
      75ccbef6
    • bpf: Move trampoline JIT image allocation to a function · 98e8627e
      Björn Töpel authored
      
      
      Refactor the image allocation in the BPF trampoline code into a
      separate function, so it can be shared with the BPF dispatcher in
      upcoming commits.
      
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191213175112.30208-2-bjorn.topel@gmail.com
      98e8627e
    • selftests/bpf: Fix perf_buffer test on systems w/ offline CPUs · 91cbdf74
      Andrii Nakryiko authored
      Fix up perf_buffer.c selftest to take into account offline/missing CPUs.
      
      Fixes: ee5cf82c ("selftests/bpf: test perf buffer API")
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191212013621.1691858-1-andriin@fb.com
      91cbdf74
    • libbpf: Don't attach perf_buffer to offline/missing CPUs · 783b8f01
      Andrii Nakryiko authored
      It's quite common on some systems to have more CPUs enlisted as "possible",
      than there are (and could ever be) present/online CPUs. In such cases,
      perf_buffer creation will fail due to inability to create perf event on
      missing CPU with error like this:
      
      libbpf: failed to open perf buffer event on cpu #16: No such device
      
      This patch fixes the logic of perf_buffer__new() to ignore CPUs that are
      missing or currently offline. In rare cases where user explicitly listed
      specific CPUs to connect to, behavior is unchanged: libbpf will try to open
      perf event buffer on specified CPU(s) anyways.
      
      Fixes: fb84b822 ("libbpf: add perf buffer API")
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191212013609.1691168-1-andriin@fb.com
      783b8f01
    • selftests/bpf: Add CPU mask parsing tests · 65bc4c40
      Andrii Nakryiko authored
      
      
      Add a bunch of tests validating CPU mask parsing logic and error handling.
      
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191212013559.1690898-1-andriin@fb.com
      65bc4c40
    • libbpf: Extract and generalize CPU mask parsing logic · 6803ee25
      Andrii Nakryiko authored
      
      
      This logic is re-used for parsing a set of online CPUs. Having it as an
      isolated piece of code working with input string makes it convenient to test
      this logic as well. While refactoring, also improve the robustness of original
      implementation.
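
      To illustrate what parsing a CPU list from a string involves, here is
      a small standalone sketch. It is not the libbpf implementation:
      parse_cpu_list() is a hypothetical helper, and it assumes the usual
      "0-3,5" style list of single CPUs and inclusive ranges, optionally
      terminated by a newline.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Parse a list such as "0-2,5" into mask[0..mask_sz-1].
 * Returns 0 on success, -1 on malformed or out-of-range input.
 * On failure the contents of mask are unspecified.
 */
static int parse_cpu_list(const char *s, bool *mask, int mask_sz)
{
	int ranges = 0;

	for (int i = 0; i < mask_sz; i++)
		mask[i] = false;

	while (*s && *s != '\n') {
		char *end;
		long start, stop;

		if (!isdigit((unsigned char)*s))
			return -1;
		start = strtol(s, &end, 10);
		stop = start;            /* a single CPU is a range of one */
		s = end;

		if (*s == '-') {         /* optional "-<stop>" part */
			s++;
			if (!isdigit((unsigned char)*s))
				return -1;
			stop = strtol(s, &end, 10);
			s = end;
		}
		if (stop < start || stop >= mask_sz)
			return -1;
		for (long cpu = start; cpu <= stop; cpu++)
			mask[cpu] = true;
		ranges++;

		if (*s == ',') {         /* a comma must be followed by a number */
			s++;
			if (!isdigit((unsigned char)*s))
				return -1;
		} else if (*s && *s != '\n') {
			return -1;       /* unexpected trailing character */
		}
	}
	return ranges ? 0 : -1;          /* reject empty input */
}
```

      Because the helper only works on an input string, it can be tested
      with literals such as "0-2,5" or "3-1" without touching any system
      files, which is the testability benefit the commit message refers to.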
      
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191212013548.1690564-1-andriin@fb.com
      6803ee25
    • Merge branch 'reuseport_to_test_progs' · 7708bd43
      Alexei Starovoitov authored
      Jakub Sitnicki says:
      
      ====================
      This change has been suggested by Martin Lau [0] during a review of a
      related patch set that extends reuseport tests [1].
      
      Patches 1 & 2 address a warning due to unrecognized section name from
      libbpf when running reuseport tests. We don't want to carry this warning
      into test_progs.
      
      Patches 3-8 massage the reuseport tests to ease the switch to test_progs
      framework. The intention here is to show the work. Happy to squash these,
      if needed.
      
      Patches 9-10 do the actual move and conversion to test_progs.
      
      Output from a test_progs run after changes pasted below.
      
      Thanks,
      Jakub
      
      [0] https://lore.kernel.org/bpf/20191123110751.6729-1-jakub@cloudflare.com/T/#m607d822caeb1eb5db101172821a78cc3896ff1c3
      [1] https://lore.kernel.org/bpf/20191123110751.6729-1-jakub@cloudflare.com/T/#m55881bae9fb6e34837d07a0c0a7ffbc138f8d06f
      
      
      ====================
      
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      7708bd43
    • selftests/bpf: Switch reuseport tests for test_progs framework · 7ee0d4e9
      Jakub Sitnicki authored
      
      
      The tests were originally written in abort-on-error style. With the switch
      to test_progs we can no longer do that. So at the risk of not cleaning up
      some resource on failure, we now return to the caller on error.
      
      That said, failure inside one test should not affect others because we run
      setup/cleanup before/after every test.
      
      Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191212102259.418536-11-jakub@cloudflare.com
      7ee0d4e9
    • selftests/bpf: Move reuseport tests under prog_tests/ · 415bb4e1
      Jakub Sitnicki authored
      
      
      Do a pure move to show the actual work needed to adapt the tests in
      subsequent patch at the cost of breaking test_progs build for the moment.
      
      Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191212102259.418536-10-jakub@cloudflare.com
      415bb4e1
    • selftests/bpf: Pull up printing the test name into test runner · 250a91d4
      Jakub Sitnicki authored
      
      
      Again, prepare for switching reuseport tests to test_progs framework.
      test_progs framework will print the subtest name for us if we set it.
      
      Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191212102259.418536-9-jakub@cloudflare.com
      250a91d4
    • selftests/bpf: Propagate errors during setup for reuseport tests · 9af6c844
      Jakub Sitnicki authored
      
      
      Prepare for switching reuseport tests to test_progs framework, where we
      don't have the luxury to terminate the process on failure.
      
      Modify setup helpers to signal failure via the return value with the help
      of a macro similar to the one currently in use by the tests.
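
      The difference between terminating on failure and signalling failure
      via the return value can be sketched as follows. The macro and helper
      names (RET_ERR, setup_step, setup_all) are chosen for this
      illustration and are assumptions, not a copy of the selftest code.

```c
#include <stdio.h>

/* Log and return -1 from the enclosing function instead of exiting. */
#define RET_ERR(cond, msg)                                            \
	do {                                                          \
		if (cond) {                                           \
			fprintf(stderr, "%s: %s\n", __func__, (msg)); \
			return -1;                                    \
		}                                                     \
	} while (0)

/* Hypothetical setup helper: fails on an invalid descriptor. */
static int setup_step(int fd)
{
	RET_ERR(fd < 0, "invalid descriptor");
	return 0;
}

/* Callers propagate the error so the test runner can carry on. */
static int setup_all(int fd_a, int fd_b)
{
	if (setup_step(fd_a))
		return -1;
	if (setup_step(fd_b))
		return -1;
	return 0;
}
```

      With this pattern a failing setup step makes only the current test
      return early; the process, and therefore the remaining tests, keep
      running.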
      
      Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191212102259.418536-8-jakub@cloudflare.com
      9af6c844
    • selftests/bpf: Run reuseport tests in a loop · ce7cb5f3
      Jakub Sitnicki authored
      
      
      Prepare for switching reuseport tests to test_progs framework. Loop over
      the tests and perform setup/cleanup for each test separately, remembering
      that with test_progs we can select tests to run.
      
      Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191212102259.418536-7-jakub@cloudflare.com
      ce7cb5f3
    • selftests/bpf: Unroll the main loop in reuseport test · 99363382
      Jakub Sitnicki authored
      
      
      Prepare for iterating over individual tests without introducing another
      nested loop in the main test function.
      
      Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191212102259.418536-6-jakub@cloudflare.com
      99363382
    • selftests/bpf: Add helpers for getting socket family & type name · a9ce4cf4
      Jakub Sitnicki authored
      
      
      Having string arrays to map socket family & type to a name prevents us from
      unrolling the test runner loop in the subsequent patch. Introduce helpers
      that do the same thing.
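
      Helpers of the kind described here might look like the sketch below.
      The function names and the returned strings are illustrative
      assumptions; the point is only that a switch-based lookup can be
      called from anywhere, without a string array indexed by a loop
      counter.

```c
#include <sys/socket.h>

/* Map a socket address family to a printable name. */
static const char *family_str(sa_family_t family)
{
	switch (family) {
	case AF_INET:
		return "IPv4";
	case AF_INET6:
		return "IPv6";
	default:
		return "unknown";
	}
}

/* Map a socket type to a printable name. */
static const char *sotype_str(int sotype)
{
	switch (sotype) {
	case SOCK_STREAM:
		return "TCP";
	case SOCK_DGRAM:
		return "UDP";
	default:
		return "unknown";
	}
}
```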
      
      Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191212102259.418536-5-jakub@cloudflare.com
      a9ce4cf4
    • selftests/bpf: Use sa_family_t everywhere in reuseport tests · 11f80355
      Jakub Sitnicki authored
      
      
      Update the only function that is not using sa_family_t in this source file.
      
      Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191212102259.418536-4-jakub@cloudflare.com
      11f80355
    • selftests/bpf: Let libbpf determine program type from section name · 1fbcef92
      Jakub Sitnicki authored
      
      
      Now that libbpf can recognize SK_REUSEPORT programs, we no longer have to
      pass a prog_type hint before loading the object file.
      
      Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191212102259.418536-3-jakub@cloudflare.com
      1fbcef92
    • libbpf: Recognize SK_REUSEPORT programs from section name · 67d69ccd
      Jakub Sitnicki authored
      
      
      Allow loading BPF object files that contain SK_REUSEPORT programs without
      having to manually set the program type before load if the section name
      is set to "sk_reuseport".
      
      This makes the user-space code needed to load an SK_REUSEPORT BPF program
      more concise.
      
      Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191212102259.418536-2-jakub@cloudflare.com
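
To illustrate how a program type can be derived from a section name, here is a self-contained sketch of a lookup table. It is not libbpf's code: the `sketch_` enum, struct, and function are hypothetical stand-ins, and matching by prefix is an assumption made for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's enum bpf_prog_type. */
enum sketch_prog_type {
	SKETCH_PROG_TYPE_UNSPEC = 0,
	SKETCH_PROG_TYPE_XDP,
	SKETCH_PROG_TYPE_SK_REUSEPORT,
};

/* One table entry: a section-name prefix and the type it implies. */
struct sketch_sec_def {
	const char *sec;
	size_t len;
	enum sketch_prog_type prog_type;
};

#define SKETCH_SEC(name, type) { name, sizeof(name) - 1, type }

static const struct sketch_sec_def sketch_sec_defs[] = {
	SKETCH_SEC("xdp", SKETCH_PROG_TYPE_XDP),
	SKETCH_SEC("sk_reuseport", SKETCH_PROG_TYPE_SK_REUSEPORT),
};

/* Return the program type implied by an ELF section name, or
 * SKETCH_PROG_TYPE_UNSPEC when no table entry matches. */
static enum sketch_prog_type sketch_prog_type_by_name(const char *name)
{
	size_t i;

	if (!name)
		return SKETCH_PROG_TYPE_UNSPEC;

	for (i = 0; i < sizeof(sketch_sec_defs) / sizeof(sketch_sec_defs[0]); i++) {
		const struct sketch_sec_def *def = &sketch_sec_defs[i];

		/* Assumption: compare only the prefix stored in the table. */
		if (strncmp(name, def->sec, def->len) == 0)
			return def->prog_type;
	}
	return SKETCH_PROG_TYPE_UNSPEC;
}
```

With such a table in place, a loader no longer needs a caller-supplied type hint for any section name the table covers.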
  2. Dec 13, 2019
  3. Dec 12, 2019
    • Daniel Borkmann's avatar
      bpf, x86, arm64: Enable jit by default when not built as always-on · 81c22041
      Daniel Borkmann authored
      After the Spectre 2 fix via 290af866 ("bpf: introduce BPF_JIT_ALWAYS_ON
      config"), most major distros use the BPF_JIT_ALWAYS_ON configuration these
      days, which compiles out the BPF interpreter entirely and always enables
      the JIT. Also, given the recent fix in e1608f3f ("bpf: Avoid setting bpf
      insns pages read-only when prog is jited"), we additionally avoid
      fragmenting the direct map for the BPF insns pages sitting in the general
      data heap since they are not used during execution. The latter is only
      needed when run through the interpreter.
      
      Since both the x86 and arm64 JITs have seen a lot of exposure over the
      years and are generally the most up to date and maintained, there is more
      downside in !BPF_JIT_ALWAYS_ON configurations to having the interpreter
      enabled by default rather than the JIT. Add an ARCH_WANT_DEFAULT_BPF_JIT
      config which archs can use to set bpf_jit_{enable,kallsyms} to 1. Back in
      the day, the bpf_jit_kallsyms knob was set to 0 by default since major
      distros still had /proc/kallsyms addresses exposed to unprivileged user
      space, which is not the case anymore. Hence both knobs are set via
      BPF_JIT_DEFAULT_ON, which is set to 'y' in case of BPF_JIT_ALWAYS_ON or
      ARCH_WANT_DEFAULT_BPF_JIT.
      
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Will Deacon <will@kernel.org>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/f78ad24795c2966efcc2ee19025fa3459f622185.1575903816.git.daniel@iogearbox.net
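
The relationship between the config options and the two knobs, as described in this commit message, can be modeled in plain C. This is only a sketch of the stated logic: the `sketch_` names are hypothetical, and the real kernel derives these values from Kconfig at build time rather than from function arguments.

```c
/* Hypothetical model of the two sysctl knobs named in the commit. */
struct sketch_jit_knobs {
	int bpf_jit_enable;
	int bpf_jit_kallsyms;
};

/* Models BPF_JIT_DEFAULT_ON: 'y' when either BPF_JIT_ALWAYS_ON or
 * ARCH_WANT_DEFAULT_BPF_JIT is set. */
static int sketch_jit_default_on(int jit_always_on, int arch_want_default_jit)
{
	return jit_always_on || arch_want_default_jit;
}

/* Both knobs take their default from the same derived option. */
static struct sketch_jit_knobs sketch_default_knobs(int jit_always_on,
						    int arch_want_default_jit)
{
	struct sketch_jit_knobs knobs;
	int on = sketch_jit_default_on(jit_always_on, arch_want_default_jit);

	knobs.bpf_jit_enable = on;
	knobs.bpf_jit_kallsyms = on;
	return knobs;
}
```

Under this model, an architecture that selects ARCH_WANT_DEFAULT_BPF_JIT gets both knobs defaulting to 1 even when BPF_JIT_ALWAYS_ON is not set.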
    • Daniel Borkmann's avatar
      bpf: Emit audit messages upon successful prog load and unload · bae141f5
      Daniel Borkmann authored
      
      
      Allow audit messages to be emitted upon BPF program load and unload
      in order to have a timeline of events. The load itself happens in
      syscall context, so additional info about the process initiating
      the BPF prog creation can be logged and later directly correlated
      to the unload event.
      
      The only info really needed from the BPF side is the globally unique
      prog ID. Audit user space tooling can then query / dump all the
      info needed about the specific BPF program right upon the load event
      and enrich the record, so the changes needed here can be kept
      small and non-intrusive to the core.
      
      Raw example output:
      
        # auditctl -D
        # auditctl -a always,exit -F arch=x86_64 -S bpf
        # ausearch --start recent -m 1334
        ...
        ----
        time->Wed Nov 27 16:04:13 2019
        type=PROCTITLE msg=audit(1574867053.120:84664): proctitle="./bpf"
        type=SYSCALL msg=audit(1574867053.120:84664): arch=c000003e syscall=321   \
          success=yes exit=3 a0=5 a1=7ffea484fbe0 a2=70 a3=0 items=0 ppid=7477    \
          pid=12698 auid=1001 uid=1001 gid=1001 euid=1001 suid=1001 fsuid=1001    \
          egid=1001 sgid=1001 fsgid=1001 tty=pts2 ses=4 comm="bpf"                \
          exe="/home/jolsa/auditd/audit-testsuite/tests/bpf/bpf"                  \
          subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null)
        type=UNKNOWN[1334] msg=audit(1574867053.120:84664): prog-id=76 op=LOAD
        ----
        time->Wed Nov 27 16:04:13 2019
        type=UNKNOWN[1334] msg=audit(1574867053.120:84665): prog-id=76 op=UNLOAD
        ...
      
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Co-developed-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Paul Moore <paul@paul-moore.com>
      Link: https://lore.kernel.org/bpf/20191206214934.11319-1-jolsa@kernel.org
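
To show how user space tooling might correlate the load and unload records in the raw output above, here is a small parsing sketch. It assumes the `prog-id=<n> op=<LOAD|UNLOAD>` text layout shown in the example output; the `sketch_` function names are hypothetical and do not belong to any audit library.

```c
#include <stdio.h>
#include <string.h>

/* Extract prog-id and op from one audit record line.
 * Returns 0 on success, -1 if the line carries no such fields. */
static int sketch_parse_bpf_record(const char *line, unsigned int *prog_id,
				   char op[8])
{
	const char *p;

	if (!line)
		return -1;

	/* Skip the "type=... msg=audit(...):" prefix. */
	p = strstr(line, "prog-id=");
	if (!p)
		return -1;

	/* %7s leaves room for the terminating NUL in op[8]. */
	if (sscanf(p, "prog-id=%u op=%7s", prog_id, op) != 2)
		return -1;
	return 0;
}

/* Return the prog ID of a record, or -1 if it is not a BPF record. */
static long sketch_record_prog_id(const char *line)
{
	unsigned int id;
	char op[8];

	if (sketch_parse_bpf_record(line, &id, op) < 0)
		return -1;
	return (long)id;
}

/* Return 1 if the record is a BPF record with the given op, else 0. */
static int sketch_record_is_op(const char *line, const char *want)
{
	unsigned int id;
	char op[8];

	if (!want || sketch_parse_bpf_record(line, &id, op) < 0)
		return 0;
	return strcmp(op, want) == 0;
}
```

A tool could pair a LOAD record with a later UNLOAD record by matching the value returned from `sketch_record_prog_id()` on both lines.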
  4. Dec 11, 2019
  5. Dec 10, 2019