  1. Sep 18, 2021
  2. Sep 16, 2021
  3. Sep 15, 2021
  4. Sep 14, 2021
    • libbpf: Make libbpf_version.h non-auto-generated · 2f383041
      Andrii Nakryiko authored
      Turn the previously auto-generated libbpf_version.h header into a normal
      header file. This prevents various tricky Makefile integration issues,
      simplifies the overall build process, and also makes it possible to extend
      the header with more versioning-related APIs in the future.
      
      To prevent accidental out-of-sync versions as defined by libbpf.map and
      libbpf_version.h, Makefile checks their consistency at build time.
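
The consistency check described above could be sketched in C as follows. This is only an illustration of the idea, not the Makefile logic from the commit: the macro names, the "LIBBPF_x.y.z" node format, and the helper map_version_matches() are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Values a hand-maintained libbpf_version.h might define for v0.6
 * (macro names are assumed for illustration). */
#define LIBBPF_MAJOR_VERSION 0
#define LIBBPF_MINOR_VERSION 6

/*
 * Return 1 if a version node name such as "LIBBPF_0.6.0" (the assumed
 * libbpf.map format) carries the given major/minor pair, 0 otherwise.
 * Malformed input also yields 0.
 */
int map_version_matches(const char *node, int major, int minor)
{
	int map_major, map_minor, map_patch;

	assert(node != NULL);
	if (sscanf(node, "LIBBPF_%d.%d.%d",
		   &map_major, &map_minor, &map_patch) != 3)
		return 0;
	return map_major == major && map_minor == minor;
}
```

A build step could call such a helper with the newest node from libbpf.map and the two header macros, and abort the build when it returns 0.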
      
      Simultaneously with this change bump libbpf.map to v0.6.
      
      Also undo adding libbpf's output directory into include path for
      kernel/bpf/preload, bpftool, and resolve_btfids, which is not necessary
      because libbpf_version.h is just a normal header like any other.
      
      Fixes: 0b46b755 ("libbpf: Add LIBBPF_DEPRECATED_SINCE macro for scheduling API deprecations")
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210913222309.3220849-1-andrii@kernel.org
    • bpf, selftests: Replicate tailcall limit test for indirect call case · dbd7eb14
      Daniel Borkmann authored
      
      
      The tailcall_3 test program uses bpf_tail_call_static(), for which the JIT
      patches in a direct jump. Add a new tailcall_6 test program that replicates
      exactly the same test, except that bpf_tail_call() is given a map index
      about which the verifier cannot make assumptions.

      As a result, both paths of the x86-64 JIT are now covered: JIT images
      emitted via emit_bpf_tail_call_direct() as well as JIT images emitted via
      emit_bpf_tail_call_indirect().
      
        # echo 1 > /proc/sys/net/core/bpf_jit_enable
        # ./test_progs -t tailcalls
        #136/1 tailcalls/tailcall_1:OK
        #136/2 tailcalls/tailcall_2:OK
        #136/3 tailcalls/tailcall_3:OK
        #136/4 tailcalls/tailcall_4:OK
        #136/5 tailcalls/tailcall_5:OK
        #136/6 tailcalls/tailcall_6:OK
        #136/7 tailcalls/tailcall_bpf2bpf_1:OK
        #136/8 tailcalls/tailcall_bpf2bpf_2:OK
        #136/9 tailcalls/tailcall_bpf2bpf_3:OK
        #136/10 tailcalls/tailcall_bpf2bpf_4:OK
        #136/11 tailcalls/tailcall_bpf2bpf_5:OK
        #136 tailcalls:OK
        Summary: 1/11 PASSED, 0 SKIPPED, 0 FAILED
      
        # echo 0 > /proc/sys/net/core/bpf_jit_enable
        # ./test_progs -t tailcalls
        #136/1 tailcalls/tailcall_1:OK
        #136/2 tailcalls/tailcall_2:OK
        #136/3 tailcalls/tailcall_3:OK
        #136/4 tailcalls/tailcall_4:OK
        #136/5 tailcalls/tailcall_5:OK
        #136/6 tailcalls/tailcall_6:OK
        [...]
      
      For the interpreter, the tailcall_1-6 tests pass as well. The later
      tailcall_bpf2bpf_* tests fail due to the lack of bpf2bpf + tailcall
      support in the interpreter, so this is expected.

      Also, manual inspection shows that the loaded programs from both the
      tailcall_3 and tailcall_6 test cases emit the expected opcodes:
      
      * tailcall_3 disasm, emit_bpf_tail_call_direct():
      
        [...]
         b:   push   %rax
         c:   push   %rbx
         d:   push   %r13
         f:   mov    %rdi,%rbx
        12:   movabs $0xffff8d3f5afb0200,%r13
        1c:   mov    %rbx,%rdi
        1f:   mov    %r13,%rsi
        22:   xor    %edx,%edx                 _
        24:   mov    -0x4(%rbp),%eax          |  limit check
        2a:   cmp    $0x20,%eax               |
        2d:   ja     0x0000000000000046       |
        2f:   add    $0x1,%eax                |
        32:   mov    %eax,-0x4(%rbp)          |_
        38:   nopl   0x0(%rax,%rax,1)
        3d:   pop    %r13
        3f:   pop    %rbx
        40:   pop    %rax
        41:   jmpq   0xffffffffffffe377
        [...]
      
      * tailcall_6 disasm, emit_bpf_tail_call_indirect():
      
        [...]
        47:   movabs $0xffff8d3f59143a00,%rsi
        51:   mov    %edx,%edx
        53:   cmp    %edx,0x24(%rsi)
        56:   jbe    0x0000000000000093        _
        58:   mov    -0x4(%rbp),%eax          |  limit check
        5e:   cmp    $0x20,%eax               |
        61:   ja     0x0000000000000093       |
        63:   add    $0x1,%eax                |
        66:   mov    %eax,-0x4(%rbp)          |_
        6c:   mov    0x110(%rsi,%rdx,8),%rcx
        74:   test   %rcx,%rcx
        77:   je     0x0000000000000093
        79:   pop    %rax
        7a:   mov    0x30(%rcx),%rcx
        7e:   add    $0xb,%rcx
        82:   callq  0x000000000000008e
        87:   pause
        89:   lfence
        8c:   jmp    0x0000000000000087
        8e:   mov    %rcx,(%rsp)
        92:   retq
        [...]
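
The checks visible in the tailcall_6 disassembly can be modelled in plain C. The following is a userspace sketch derived only from the instructions shown above; struct prog_array, its field names, and both function names are hypothetical and do not come from the kernel sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TAIL_CALL_CNT 0x20	/* the immediate in "cmp $0x20,%eax" */

/* Simplified stand-in for a prog array map; layout is illustrative. */
struct prog_array {
	uint32_t max_entries;
	const void *progs[4];
};

/*
 * Model of the indirect tail call checks, in the order the disassembly
 * performs them. Returns the target program, or NULL when execution
 * would fall through to the instruction after the tail call.
 */
const void *tail_call_target(const struct prog_array *array, uint32_t index,
			     uint32_t *tail_call_cnt)
{
	assert(array != NULL && tail_call_cnt != NULL);

	if (index >= array->max_entries)	/* cmp %edx,0x24(%rsi); jbe */
		return NULL;
	if (*tail_call_cnt > MAX_TAIL_CALL_CNT)	/* cmp $0x20,%eax; ja */
		return NULL;
	(*tail_call_cnt)++;			/* add $0x1,%eax; store back */
	return array->progs[index];		/* empty slot: test; je */
}

/* Count how many consecutive tail calls succeed before the limit trips. */
unsigned int count_allowed_tail_calls(const struct prog_array *array,
				      uint32_t index)
{
	uint32_t cnt = 0;
	unsigned int taken = 0;

	while (tail_call_target(array, index, &cnt) != NULL)
		taken++;
	return taken;
}
```

Because the counter is compared with `ja` before it is incremented, counter values 0 through 0x20 all pass, so this model permits 33 consecutive tail calls into a populated slot.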
      
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
      Acked-by: Yonghong Song <yhs@fb.com>
      Acked-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
      Acked-by: Paul Chaignon <paul@cilium.io>
      Link: https://lore.kernel.org/bpf/CAM1=_QRyRVCODcXo_Y6qOm1iT163HoiSj8U2pZ8Rj3hzMTT=HQ@mail.gmail.com
      Link: https://lore.kernel.org/bpf/20210910091900.16119-1-daniel@iogearbox.net
    • Merge branch 'bpf: introduce bpf_get_branch_snapshot' · 14bef1ab
      Alexei Starovoitov authored
      Song Liu says:
      
      ====================
      
      Changes v6 => v7:
      1. Improve/fix intel_pmu_snapshot_branch_stack() logic. (Peter).
      
      Changes v5 => v6:
      1. Add local_irq_save/restore to intel_pmu_snapshot_branch_stack. (Peter)
      2. Remove buf and size check in bpf_get_branch_snapshot, move flags check
         to later in the function. (Peter, Andrii)
      3. Revise comments for bpf_get_branch_snapshot in bpf.h (Andrii)
      
      Changes v4 => v5:
      1. Modify perf_snapshot_branch_stack_t to save some memcpy. (Andrii)
      2. Minor fixes in selftests. (Andrii)
      
      Changes v3 => v4:
      1. Do not reshuffle intel_pmu_disable_all(). Use some inline to save LBR
         entries. (Peter)
      2. Move static_call(perf_snapshot_branch_stack) to the helper. (Alexei)
      3. Add argument flags to bpf_get_branch_snapshot. (Andrii)
      4. Make MAX_BRANCH_SNAPSHOT an enum (Andrii). And rename it as
         PERF_MAX_BRANCH_SNAPSHOT
      5. Make bpf_get_branch_snapshot similar to bpf_read_branch_records.
         (Andrii)
      6. Move the test target function to bpf_testmod. Updated kallsyms_find_next
         to work properly with modules. (Andrii)
      
      Changes v2 => v3:
      1. Fix the use of static_call. (Peter)
      2. Limit the use to perfmon version >= 2. (Peter)
      3. Modify intel_pmu_snapshot_branch_stack() to use intel_pmu_disable_all
         and intel_pmu_enable_all().
      
      Changes v1 => v2:
      1. Rename the helper as bpf_get_branch_snapshot;
      2. Fix/simplify the use of static_call;
      3. Instead of percpu variables, let intel_pmu_snapshot_branch_stack output
         branch records to an output argument of type perf_branch_snapshot.
      
      The branch stack can be very useful in understanding software events. For
      example, when a long function, e.g. sys_perf_event_open, returns an errno,
      it is not obvious why the function failed. The branch stack can provide
      very helpful information in this type of scenario.

      This set adds support for reading the branch stack with a new BPF helper,
      bpf_get_branch_snapshot(). Currently, this is only supported on Intel
      systems. It is also possible to support the same feature for PowerPC.

      The hardware that records the branch stack is not stopped automatically on
      software events. Therefore, it is necessary to stop it in software as soon
      as possible. Otherwise, the hardware buffers/registers will be flushed. One
      of the key design considerations in this set is to minimize the number of
      branch record entries between the point where the event triggers and the
      point where the hardware recorder is stopped.
      Based on this goal, current design is different from the discussions in
      original RFC [1]:
       1) Static call is used when supported, to save function pointer
          dereference;
       2) intel_pmu_lbr_disable_all is used instead of perf_pmu_disable(),
          because the latter uses about 10 entries before stopping LBR.
      
      With the current code, on Intel CPUs, LBR is stopped 7 branch entries
      after fexit triggers:
      
      ID: 0 from bpf_get_branch_snapshot+18 to intel_pmu_snapshot_branch_stack+0
      ID: 1 from __brk_limit+477143934 to bpf_get_branch_snapshot+0
      ID: 2 from __brk_limit+477192263 to __brk_limit+477143880  # trampoline
      ID: 3 from __bpf_prog_enter+34 to __brk_limit+477192251
      ID: 4 from migrate_disable+60 to __bpf_prog_enter+9
      ID: 5 from __bpf_prog_enter+4 to migrate_disable+0
      ID: 6 from bpf_testmod_loop_test+20 to __bpf_prog_enter+0
      ID: 7 from bpf_testmod_loop_test+20 to bpf_testmod_loop_test+13
      ID: 8 from bpf_testmod_loop_test+20 to bpf_testmod_loop_test+13
      ...
      
      [1] https://lore.kernel.org/bpf/20210818012937.2522409-1-songliubraving@fb.com/
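
The figure of 7 entries can be read off the listing above by counting the leading records that are not branches inside the traced function. A small C sketch of that counting follows; the struct and function names are made up for the example, and the records are reduced to the symbol names shown in the listing (offsets dropped).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One branch record reduced to symbol names. */
struct branch_rec {
	const char *from_sym;
	const char *to_sym;
};

/* Records 0-8 from the listing, most recent branch first. */
const struct branch_rec listing[] = {
	{ "bpf_get_branch_snapshot", "intel_pmu_snapshot_branch_stack" },
	{ "__brk_limit", "bpf_get_branch_snapshot" },
	{ "__brk_limit", "__brk_limit" },	/* trampoline */
	{ "__bpf_prog_enter", "__brk_limit" },
	{ "migrate_disable", "__bpf_prog_enter" },
	{ "__bpf_prog_enter", "migrate_disable" },
	{ "bpf_testmod_loop_test", "__bpf_prog_enter" },
	{ "bpf_testmod_loop_test", "bpf_testmod_loop_test" },
	{ "bpf_testmod_loop_test", "bpf_testmod_loop_test" },
};

/*
 * Count the leading records that are not branches within `target`,
 * i.e. the entries consumed between the event and stopping the recorder.
 */
size_t wasted_entries(const struct branch_rec *recs, size_t n,
		      const char *target)
{
	size_t i;

	assert(recs != NULL && target != NULL);
	for (i = 0; i < n; i++) {
		if (strcmp(recs[i].from_sym, target) == 0 &&
		    strcmp(recs[i].to_sym, target) == 0)
			break;
	}
	return i;
}
```

For the listing above and the target bpf_testmod_loop_test, the helper returns 7: records 0 through 6 belong to the helper, the trampoline, and program entry, while record 7 is the first branch inside the traced function.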
      
      
      ====================
      
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Add test for bpf_get_branch_snapshot · 025bd7c7
      Song Liu authored
      
      
      This test uses bpf_get_branch_snapshot() from a fexit program. The test uses
      a target function (bpf_testmod_loop_test) and compares the records against
      kallsyms. If too few records match kallsyms, the test fails.
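
The pass/fail rule described above can be sketched as an address-range comparison. This is a simplified userspace model, not the selftest itself: the struct, the function names, the threshold parameter, and any addresses passed in are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for one branch record (from/to addresses only). */
struct branch_entry {
	uint64_t from;
	uint64_t to;
};

/* True if addr lies in [start, end), the target's range from kallsyms. */
int addr_in_range(uint64_t addr, uint64_t start, uint64_t end)
{
	return addr >= start && addr < end;
}

/* Count records whose source and destination both fall in the range. */
size_t count_matching(const struct branch_entry *entries, size_t cnt,
		      uint64_t start, uint64_t end)
{
	size_t i, hits = 0;

	assert(entries != NULL || cnt == 0);
	for (i = 0; i < cnt; i++) {
		if (addr_in_range(entries[i].from, start, end) &&
		    addr_in_range(entries[i].to, start, end))
			hits++;
	}
	return hits;
}

/* The snapshot passes when at least min_hits records match. */
int snapshot_passes(const struct branch_entry *entries, size_t cnt,
		    uint64_t start, uint64_t end, size_t min_hits)
{
	return count_matching(entries, cnt, start, end) >= min_hits;
}
```

Here `start` would be the address of the target function and `end` the address of the next symbol after it, both taken from kallsyms.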
      
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20210910183352.3151445-4-songliubraving@fb.com
    • bpf: Introduce helper bpf_get_branch_snapshot · 856c02db
      Song Liu authored
      
      
      Introduce bpf_get_branch_snapshot(), which allows a tracing program to get
      the branch trace from hardware (e.g. Intel LBR). To use the feature, the
      user needs to create a perf_event with proper branch_record filtering
      on each CPU, and then call bpf_get_branch_snapshot() in the BPF function.
      On Intel CPUs, the VLBR event (raw event 0x1b00) can be used for this.
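
A possible way to prepare such a perf_event from userspace is sketched below. It assumes a Linux build environment with <linux/perf_event.h>; the function name and the particular branch filter (kernel + user + any branch) are illustrative choices, not taken from the commit. Only the raw event number 0x1b00 comes from the text above.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <stddef.h>
#include <string.h>

#define VLBR_RAW_EVENT 0x1b00	/* raw event named in the commit message */

/*
 * Fill a perf_event_attr that asks for branch stack sampling on the
 * VLBR raw event. The filter below is an assumption for illustration;
 * a real user would pick the branch types they care about.
 */
void init_branch_event_attr(struct perf_event_attr *attr)
{
	assert(attr != NULL);

	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_RAW;
	attr->config = VLBR_RAW_EVENT;
	attr->sample_type = PERF_SAMPLE_BRANCH_STACK;
	attr->branch_sample_type = PERF_SAMPLE_BRANCH_KERNEL |
				   PERF_SAMPLE_BRANCH_USER |
				   PERF_SAMPLE_BRANCH_ANY;
}
```

The filled structure would then be passed to the perf_event_open() system call once per CPU, and the resulting events kept open while the BPF program that calls bpf_get_branch_snapshot() is attached.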
      
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20210910183352.3151445-3-songliubraving@fb.com
    • perf: Enable branch record for software events · c22ac2a3
      Song Liu authored
      
      
      The typical way to access branch records (e.g. Intel LBR) is via a hardware
      perf_event. For CPUs with FREEZE_LBRS_ON_PMI support, a PMI can capture
      reliable LBR. On the other hand, LBR can also be useful in non-PMI
      scenarios. For example, in a kretprobe or a BPF fexit program, LBR can
      provide a lot of information about what happened within the function. Add
      an API to use branch records for software events.
      
      Note that, when the software event triggers, it is necessary to stop the
      branch record hardware asap. Therefore, static_call is used to remove some
      branch instructions in this process.
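
The shape of such an API can be modelled in userspace with a replaceable callback. Note the substitution: the kernel uses static_call so that the call site is patched into a direct call, whereas this sketch uses an ordinary function pointer, which still costs an indirect branch. All names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct branch_entry {
	unsigned long long from;
	unsigned long long to;
};

/* Signature of a snapshot routine: fill up to cnt entries, return count. */
typedef int (*snapshot_fn)(struct branch_entry *entries, unsigned int cnt);

/* Default when no PMU has registered support: report zero entries. */
int snapshot_unsupported(struct branch_entry *entries, unsigned int cnt)
{
	(void)entries;
	(void)cnt;
	return 0;
}

/* Plain function pointer standing in for the kernel's static_call. */
snapshot_fn snapshot_branch_stack = snapshot_unsupported;

/* Install a PMU-specific routine; NULL restores the default. */
void register_snapshot(snapshot_fn fn)
{
	snapshot_branch_stack = fn ? fn : snapshot_unsupported;
}

/* Hypothetical PMU routine that produces at most two fake records. */
int fake_pmu_snapshot(struct branch_entry *entries, unsigned int cnt)
{
	unsigned int i, n = cnt < 2 ? cnt : 2;

	assert(entries != NULL || cnt == 0);
	for (i = 0; i < n; i++) {
		entries[i].from = 0x1000 + i;
		entries[i].to = 0x2000 + i;
	}
	return (int)n;
}
```

Callers always invoke snapshot_branch_stack() and treat a return value of 0 as "no branch records available", so they need no separate check for whether a PMU has registered.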
      
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lore.kernel.org/bpf/20210910183352.3151445-2-songliubraving@fb.com
  5. Sep 11, 2021