  1. Mar 12, 2024
    • bpf: Recognize addr_space_cast instruction in the verifier. · 6082b6c3
      Alexei Starovoitov authored
      
      
      rY = addr_space_cast(rX, 0, 1) tells the verifier that rY->type = PTR_TO_ARENA.
      Any further operations on a PTR_TO_ARENA register have to be in the 32-bit domain.
      
      The verifier will mark load/store through PTR_TO_ARENA with PROBE_MEM32.
      JIT will generate them as kern_vm_start + 32bit_addr memory accesses.
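
      For illustration only (not the verifier or JIT code; the function and
      parameter names are made up for this sketch), a minimal C model of what
      such a PROBE_MEM32 load effectively computes:

        #include <stdint.h>

        /* A load through a PTR_TO_ARENA register uses only the lower 32 bits
         * of the pointer, added to the arena's 64-bit kern_vm_start base.
         */
        static uint64_t arena_load64(uint64_t kern_vm_start, uint64_t arena_ptr)
        {
                uint32_t off = (uint32_t)arena_ptr;     /* stay in the 32-bit domain */

                return *(uint64_t *)(uintptr_t)(kern_vm_start + off);
        }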
      
      rY = addr_space_cast(rX, 1, 0) tells the verifier that rY->type = unknown scalar.
      If arena->map_flags has BPF_F_NO_USER_CONV set, then cast_user is converted to mov32 as well.
      Otherwise the JIT will convert it to:
        rY = (u32)rX;
        if (rY)
           rY |= arena->user_vm_start & ~(u64)~0U; /* i.e. the upper 32 bits of user_vm_start */
      
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20240308010812.89848-6-alexei.starovoitov@gmail.com
    • bpf: Add x86-64 JIT support for bpf_addr_space_cast instruction. · 142fd4d2
      Alexei Starovoitov authored
      
      
      LLVM generates the bpf_addr_space_cast instruction while translating
      pointers between the native (zero) address space and
      __attribute__((address_space(N))).
      Address space 1 is reserved as the bpf_arena address space.
      
      rY = addr_space_cast(rX, 0, 1) is processed by the verifier and
      converted to a normal 32-bit move: wY = wX
      
      rY = addr_space_cast(rX, 1, 0) has to be converted by the JIT:

      aux_reg = upper_32_bits of arena->user_vm_start
      aux_reg <<= 32
      wY = wX   // clear upper 32 bits of dst register
      if (wY)   // if not zero, add upper bits of user_vm_start
        rY |= aux_reg
      
      JIT can do it more efficiently:
      
      mov dst_reg32, src_reg32  // 32-bit move
      shl dst_reg, 32
      or dst_reg, user_vm_start
      rol dst_reg, 32
      xor r11, r11
      test dst_reg32, dst_reg32 // check if the lower 32 bits are zero
      cmove r11, dst_reg        // if so, set dst_reg to zero
                                // (Intel swapped src/dst register encoding in CMOVcc)
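
      A C model of the above sequence (illustrative only; it assumes, as in the
      straightforward version earlier, that the or instruction supplies the
      upper 32 bits of arena->user_vm_start):

        #include <stdint.h>

        static uint64_t cast_user_model(uint64_t src, uint64_t user_vm_start)
        {
                uint64_t dst = (uint32_t)src;            /* mov dst_reg32, src_reg32 */

                dst <<= 32;                              /* shl dst_reg, 32 */
                dst |= user_vm_start >> 32;              /* or dst_reg, upper 32 bits */
                dst = (dst << 32) | (dst >> 32);         /* rol dst_reg, 32 */
                if ((uint32_t)dst == 0)                  /* test dst_reg32, dst_reg32 */
                        dst = 0;                         /* cmove: NULL stays NULL */
                return dst;
        }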
      
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Eduard Zingerman <eddyz87@gmail.com>
      Link: https://lore.kernel.org/bpf/20240308010812.89848-5-alexei.starovoitov@gmail.com
    • bpf: Add x86-64 JIT support for PROBE_MEM32 pseudo instructions. · 2fe99eb0
      Alexei Starovoitov authored
      
      
      Add support for [LDX | STX | ST], PROBE_MEM32, [B | H | W | DW] instructions.
      They are similar to PROBE_MEM instructions with the following differences:
      - PROBE_MEM has to check that the address is in the kernel range with a
        src_reg + insn->off >= TASK_SIZE_MAX + PAGE_SIZE check
      - PROBE_MEM doesn't support store
      - PROBE_MEM32 relies on the verifier to clear the upper 32 bits of the register
      - PROBE_MEM32 adds the 64-bit kern_vm_start address (which is stored in %r12 in the prologue).
        Due to the way bpf_arena is constructed, such a %r12 + %reg + off16 access is guaranteed
        to be within the arena virtual range, so no address check is needed at run-time.
      - PROBE_MEM32 allows STX and ST. If they fault, the store is a nop.
        When LDX faults, the destination register is zeroed (see the sketch below).
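
      A C sketch of the semantics described above (illustrative only;
      page_present() is a made-up stand-in for the real page-table state, and
      the actual implementation relies on the exception table, not a check):

        #include <stdint.h>
        #include <stdbool.h>

        /* stand-in for the real page-table state; always "present" here */
        static bool page_present(uint64_t kaddr) { (void)kaddr; return true; }

        static uint64_t probe_mem32_load(uint64_t kern_vm_start, uint32_t off)
        {
                uint64_t kaddr = kern_vm_start + off;   /* %r12 + %reg + off16 */

                if (!page_present(kaddr))
                        return 0;                       /* faulting LDX zeroes dst */
                return *(uint64_t *)(uintptr_t)kaddr;
        }

        static void probe_mem32_store(uint64_t kern_vm_start, uint32_t off, uint64_t val)
        {
                uint64_t kaddr = kern_vm_start + off;

                if (!page_present(kaddr))
                        return;                         /* faulting ST/STX is a nop */
                *(uint64_t *)(uintptr_t)kaddr = val;
        }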
      
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/bpf/20240308010812.89848-4-alexei.starovoitov@gmail.com
    • bpf: Disasm support for addr_space_cast instruction. · 667a86ad
      Alexei Starovoitov authored
      
      
      LLVM generates the rX = addr_space_cast(rY, dst_addr_space, src_addr_space)
      instruction when pointers in a non-zero address space are used by the bpf
      program. Recognize this insn in the uapi and in the bpf disassembler.
      
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/bpf/20240308010812.89848-3-alexei.starovoitov@gmail.com
    • bpf: Introduce bpf_arena. · 31746031
      Alexei Starovoitov authored
      Introduce bpf_arena, which is a sparse shared memory region between the bpf
      program and user space.
      
      Use cases:
      1. User space mmaps bpf_arena and uses it as a traditional mmap-ed
         anonymous region, like memcached or any key/value storage. The bpf
         program implements an in-kernel accelerator. XDP prog can search for
         a key in bpf_arena and return a value without going to user space.
      2. The bpf program builds arbitrary data structures in bpf_arena (hash
         tables, rb-trees, sparse arrays), while user space consumes it.
      3. bpf_arena is a "heap" of memory from the bpf program's point of view.
         User space may mmap it, but the bpf program will not convert pointers
         to the user base at run-time, which improves bpf program speed.
      
      Initially, the kernel vm_area and user vma are not populated. User space
      can fault in pages within the range. While servicing a page fault,
      bpf_arena logic will insert a new page into the kernel and user vmas. The
      bpf program can allocate pages from that region via
      bpf_arena_alloc_pages(). This kernel function will insert pages into the
      kernel vm_area. The subsequent fault-in from user space will populate that
      page into the user vma. The BPF_F_SEGV_ON_FAULT flag at arena creation time
      can be used to prevent fault-in from user space. In such a case, if a page
      is not allocated by the bpf program and not present in the kernel vm_area,
      the user process will segfault. This is useful for use cases 2 and 3 above.
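
      For reference, a bpf-side arena declaration might look like the sketch
      below (selftests style; the page count and map_extra hint are
      illustrative, and BPF_F_SEGV_ON_FAULT is the flag described above):

        #include <vmlinux.h>
        #include <bpf/bpf_helpers.h>

        struct {
                __uint(type, BPF_MAP_TYPE_ARENA);
                __uint(map_flags, BPF_F_MMAPABLE | BPF_F_SEGV_ON_FAULT);
                __uint(max_entries, 1000);              /* number of pages */
                __ulong(map_extra, 1ull << 44);         /* optional user VM start hint */
        } arena SEC(".maps");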
      
      bpf_arena_alloc_pages() is similar to user space mmap(). It allocates pages
      either at a specific address within the arena or allocates a range with the
      maple tree. bpf_arena_free_pages() is analogous to munmap(), which frees
      pages and removes the range from the kernel vm_area and from user process
      vmas.
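
      A sketch of how a bpf program might use these kfuncs (illustrative only;
      the declarations follow the selftests' arena helpers, and the map named
      'arena' is assumed to be declared as in the snippet above):

        #define __arena __attribute__((address_space(1)))

        void __arena *bpf_arena_alloc_pages(void *map, void __arena *addr, __u32 page_cnt,
                                            int node_id, __u64 flags) __ksym __weak;
        void bpf_arena_free_pages(void *map, void __arena *ptr, __u32 page_cnt) __ksym __weak;

        char _license[] SEC("license") = "GPL";

        SEC("syscall")
        int alloc_then_free(void *ctx)
        {
                /* allocate one page anywhere in the arena (addr == 0), any NUMA node */
                void __arena *page = bpf_arena_alloc_pages(&arena, 0, 1, -1 /* NUMA_NO_NODE */, 0);

                if (!page)
                        return 1;
                bpf_arena_free_pages(&arena, page, 1);  /* munmap()-like: drop the page */
                return 0;
        }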
      
      bpf_arena can be used as a bpf program "heap" of up to 4GB when the speed
      of the bpf program is more important than ease of sharing with user space;
      this is use case 3. In such a case, the BPF_F_NO_USER_CONV flag is recommended.
      It will tell the verifier to treat the rX = bpf_arena_cast_user(rY)
      instruction as a 32-bit move wX = wY, which will improve bpf prog
      performance. Otherwise, bpf_arena_cast_user is translated by JIT to
      conditionally add the upper 32 bits of user vm_start (if the pointer is not
      NULL) to arena pointers before they are stored into memory. This way, user
      space sees them as valid 64-bit pointers.
      
      The LLVM change https://github.com/llvm/llvm-project/pull/84410 enables the
      LLVM BPF backend to generate the bpf_addr_space_cast() instruction to cast
      pointers between address_space(1), which is reserved for bpf_arena pointers,
      and the default address space zero. All arena pointers in a bpf program
      written in C are tagged as __attribute__((address_space(1))). Hence, clang
      provides helpful diagnostics when pointers cross address spaces. Libbpf and
      the kernel support only address_space == 1. All other address space
      identifiers are reserved.
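
      As a small illustration of this tagging (not from the commit; the type
      names are made up), mixing arena and non-arena pointers without a cast
      is a compile-time diagnostic:

        #define __arena __attribute__((address_space(1)))

        struct node {
                struct node __arena *next;      /* arena-to-arena pointer stays tagged */
                long value;
        };

        struct node __arena *head;              /* pointer into the arena */

        /* struct node *p = head;   <-- clang rejects the implicit conversion:
         *                              the two pointers live in different address spaces */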
      
      rX = bpf_addr_space_cast(rY, /* dst_as */ 1, /* src_as */ 0) tells the
      verifier that rX->type = PTR_TO_ARENA. Any further operations on a
      PTR_TO_ARENA register have to be in the 32-bit domain. The verifier will
      mark load/store through PTR_TO_ARENA with PROBE_MEM32. JIT will generate
      them as kern_vm_start + 32bit_addr memory accesses. The behavior is similar
      to copy_from_kernel_nofault() except that no address checks are necessary.
      The address is guaranteed to be in the 4GB range. If the page is not
      present, the destination register is zeroed on read, and the operation is
      ignored on write.
      
      rX = bpf_addr_space_cast(rY, 0, 1) tells the verifier that rX->type =
      unknown scalar. If arena->map_flags has BPF_F_NO_USER_CONV set, then the
      verifier converts such cast instructions to mov32. Otherwise, JIT will emit
      native code equivalent to:
      rX = (u32)rY;
      if (rY)
        rX |= clear_lo32_bits(arena->user_vm_start); /* replace hi32 bits in rX */
      
      After such conversion, the pointer becomes a valid user pointer within
      bpf_arena range. The user process can access data structures created in
      bpf_arena without any additional computations. For example, a linked list
      built by a bpf program can be walked natively by user space.
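
      A user-space sketch of that (illustrative only; it assumes the bpf program
      left the list head pointer, already converted by cast_user, in the first 8
      bytes of the arena, and that arena_fd/arena_len describe the arena map):

        #include <stdio.h>
        #include <sys/mman.h>

        struct node {
                struct node *next;
                long value;
        };

        static void walk_arena_list(int arena_fd, size_t arena_len)
        {
                void *base = mmap(NULL, arena_len, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, arena_fd, 0);

                if (base == MAP_FAILED)
                        return;
                /* head pointer stored at the start of the arena by the bpf program */
                for (struct node *n = *(struct node **)base; n; n = n->next)
                        printf("%ld\n", n->value);
                munmap(base, arena_len);
        }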
      
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Reviewed-by: Barret Rhoden <brho@google.com>
      Link: https://lore.kernel.org/bpf/20240308010812.89848-2-alexei.starovoitov@gmail.com
    • selftests/bpf: Add fexit and kretprobe triggering benchmarks · 365c2b32
      Andrii Nakryiko authored
      
      
      We already have kprobe and fentry benchmarks. Let's add kretprobe and
      fexit ones for completeness.
      
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: https://lore.kernel.org/bpf/20240309005124.3004446-1-andrii@kernel.org