  1. Sep 28, 2021
    • xsk: Batched buffer allocation for the pool · 47e4075d
      Magnus Karlsson authored
      
      
      Add a new driver interface xsk_buff_alloc_batch() offering batched
      buffer allocations to improve performance. The new interface takes
      three arguments: the buffer pool to allocate from, a pointer to an
      array of struct xdp_buff pointers which will contain pointers to the
      allocated xdp_buffs, and an unsigned integer specifying the max number
      of buffers to allocate. The return value is the actual number of
      buffers that the allocator managed to allocate and it will be in the
      range 0 <= N <= max, where max is the third parameter to the function.
      
      u32 xsk_buff_alloc_batch(struct xsk_buff_pool *pool, struct xdp_buff **xdp,
                               u32 max);
      
      A second driver interface is also introduced that needs to be used in
      conjunction with xsk_buff_alloc_batch(). It is a helper that sets the
      size of struct xdp_buff and is used by the NIC Rx irq routine when
      receiving a packet. This helper sets the three struct members data,
      data_meta, and data_end. In the xsk_buff_alloc() case, the first two
      are set in the allocation routine and data_end is set when a packet
      is received in the receive irq function. This unfortunately leads to
      worse performance since the xdp_buff is touched twice with a long time
      period in between, leading to an extra cache miss. Instead, we fill out
      the xdp_buff with all three fields at a single point in time in the
      driver, when the size of the packet is known. Hence this helper. Note
      that the driver has to use this helper (or set all three fields
      itself) when using xsk_buff_alloc_batch(). xsk_buff_alloc() works as
      before and does not require this.
      
      void xsk_buff_set_size(struct xdp_buff *xdp, u32 size);
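
      The calling pattern described above can be sketched in user space.
      This is a toy model, not the kernel code: the toy_* types, the pool
      size, the frame size and the headroom value are all assumptions made
      for illustration. The allocator hands out buffers without touching
      data, data_meta or data_end, and the size helper fills in all three
      at once when the packet length is known.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for struct xdp_buff and the buffer pool (hypothetical). */
struct toy_buff {
	uint8_t *data_hard_start;
	uint8_t *data;
	uint8_t *data_meta;
	uint8_t *data_end;
};

#define TOY_POOL_SIZE	4	/* assumed number of buffers in the pool */
#define TOY_FRAME_SIZE	2048	/* assumed frame size */
#define TOY_HEADROOM	256	/* assumed headroom in front of the packet */

struct toy_pool {
	struct toy_buff bufs[TOY_POOL_SIZE];
	uint8_t frames[TOY_POOL_SIZE][TOY_FRAME_SIZE];
	uint32_t next_free;
};

/*
 * Hand out up to max buffers and return how many were handed out,
 * so the result N satisfies 0 <= N <= max. The packet fields are
 * deliberately left untouched here.
 */
static uint32_t toy_alloc_batch(struct toy_pool *pool, struct toy_buff **out,
				uint32_t max)
{
	uint32_t n = 0;

	while (n < max && pool->next_free < TOY_POOL_SIZE) {
		struct toy_buff *b = &pool->bufs[pool->next_free];

		b->data_hard_start = pool->frames[pool->next_free];
		out[n++] = b;
		pool->next_free++;
	}
	return n;
}

/* Set data, data_meta and data_end in one go once the size is known. */
static void toy_set_size(struct toy_buff *b, uint32_t size)
{
	assert(size <= TOY_FRAME_SIZE - TOY_HEADROOM);
	b->data = b->data_hard_start + TOY_HEADROOM;
	b->data_meta = b->data;
	b->data_end = b->data + size;
}
```

      A driver-like caller would request a batch up front, and only call
      the size helper on a buffer when a packet actually arrives in it.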
      
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210922075613.12186-3-magnus.karlsson@gmail.com
    • xsk: Get rid of unused entry in struct xdp_buff_xsk · 10a5e009
      Magnus Karlsson authored
      
      
      Get rid of the unused entry "unaligned" in struct xdp_buff_xsk.
      
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210922075613.12186-2-magnus.karlsson@gmail.com
  2. Sep 27, 2021
    • Merge branch 'bpf: Support <8-byte scalar spill and refill' · e7d5184b
      Alexei Starovoitov authored
      Martin KaFai says:
      
      ====================
      
      The verifier currently does not save the reg state when
      spilling a <8-byte bounded scalar to the stack.  The bpf program
      will be incorrectly rejected when this scalar is refilled to
      the reg and then used to offset into a packet header.
      A later patch has a simplified bpf prog from a real use case
      to demonstrate this case.  The current workaround is
      to reparse the packet such that this offset scalar
      is close to where the packet data will be accessed, to
      avoid the spill.  Thus, the header is parsed twice.
      
      The llvm patch [1] will align the <8-byte spill to
      the 8-byte stack address.  This set makes the necessary
      changes in the verifier to support <8-byte scalar spill and refill.
      
      [1] https://reviews.llvm.org/D109073
      
      
      
      v2:
      - Changed the xdpwall selftest in patch 3 to trigger a u32
        spill at a non 8-byte aligned stack address.  The v1 has
        simplified the real example too much such that it only
        triggers a u32 spill but does not spill at a non
        8-byte aligned stack address.
      - Changed README.rst in patch 3 to explain the llvm dependency
        for the xdpwall test.
      ====================
      
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: selftest: Add verifier tests for <8-byte scalar spill and refill · ef979017
      Martin KaFai Lau authored
      
      
      This patch adds a few verifier tests for <8-byte spill and refill.
      
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210922004953.627183-1-kafai@fb.com
    • bpf: selftest: A bpf prog that has a 32bit scalar spill · 54ea6079
      Martin KaFai Lau authored
      It is a simplified example that can trigger a 32bit scalar spill.
      The const scalar is refilled and added to a skb->data later.
      Since the reg state of the 32bit scalar spill is not saved now,
      adding the refilled reg to skb->data and then comparing it with
      skb->data_end cannot verify the skb->data access.
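
      The access pattern at stake can be sketched in plain user-space C.
      This is an illustration only, not the selftest program: toy_pkt and
      toy_read_u16() are hypothetical names, and the bounds check is done
      on lengths rather than by comparing a pointer against data_end as a
      bpf program would. The point is that the offset must still be known
      to be small at the time of the check, which is the information the
      verifier loses when the 32bit spill is not tracked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for the data/data_end pair of a packet (hypothetical). */
struct toy_pkt {
	const uint8_t *data;
	const uint8_t *data_end;
};

/*
 * Read a u16 at data + off, but only after proving that the two bytes
 * lie inside [data, data_end). Returns 0 on success, -1 if out of bounds.
 */
static int toy_read_u16(const struct toy_pkt *pkt, uint32_t off, uint16_t *val)
{
	size_t len;

	assert(pkt && val && pkt->data <= pkt->data_end);
	len = (size_t)(pkt->data_end - pkt->data);
	if (off > len || len - off < sizeof(*val))
		return -1;
	memcpy(val, pkt->data + off, sizeof(*val));
	return 0;
}
```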
      
      With the earlier verifier patch and the llvm patch [1], the verifier
      can correctly verify the bpf prog.
      
      Here is the snippet of the verifier log that leads to the verifier's conclusion
      that the packet data is unsafe to read.  The log is from the kernel
      without the previous verifier patch to save the <8-byte scalar spill.
      67: R0=inv1 R1=inv17 R2=invP2 R3=inv1 R4=pkt(id=0,off=68,r=102,imm=0) R5=inv102 R6=pkt(id=0,off=62,r=102,imm=0) R7=pkt(id=0,off=0,r=102,imm=0) R8=pkt_end(id=0,off=0,imm=0) R9=inv17 R10=fp0
      67: (63) *(u32 *)(r10 -12) = r5
      68: R0=inv1 R1=inv17 R2=invP2 R3=inv1 R4=pkt(id=0,off=68,r=102,imm=0) R5=inv102 R6=pkt(id=0,off=62,r=102,imm=0) R7=pkt(id=0,off=0,r=102,imm=0) R8=pkt_end(id=0,off=0,imm=0) R9=inv17 R10=fp0 fp-16=mmmm????
      ...
      101: R0_w=map_value_or_null(id=2,off=0,ks=16,vs=1,imm=0) R6_w=pkt(id=0,off=70,r=102,imm=0) R7=pkt(id=0,off=0,r=102,imm=0) R8=pkt_end(id=0,off=0,imm=0) R9=inv17 R10=fp0 fp-16=mmmmmmmm
      101: (61) r1 = *(u32 *)(r10 -12)
      102: R0_w=map_value_or_null(id=2,off=0,ks=16,vs=1,imm=0) R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R6_w=pkt(id=0,off=70,r=102,imm=0) R7=pkt(id=0,off=0,r=102,imm=0) R8=pkt_end(id=0,off=0,imm=0) R9=inv17 R10=fp0 fp-16=mmmmmmmm
      102: (bc) w1 = w1
      103: R0_w=map_value_or_null(id=2,off=0,ks=16,vs=1,imm=0) R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R6_w=pkt(id=0,off=70,r=102,imm=0) R7=pkt(id=0,off=0,r=102,imm=0) R8=pkt_end(id=0,off=0,imm=0) R9=inv17 R10=fp0 fp-16=mmmmmmmm
      103: (0f) r7 += r1
      last_idx 103 first_idx 67
      regs=2 stack=0 before 102: (bc) w1 = w1
      regs=2 stack=0 before 101: (61) r1 = *(u32 *)(r10 -12)
      104: R0_w=map_value_or_null(id=2,off=0,ks=16,vs=1,imm=0) R1_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R6_w=pkt(id=0,off=70,r=102,imm=0) R7_w=pkt(id=3,off=0,r=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R8=pkt_end(id=0,off=0,imm=0) R9=inv17 R10=fp0 fp-16=mmmmmmmm
      ...
      127: R0_w=inv1 R1=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R6=pkt(id=0,off=70,r=102,imm=0) R7=pkt(id=3,off=0,r=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R8=pkt_end(id=0,off=0,imm=0) R9_w=invP17 R10=fp0 fp-16=mmmmmmmm
      127: (bf) r1 = r7
      128: R0_w=inv1 R1_w=pkt(id=3,off=0,r=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R6=pkt(id=0,off=70,r=102,imm=0) R7=pkt(id=3,off=0,r=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R8=pkt_end(id=0,off=0,imm=0) R9_w=invP17 R10=fp0 fp-16=mmmmmmmm
      128: (07) r1 += 8
      129: R0_w=inv1 R1_w=pkt(id=3,off=8,r=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R6=pkt(id=0,off=70,r=102,imm=0) R7=pkt(id=3,off=0,r=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R8=pkt_end(id=0,off=0,imm=0) R9_w=invP17 R10=fp0 fp-16=mmmmmmmm
      129: (b4) w0 = 1
      130: R0=inv1 R1=pkt(id=3,off=8,r=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R6=pkt(id=0,off=70,r=102,imm=0) R7=pkt(id=3,off=0,r=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R8=pkt_end(id=0,off=0,imm=0) R9=invP17 R10=fp0 fp-16=mmmmmmmm
      130: (2d) if r1 > r8 goto pc-66
       R0=inv1 R1=pkt(id=3,off=8,r=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R6=pkt(id=0,off=70,r=102,imm=0) R7=pkt(id=3,off=0,r=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R8=pkt_end(id=0,off=0,imm=0) R9=invP17 R10=fp0 fp-16=mmmmmmmm
      131: R0=inv1 R1=pkt(id=3,off=8,r=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R6=pkt(id=0,off=70,r=102,imm=0) R7=pkt(id=3,off=0,r=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R8=pkt_end(id=0,off=0,imm=0) R9=invP17 R10=fp0 fp-16=mmmmmmmm
      131: (69) r6 = *(u16 *)(r7 +0)
      invalid access to packet, off=0 size=2, R7(id=3,off=0,r=0)
      R7 offset is outside of the packet
      
      [1]: https://reviews.llvm.org/D109073
      
      
      
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210922004947.626286-1-kafai@fb.com
    • bpf: Support <8-byte scalar spill and refill · 354e8f19
      Martin KaFai Lau authored
      The verifier currently does not save the reg state when
      spilling a <8-byte bounded scalar to the stack.  The bpf program
      will be incorrectly rejected when this scalar is refilled to
      the reg and then used to offset into a packet header.
      A later patch has a simplified bpf prog from a real use case
      to demonstrate this case.  The current workaround is
      to reparse the packet such that this offset scalar
      is close to where the packet data will be accessed, to
      avoid the spill.  Thus, the header is parsed twice.
      
      The llvm patch [1] will align the <8-byte spill to
      the 8-byte stack address.  This can simplify the verifier
      support by avoiding the need to store multiple reg states for
      each 8-byte stack slot.
      
      This patch changes the verifier to save the reg state when
      spilling a <8-byte scalar to the stack.  This reg state saving
      is limited to spills aligned to the 8-byte stack address.
      The current refill logic already calls coerce_reg_to_size(),
      so coerce_reg_to_size() is not called on state->stack[spi].spilled_ptr
      during spill.
      
      When refilling in check_stack_read_fixed_off(), it checks that
      the refill size is the same as the number of bytes marked with
      STACK_SPILL before restoring the reg state.  When restoring
      the reg state to state->regs[dst_regno], it needs
      to avoid the state->regs[dst_regno].subreg_def being
      overwritten because it has been marked by check_reg_arg()
      earlier [check_mem_access() is called after check_reg_arg() in
      do_check()].  Reordering check_mem_access() and check_reg_arg()
      would need a lot of changes in test_verifier's tests because
      of the difference in the verifier's error message.  Thus, the
      patch here saves the state->regs[dst_regno].subreg_def
      first in check_stack_read_fixed_off().
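
      The size check on refill can be illustrated with a toy model of one
      8-byte stack slot. This is a sketch under stated assumptions, not the
      verifier code: the toy_* names are hypothetical, the saved register
      state is reduced to a single upper bound, and a mismatching refill
      simply falls back to the widest value of that size.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_REG_SIZE 8

enum toy_slot_type { TOY_INVALID, TOY_SPILL, TOY_MISC };

/* Toy stand-in for one tracked 8-byte stack slot (hypothetical). */
struct toy_slot {
	uint8_t slot_type[TOY_REG_SIZE];
	uint64_t saved_umax;	/* stands in for the saved reg state */
};

/*
 * Spill size bytes at the 8-byte aligned start of the slot. The bytes
 * are marked from slot_type[7] downward and the bound is remembered.
 */
static void toy_spill(struct toy_slot *s, int size, uint64_t umax)
{
	assert(size >= 1 && size <= TOY_REG_SIZE);
	for (int i = TOY_REG_SIZE; i > TOY_REG_SIZE - size; i--)
		s->slot_type[i - 1] = TOY_SPILL;
	s->saved_umax = umax;
}

/*
 * Refill size bytes. The saved bound is restored only when size equals
 * the number of bytes marked as spilled; otherwise the result is treated
 * as an unknown scalar of that width.
 */
static uint64_t toy_refill(const struct toy_slot *s, int size)
{
	int spilled = 0;

	assert(size >= 1 && size <= TOY_REG_SIZE);
	for (int i = 0; i < TOY_REG_SIZE; i++)
		if (s->slot_type[i] == TOY_SPILL)
			spilled++;
	if (spilled == size)
		return s->saved_umax;
	return size == TOY_REG_SIZE ? UINT64_MAX : (1ULL << (size * 8)) - 1;
}
```

      With a 4-byte spill of a value bounded by 102, a 4-byte refill in
      this model gets the bound 102 back, while a 2-byte refill does not.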
      
      There are cases where the verifier needs to scrub the spilled slot
      from STACK_SPILL to STACK_MISC.  After this patch, a spill is no longer
      always 8 bytes, so the verifier can no longer assume the other 7 bytes are
      always marked as STACK_SPILL.  In particular, the scrub needs to avoid marking
      an uninitialized byte from STACK_INVALID to STACK_MISC.  Otherwise, the
      verifier will incorrectly accept a bpf program reading uninitialized bytes
      from the stack.  A new helper scrub_spilled_slot() is created for this
      purpose.
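
      The scrub rule can be sketched as follows. This is a toy model with
      hypothetical toy_* names, not the kernel helper itself: a byte is
      downgraded to "misc" only if it was initialized, so invalid bytes
      stay invalid.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_REG_SIZE 8

enum toy_slot_type { TOY_INVALID, TOY_SPILL, TOY_MISC };

/* Scrub one byte: anything initialized becomes MISC, INVALID is kept. */
static void toy_scrub_byte(uint8_t *stype)
{
	assert(stype);
	if (*stype != TOY_INVALID)
		*stype = TOY_MISC;
}

/* Scrub all eight per-byte types of one stack slot. */
static void toy_scrub_slot(uint8_t slot_type[TOY_REG_SIZE])
{
	for (int i = 0; i < TOY_REG_SIZE; i++)
		toy_scrub_byte(&slot_type[i]);
}
```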
      
      [1]: https://reviews.llvm.org/D109073
      
      
      
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210922004941.625398-1-kafai@fb.com
    • bpf: Check the other end of slot_type for STACK_SPILL · 27113c59
      Martin KaFai Lau authored
      Every 8 bytes of the stack is tracked by a bpf_stack_state.
      Within each bpf_stack_state, there is a 'u8 slot_type[8]' to track
      the type of each byte.  Verifier tests slot_type[0] == STACK_SPILL
      to decide if the spilled reg state is saved.  Verifier currently only
      saves the reg state if the whole 8 bytes are spilled to the stack,
      so checking the slot_type[7] is the same as checking slot_type[0].
      
      A later patch will allow the verifier to save the bounded scalar
      reg also for a <8-byte spill.  There is an llvm patch [1] to ensure
      the <8-byte spill will be 8-byte aligned, so checking
      slot_type[7] instead of slot_type[0] is required.
      
      While at it, this patch refactors the slot_type[0] == STACK_SPILL
      test into a new function is_spilled_reg() and changes the
      slot_type[0] check to a slot_type[7] check in there also.
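
      Why the check moves to the other end of slot_type can be seen in a
      small model. The toy_* names below are hypothetical and the struct
      is reduced to the per-byte types only; the sketch assumes, as
      described above, that an aligned 4-byte spill marks slot_type[4]
      through slot_type[7] and leaves slot_type[0] untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_REG_SIZE 8

enum toy_slot_type { TOY_INVALID, TOY_SPILL, TOY_MISC };

/* Toy stand-in for the per-byte tracking of one stack slot. */
struct toy_slot {
	uint8_t slot_type[TOY_REG_SIZE];
};

/* A slot holds a spilled reg if its last type byte is marked as SPILL. */
static bool toy_is_spilled_reg(const struct toy_slot *s)
{
	assert(s);
	return s->slot_type[TOY_REG_SIZE - 1] == TOY_SPILL;
}
```

      For a full 8-byte spill both ends agree, so the old and the new
      check give the same answer. For the aligned 4-byte spill only the
      slot_type[7] check reports the spill.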
      
      [1] https://reviews.llvm.org/D109073
      
      
      
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210922004934.624194-1-kafai@fb.com
  3. Sep 25, 2021
    • selftests/bpf: Fix btf_dump __int128 test failure with clang build kernel · 091037fb
      Yonghong Song authored
      
      
      With a clang-built kernel (adding LLVM=1 to the kernel and selftests/bpf build
      command lines), I hit the following test failure:
      
        $ ./test_progs -t btf_dump
        ...
        btf_dump_data:PASS:ensure expected/actual match 0 nsec
        btf_dump_data:FAIL:find type id unexpected find type id: actual -2 < expected 0
        btf_dump_data:FAIL:find type id unexpected find type id: actual -2 < expected 0
        test_btf_dump_int_data:FAIL:dump __int128 unexpected error: -2 (errno 2)
        #15/9 btf_dump/btf_dump: int_data:FAIL
      
      Further analysis showed that a gcc-built kernel has the type "__int128" in dwarf/BTF
      and that it doesn't exist in a clang-built kernel. Searching the kernel code
      found the following:
        arch/s390/include/asm/types.h:  unsigned __int128 pair;
        crypto/ecc.c:   unsigned __int128 m = (unsigned __int128)left * right;
        include/linux/math64.h: return (u64)(((unsigned __int128)a * mul) >> shift);
        include/linux/math64.h: return (u64)(((unsigned __int128)a * mul) >> shift);
        lib/ubsan.h:typedef __int128 s_max;
        lib/ubsan.h:typedef unsigned __int128 u_max;
      
      In my case, CONFIG_UBSAN is not enabled. Even if we only have "unsigned __int128"
      in the code, gcc somehow still puts "__int128" in dwarf while clang does not.
      Hence the current test works fine for gcc but not for clang.
      
      Enabling CONFIG_UBSAN is an option to provide the __int128 type in dwarf
      reliably for both gcc and clang, but not everybody enables CONFIG_UBSAN
      in their kernel build. So the best choice is to use the "unsigned __int128" type,
      which is available in both clang- and gcc-built kernels. But the dwarf-encoded
      names for "unsigned __int128" differ between clang and gcc:
      
        [$ ~] cat t.c
        unsigned __int128 a;
        [$ ~] gcc -g -c t.c && llvm-dwarfdump t.o | grep __int128
                        DW_AT_type      (0x00000031 "__int128 unsigned")
                        DW_AT_name      ("__int128 unsigned")
        [$ ~] clang -g -c t.c && llvm-dwarfdump t.o | grep __int128
                        DW_AT_type      (0x00000033 "unsigned __int128")
                        DW_AT_name      ("unsigned __int128")
      
      The test change in this patch checks which type name is available before
      running the actual test.
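
      The idea of probing for the type name first can be sketched as
      below. This is a self-contained illustration, not the selftest code:
      toy_find_type_id() is a hypothetical stand-in for a BTF name lookup
      that works on a plain array of names, and its -2 return value merely
      mirrors the "not found" error seen in the failure output above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical lookup over a list of known type names. Returns a
 * positive id when the name is present and -2 when it is not.
 */
static int toy_find_type_id(const char *const *known, size_t n,
			    const char *name)
{
	assert(known && name);
	for (size_t i = 0; i < n; i++)
		if (strcmp(known[i], name) == 0)
			return (int)i + 1;
	return -2;
}

/*
 * Try both spellings and return the one that exists, or NULL when
 * neither does, in which case the caller has nothing to test.
 */
static const char *toy_pick_int128_name(const char *const *known, size_t n)
{
	static const char *const candidates[] = {
		"unsigned __int128",	/* spelling emitted by clang */
		"__int128 unsigned",	/* spelling emitted by gcc */
	};

	for (size_t i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++)
		if (toy_find_type_id(known, n, candidates[i]) > 0)
			return candidates[i];
	return NULL;
}
```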
      
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
      Link: https://lore.kernel.org/bpf/20210924025856.2192476-1-yhs@fb.com