  1. Jan 28, 2022
    • Yonghong Song's avatar
      selftests/bpf: add a selftest with __user tag · 696c3901
      Yonghong Song authored
      
      
      Added a selftest with three __user usages: a __user pointer-type argument
      in bpf_testmod, a __user pointer-type struct member in bpf_testmod,
      and a __user pointer-type struct member in vmlinux. In all cases,
      directly accessing the user memory results in a verification failure.
      
        $ ./test_progs -v -n 22/3
        ...
        libbpf: prog 'test_user1': BPF program load failed: Permission denied
        libbpf: prog 'test_user1': -- BEGIN PROG LOAD LOG --
        R1 type=ctx expected=fp
        0: R1=ctx(id=0,off=0,imm=0) R10=fp0
        ; int BPF_PROG(test_user1, struct bpf_testmod_btf_type_tag_1 *arg)
        0: (79) r1 = *(u64 *)(r1 +0)
        func 'bpf_testmod_test_btf_type_tag_user_1' arg0 has btf_id 136561 type STRUCT 'bpf_testmod_btf_type_tag_1'
        1: R1_w=user_ptr_bpf_testmod_btf_type_tag_1(id=0,off=0,imm=0)
        ; g = arg->a;
        1: (61) r1 = *(u32 *)(r1 +0)
        R1 invalid mem access 'user_ptr_'
        ...
        #22/3 btf_tag/btf_type_tag_user_mod1:OK
      
        $ ./test_progs -v -n 22/4
        ...
        libbpf: prog 'test_user2': BPF program load failed: Permission denied
        libbpf: prog 'test_user2': -- BEGIN PROG LOAD LOG --
        R1 type=ctx expected=fp
        0: R1=ctx(id=0,off=0,imm=0) R10=fp0
        ; int BPF_PROG(test_user2, struct bpf_testmod_btf_type_tag_2 *arg)
        0: (79) r1 = *(u64 *)(r1 +0)
        func 'bpf_testmod_test_btf_type_tag_user_2' arg0 has btf_id 136563 type STRUCT 'bpf_testmod_btf_type_tag_2'
        1: R1_w=ptr_bpf_testmod_btf_type_tag_2(id=0,off=0,imm=0)
        ; g = arg->p->a;
        1: (79) r1 = *(u64 *)(r1 +0)          ; R1_w=user_ptr_bpf_testmod_btf_type_tag_1(id=0,off=0,imm=0)
        ; g = arg->p->a;
        2: (61) r1 = *(u32 *)(r1 +0)
        R1 invalid mem access 'user_ptr_'
        ...
        #22/4 btf_tag/btf_type_tag_user_mod2:OK
      
        $ ./test_progs -v -n 22/5
        ...
        libbpf: prog 'test_sys_getsockname': BPF program load failed: Permission denied
        libbpf: prog 'test_sys_getsockname': -- BEGIN PROG LOAD LOG --
        R1 type=ctx expected=fp
        0: R1=ctx(id=0,off=0,imm=0) R10=fp0
        ; int BPF_PROG(test_sys_getsockname, int fd, struct sockaddr *usockaddr,
        0: (79) r1 = *(u64 *)(r1 +8)
        func '__sys_getsockname' arg1 has btf_id 2319 type STRUCT 'sockaddr'
        1: R1_w=user_ptr_sockaddr(id=0,off=0,imm=0)
        ; g = usockaddr->sa_family;
        1: (69) r1 = *(u16 *)(r1 +0)
        R1 invalid mem access 'user_ptr_'
        ...
        #22/5 btf_tag/btf_type_tag_user_vmlinux:OK
      
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20220127154616.659314-1-yhs@fb.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Yonghong Song's avatar
      selftests/bpf: rename btf_decl_tag.c to test_btf_decl_tag.c · 571d01a9
      Yonghong Song authored
      
      
      The uapi btf.h contains the following declaration:
        struct btf_decl_tag {
             __s32   component_idx;
        };
      
      The skeleton will also generate a struct with name
      "btf_decl_tag" for bpf program btf_decl_tag.c.
      
      Rename btf_decl_tag.c to test_btf_decl_tag.c so
      the corresponding skeleton struct name becomes
      "test_btf_decl_tag". This way, we could include
      uapi btf.h in prog_tests/btf_tag.c.
      There is no functionality change for this patch.
      
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20220127154611.656699-1-yhs@fb.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Yonghong Song's avatar
      bpf: reject program if a __user tagged memory accessed in kernel way · c6f1bfe8
      Yonghong Song authored
      
      
      The BPF verifier supports direct memory access, e.g., a->b, for
      BPF_PROG_TYPE_TRACING programs. If "a" is a pointer to kernel memory,
      the verifier allows the user to write C code like a->b and translates
      it to a proper kernel load. If "a" is a pointer to user memory, the
      bpf developer is expected to use the bpf_probe_read_user() helper to
      get the value of a->b. Without BTF __user tagging information, the
      current verifier assumes that a->b is a kernel memory access, which
      may produce an incorrect result.
      
      Now that BTF contains __user information, the verifier can check whether
      the pointer points to user memory. If it does, the verifier
      can reject the program and force users to use the bpf_probe_read_user()
      helper explicitly.
      
      In the future, we can easily extend btf_add_space for other
      address space tagging, for example, rcu/percpu etc.
      
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20220127154606.654961-1-yhs@fb.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Yonghong Song's avatar
      compiler_types: define __user as __attribute__((btf_type_tag("user"))) · 7472d5a6
      Yonghong Song authored
      
      
      The __user attribute is currently mainly used by sparse for type checking.
      The attribute indicates whether a memory access is in user memory address
      space or not. Such information is important during tracing kernel
      internal functions or data structures as accessing user memory often
      has different mechanisms compared to accessing kernel memory. For example,
      the perf-probe needs explicit command line specification to indicate a
      particular argument or string in user-space memory ([1], [2], [3]).
      Currently, vmlinux BTF is available in kernel with many distributions.
      If __user attribute information is available in vmlinux BTF, the explicit
      user memory access information from users will not be necessary as
      the kernel can figure it out by itself with vmlinux BTF.
      
      Besides the above possible use for perf/probe, another use case is
      the bpf verifier. Currently, for BPF_PROG_TYPE_TRACING programs,
      users can write direct code like
        p->m1->m2
      where "p" could be a function parameter. Without __user information in BTF,
      the verifier will assume that p->m1 accesses kernel memory and will generate
      normal loads. Suppose "p" is actually tagged with __user in the source
      code. In that case, p->m1 actually accesses user memory, so a direct
      load is not right and may produce an incorrect result. For such cases,
      bpf_probe_read_user() is the correct way to read p->m1.
      
      To support encoding __user information in BTF, a new attribute
        __attribute__((btf_type_tag("<arbitrary_string>")))
      is implemented in clang ([4]). For example, if we have
        #define __user __attribute__((btf_type_tag("user")))
      during kernel compilation, the attribute "user" information will
      be preserved in dwarf. After pahole converting dwarf to BTF, __user
      information will be available in vmlinux BTF.
      
      The following is an example with latest upstream clang (clang14) and
      pahole 1.23:
      
        [$ ~] cat test.c
        #define __user __attribute__((btf_type_tag("user")))
        int foo(int __user *arg) {
                return *arg;
        }
        [$ ~] clang -O2 -g -c test.c
        [$ ~] pahole -JV test.o
        ...
        [1] INT int size=4 nr_bits=32 encoding=SIGNED
        [2] TYPE_TAG user type_id=1
        [3] PTR (anon) type_id=2
        [4] FUNC_PROTO (anon) return=1 args=(3 arg)
        [5] FUNC foo type_id=4
        [$ ~]
      
      You can see for the function argument "int __user *arg", its type is
      described as
        PTR -> TYPE_TAG(user) -> INT
      The kernel can use this information for bpf verification or other
      use cases.
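As a rough illustration of how a consumer might use that chain, here is a minimal sketch in plain C. It is a toy model, not kernel or libbpf code: the `toy_*` names, the struct layout, and entry [4] are assumptions made for this example, while entries [1] to [3] mirror the pahole output above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for BTF kinds; not the kernel's or libbpf's definitions. */
enum toy_kind { TOY_VOID, TOY_INT, TOY_TYPE_TAG, TOY_PTR };

struct toy_type {
	enum toy_kind kind;
	const char *name;	/* tag string for TOY_TYPE_TAG, type name otherwise */
	int type_id;		/* id of the referenced type, 0 if none */
};

/*
 * Ids [1]..[3] follow the pahole output for "int __user *arg".
 * Id [4] (a plain "int *") is added here only as a negative case.
 */
static const struct toy_type toy_types[] = {
	[0] = { TOY_VOID, "void", 0 },
	[1] = { TOY_INT, "int", 0 },
	[2] = { TOY_TYPE_TAG, "user", 1 },
	[3] = { TOY_PTR, NULL, 2 },
	[4] = { TOY_PTR, NULL, 1 },
};

/* Return true if type `id` is a pointer whose pointee carries a "user" tag. */
static bool toy_ptr_is_user(const struct toy_type *types, size_t n, int id)
{
	if (id <= 0 || (size_t)id >= n || types[id].kind != TOY_PTR)
		return false;

	/* Type tags sit between the pointer and the pointee; walk all of them. */
	id = types[id].type_id;
	while (id > 0 && (size_t)id < n && types[id].kind == TOY_TYPE_TAG) {
		if (strcmp(types[id].name, "user") == 0)
			return true;
		id = types[id].type_id;
	}
	return false;
}
```

With this model, `toy_ptr_is_user(toy_types, 5, 3)` reports the tagged pointer, while the untagged pointer [4] and the non-pointer [1] are not reported.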
      
      Currently, btf_type_tag is supported only in clang (>= clang14) and
      pahole (>= 1.23). gcc support has also been proposed and is under development ([5]).
      
        [1] http://lkml.kernel.org/r/155789874562.26965.10836126971405890891.stgit@devnote2
        [2] http://lkml.kernel.org/r/155789872187.26965.4468456816590888687.stgit@devnote2
        [3] http://lkml.kernel.org/r/155789871009.26965.14167558859557329331.stgit@devnote2
        [4] https://reviews.llvm.org/D111199
        [5] https://lore.kernel.org/bpf/0cbeb2fb-1a18-f690-e360-24b1c90c2a91@fb.com/
      
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20220127154600.652613-1-yhs@fb.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Pavel Begunkov's avatar
      cgroup/bpf: fast path skb BPF filtering · 46531a30
      Pavel Begunkov authored
      
      
      Even though there is a static key protecting from overhead from
      cgroup-bpf skb filtering when there is nothing attached, in many cases
      it's not enough as registering a filter for one type will ruin the fast
      path for all others. It's observed in production servers I've looked
      at but also in laptops, where registration is done during init by
      systemd or something else.
      
      Add a per-socket fast path check guarding from such overhead. This
      affects both receive and transmit paths of TCP, UDP and other
      protocols. It showed a ~1% tx/s improvement in small-payload UDP
      send benchmarks using a real NIC in a server environment, and the
      number jumps to 2-3% for preemptible kernels.
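The idea of a two-level check can be sketched in plain C as follows. This is a toy model under stated assumptions, not the patch itself: the `toy_*` names are hypothetical, a plain boolean per attach type stands in for the static key, and "nothing attached for this socket" is modelled as pointing at one shared empty program array.

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_attach_type { TOY_INET_INGRESS, TOY_INET_EGRESS, TOY_ATTACH_MAX };

struct toy_prog_array {
	size_t nr_progs;
};

/* One shared empty array: sockets with nothing attached all point here. */
static const struct toy_prog_array toy_empty_array = { 0 };

struct toy_sock {
	/* Effective programs for this socket's cgroup, per attach type. */
	const struct toy_prog_array *effective[TOY_ATTACH_MAX];
};

/* Global switch per attach type, standing in for the static key. */
static bool toy_key_enabled[TOY_ATTACH_MAX];

/* Decide whether the (expensive) filter path must run for this socket. */
static bool toy_should_run_filter(const struct toy_sock *sk,
				  enum toy_attach_type type)
{
	/* Level 1: nothing of this type is attached anywhere in the system. */
	if (!toy_key_enabled[type])
		return false;

	/* Level 2: something is attached somewhere, but not for this socket. */
	return sk->effective[type] != &toy_empty_array;
}
```

The second comparison is a single pointer check, which is why a per-socket test of this kind can stay cheap even after the global switch has been flipped by an unrelated attachment.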
      
      Reviewed-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/r/d8c58857113185a764927a46f4b5a058d36d3ec3.1643292455.git.asml.silence@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Yonghong Song's avatar
      selftests/bpf: fix a clang compilation error · cdb5ed97
      Yonghong Song authored
      
      
      When building selftests/bpf with clang
        make -j LLVM=1
        make -C tools/testing/selftests/bpf -j LLVM=1
      I hit the following compilation error:
      
        trace_helpers.c:152:9: error: variable 'found' is used uninitialized whenever 'while' loop exits because its condition is false [-Werror,-Wsometimes-uninitialized]
                while (fscanf(f, "%zx-%zx %s %zx %*[^\n]\n", &start, &end, buf, &base) == 4) {
                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        trace_helpers.c:161:7: note: uninitialized use occurs here
                if (!found)
                     ^~~~~
        trace_helpers.c:152:9: note: remove the condition if it is always true
                while (fscanf(f, "%zx-%zx %s %zx %*[^\n]\n", &start, &end, buf, &base) == 4) {
                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                       1
        trace_helpers.c:145:12: note: initialize the variable 'found' to silence this warning
                bool found;
                          ^
                           = false
      
      It is possible that for sane /proc/self/maps we may never hit the above issue
      in practice. But let us initialize variable 'found' properly to silence the
      compilation error.
      
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20220127163726.1442032-1-yhs@fb.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Magnus Karlsson's avatar
      selftests, xsk: Fix bpf_res cleanup test · 3b22523b
      Magnus Karlsson authored
      After commit 710ad98c ("veth: Do not record rx queue hint in veth_xmit"),
      veth no longer receives traffic on the same queue as it was sent on. This
      breaks the bpf_res test for the AF_XDP selftests as the socket tied to
      queue 1 will not receive traffic anymore.
      
      Modify the test so that two sockets are tied to queue id 0 using a shared
      umem instead. When killing the first socket, enter the second socket into
      the xskmap so that traffic will flow to it. This will still test that the
      resources are not cleaned up until after the second socket dies, without
      having to rely on veth supporting rx_queue hints.
      
      Reported-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20220125082945.26179-1-magnus.karlsson@gmail.com
    • Daniel Borkmann's avatar
      Merge branch 'xsk-batching' · 33372bc2
      Daniel Borkmann authored
      
      
      Maciej Fijalkowski says:
      
      ====================
      Unfortunately, scalability issues similar to those that were addressed for
      XDP processing in ice exist for XDP in the zero-copy driver used by AF_XDP.
      Let's resolve them in mostly the same way as we did in [0] and utilize
      the Tx batching API from XSK buffer pool.

      Move the array of Tx descriptors that is used with batching approach to
      the XSK buffer pool. This means that future users of this API will not
      have to carry the array on their own side; they can simply refer to the
      pool's tx_desc array.
      
      We also improve the Rx side where we extend ice_alloc_rx_buf_zc() to
      handle the ring wrap and bump the Rx tail more frequently. Doing so
      aligns the Rx side with Tx, which was needed for the l2fwd scenario.
      
      Here are the performance improvements that this set brings, measured
      with the xdpsock app in busy poll mode for 1 and 2 core modes.
      Both Tx and Rx rings were sized to 1k length and busy poll budget was
      256.
      
      ----------------------------------------------------------------
           |      txonly:      |      l2fwd      |      rxdrop
      ----------------------------------------------------------------
      1C   |       149%        |       14%       |        3%
      ----------------------------------------------------------------
      2C   |       134%        |       20%       |        5%
      ----------------------------------------------------------------
      
      Next step will be to introduce batching onto Rx side.
      
      v5:
      * collect acks
      * fix typos
      * correct comments showing cache line boundaries in ice_tx_ring struct
      v4 - address Alexandr's review:
      * new patch (2) for making sure ring size is pow(2) when attaching
        xsk socket
      * don't open code ALIGN_DOWN (patch 3)
      * stop storing tx_thresh in ice_tx_ring (patch 4)
      * scope variables in a better way for Tx batching (patch 7)
      v3:
      * drop likely() that was wrapping napi_complete_done (patch 1)
      * introduce configurable Tx threshold (patch 2)
      * handle ring wrap on Rx side when allocating buffers (patch 3)
      * respect NAPI budget when cleaning Tx descriptors in ZC (patch 6)
      v2:
      * introduce new patch that resets @next_dd and @next_rs fields
      * use batching API for AF_XDP Tx on ice side
      
        [0]: https://lore.kernel.org/bpf/20211015162908.145341-8-anthony.l.nguyen@intel.com/
      ====================
      
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • Maciej Fijalkowski's avatar
      ice: xsk: Borrow xdp_tx_active logic from i40e · 59e92bfe
      Maciej Fijalkowski authored
      One of the things that commit 5574ff7b ("i40e: optimize AF_XDP Tx
      completion path") introduced was the @xdp_tx_active field. Its usage
      from i40e can be adapted to the ice driver and give us positive performance
      results.

      If the descriptor that @next_dd points to has been sent by HW (its DD
      bit is set), then we are sure that at least a quarter of the ring is ready
      to be cleaned. If @xdp_tx_active is 0, which means that the related xdp_ring
      is not used for XDP_{TX, REDIRECT} workloads, then we know how many XSK
      entries should be placed on the completion queue; IOW, walking through the
      ring can be skipped.
      
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
      Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Link: https://lore.kernel.org/bpf/20220125160446.78976-9-maciej.fijalkowski@intel.com
    • Maciej Fijalkowski's avatar
      ice: xsk: Improve AF_XDP ZC Tx and use batching API · 126cdfe1
      Maciej Fijalkowski authored
      Apply the logic that was done for regular XDP in commit 9610bd98
      ("ice: optimize XDP_TX workloads") to the ZC side of the driver. On top
      of that, introduce batching to Tx that is inspired by i40e's
      implementation with adjustments to the cleaning logic - take the
      NAPI budget into account in ice_clean_xdp_irq_zc().
      
      Separating the stats structs onto separate cache lines seemed to improve
      the performance.
      
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
      Link: https://lore.kernel.org/bpf/20220125160446.78976-8-maciej.fijalkowski@intel.com
    • Maciej Fijalkowski's avatar
      ice: xsk: Avoid potential dead AF_XDP Tx processing · 86e3f78c
      Maciej Fijalkowski authored
      Commit 9610bd98 ("ice: optimize XDP_TX workloads") introduced
      @next_dd and @next_rs to ice_tx_ring struct. Currently, their state is
      not restored in ice_clean_tx_ring(), which has not caused any trouble
      as the XDP rings are gone after we're done with the XDP prog on the interface.

      For the upcoming usage of the mentioned fields in AF_XDP, this might expose us
      to a potential dead Tx side. The scenario would look like the following (based
      on xdpsock):
      
      - two xdpsock instances are spawned in Tx mode
      - one of them is killed
      - XDP prog is kept on interface due to the other xdpsock still running
        * this means that XDP rings stayed in place
      - xdpsock is launched again on same queue id that was terminated on
      - @next_dd and @next_rs setting is bogus, therefore transmit side is
        broken
      
      To protect us from the above, restore the initial @next_rs and @next_dd
      values when cleaning the Tx ring.
      
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
      Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Link: https://lore.kernel.org/bpf/20220125160446.78976-7-maciej.fijalkowski@intel.com
    • Magnus Karlsson's avatar
      i40e: xsk: Move tmp desc array from driver to pool · d1bc532e
      Magnus Karlsson authored
      
      
      Move desc_array from the driver to the pool. The reason behind this is
      that we can then reuse this array as a temporary storage for descriptors
      in all zero-copy drivers that use the batched interface. This will make
      it easier to add batching to more drivers.
      
      i40e is the only driver that has a batched Tx zero-copy
      implementation, so no need to touch any other driver.
      
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
      Link: https://lore.kernel.org/bpf/20220125160446.78976-6-maciej.fijalkowski@intel.com
    • Maciej Fijalkowski's avatar
      ice: Make Tx threshold dependent on ring length · 3dd411ef
      Maciej Fijalkowski authored
      
      
      XDP_TX workloads use a concept of Tx threshold that indicates the
      interval of setting RS bit on descriptors which in turn tells the HW to
      generate an interrupt to signal the completion of Tx on HW side. It is
      currently based on a constant value of 32 which might not work out well
      for various sizes of ring combined with for example batch size that can
      be set via SO_BUSY_POLL_BUDGET.
      
      Internal tests based on AF_XDP showed that the most convenient setting of
      this threshold is a quarter of the ring length.

      Make use of the recently introduced ICE_RING_QUARTER macro and use this
      value as a substitute for ICE_TX_THRESH.

      Also align the ethtool -G callback so that the next_dd/next_rs fields are
      up to date in terms of the ring size.
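A minimal sketch of a threshold that scales with the ring, written as standalone C. The `toy_*` names are hypothetical, and the initial marker position (last descriptor of the first quarter) and the wrap rule are assumptions made for this illustration rather than a copy of the driver code.

```c
#include <assert.h>
#include <stdint.h>

struct toy_tx_ring {
	uint16_t count;		/* number of descriptors in the ring */
	uint16_t next_dd;	/* descriptor whose DD bit is checked next */
	uint16_t next_rs;	/* descriptor that gets the RS bit next */
};

/* Threshold derived from the ring length instead of a constant 32. */
static uint16_t toy_ring_quarter(const struct toy_tx_ring *ring)
{
	return (uint16_t)(ring->count >> 2);
}

/* (Re)size the ring and put both markers back to their initial position. */
static void toy_ring_reset(struct toy_tx_ring *ring, uint16_t count)
{
	assert(count >= 4);
	ring->count = count;
	ring->next_dd = (uint16_t)(toy_ring_quarter(ring) - 1);
	ring->next_rs = ring->next_dd;
}

/* Move the RS marker one threshold further, wrapping at the ring end. */
static void toy_advance_rs(struct toy_tx_ring *ring)
{
	uint32_t next = (uint32_t)ring->next_rs + toy_ring_quarter(ring);

	if (next >= ring->count)
		ring->next_rs = (uint16_t)(toy_ring_quarter(ring) - 1);
	else
		ring->next_rs = (uint16_t)next;
}
```

Calling `toy_ring_reset()` again after a resize plays the role described for the ethtool -G callback: the markers are recomputed from the new ring size rather than left pointing at positions derived from the old one.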
      
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
      Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Link: https://lore.kernel.org/bpf/20220125160446.78976-5-maciej.fijalkowski@intel.com
    • Maciej Fijalkowski's avatar
      ice: xsk: Handle SW XDP ring wrap and bump tail more often · 3876ff52
      Maciej Fijalkowski authored
      
      
      Currently, if ice_clean_rx_irq_zc() processed the whole ring and
      next_to_use != 0, then ice_alloc_rx_buf_zc() would not refill the whole
      ring even if the XSK buffer pool would have enough free entries (either
      from fill ring or the internal recycle mechanism) - it is because ring
      wrap is not handled.
      
      Improve the logic in ice_alloc_rx_buf_zc() to address the problem above.
      When next_to_use + buffer count >= rx_ring->count, do not clamp the
      count of buffers that is passed to xsk_buff_alloc_batch(), but rather
      split it and make two calls to the mentioned function - one for the
      part up until the wrap and one for the part after the wrap.
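The split can be sketched in standalone C as below. This is a simplified model: `toy_alloc_batch()` is a hypothetical stand-in for a batch allocator that always succeeds, and the ring is just an array of flags.

```c
#include <assert.h>

/*
 * Hypothetical batch allocator: fills `n` consecutive slots starting at
 * `first` and returns how many it filled. Here it always fills all of them.
 */
static unsigned int toy_alloc_batch(unsigned char *slots, unsigned int first,
				    unsigned int n)
{
	for (unsigned int i = 0; i < n; i++)
		slots[first + i] = 1;
	return n;
}

/* Refill `count` slots starting at *next_to_use, handling the ring wrap. */
static unsigned int toy_refill(unsigned char *slots, unsigned int ring_count,
			       unsigned int *next_to_use, unsigned int count)
{
	unsigned int ntu = *next_to_use;
	unsigned int until_wrap, first, filled;

	assert(ntu < ring_count && count <= ring_count);

	/* First call covers the part up to the end of the ring. */
	until_wrap = ring_count - ntu;
	first = count < until_wrap ? count : until_wrap;
	filled = toy_alloc_batch(slots, ntu, first);
	ntu += filled;
	if (ntu == ring_count)
		ntu = 0;

	/* Second call covers the part after the wrap, if anything is left. */
	if (filled == first && count > first) {
		unsigned int second = toy_alloc_batch(slots, 0, count - first);

		filled += second;
		ntu = second;
	}

	*next_to_use = ntu;
	return filled;
}
```

For a ring of 8 slots with next_to_use at 6, a request for 5 buffers fills slots 6 and 7 in the first call and slots 0 to 2 in the second, instead of being clamped to 2.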
      
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
      Link: https://lore.kernel.org/bpf/20220125160446.78976-4-maciej.fijalkowski@intel.com
    • Maciej Fijalkowski's avatar
      ice: xsk: Force rings to be sized to power of 2 · 296f13ff
      Maciej Fijalkowski authored
      
      
      With the upcoming introduction of batching to the XSK data path, it is
      best for performance to have the ring descriptor count be a power of 2.

      Check whether the sizes of the rings that the user is going to attach the
      XSK socket to fulfill the condition above. For the Tx side, although the
      check is done against the Tx queue and in the end the socket will be
      attached to the XDP queue, it is fine since XDP queues get the
      ring->count setting from Tx queues.
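The condition itself is the usual single-bit test. A minimal sketch in standalone C; the `toy_*` names and the 0 / -1 return convention are assumptions made for this example.

```c
#include <stdbool.h>

/* A power of 2 has exactly one bit set, so clearing the lowest set bit gives 0. */
static bool toy_is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Return 0 if both ring sizes are acceptable for attaching, -1 otherwise. */
static int toy_check_ring_sizes(unsigned long tx_count, unsigned long rx_count)
{
	if (!toy_is_power_of_2(tx_count) || !toy_is_power_of_2(rx_count))
		return -1;
	return 0;
}
```

With a power-of-2 ring size, index wrap-around can be done with a mask (`idx & (count - 1)`) instead of a comparison or a modulo, which is one common reason such sizes are preferred in batched paths.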
      
      Suggested-by: Alexander Lobakin <alexandr.lobakin@intel.com>
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
      Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Link: https://lore.kernel.org/bpf/20220125160446.78976-3-maciej.fijalkowski@intel.com
    • Maciej Fijalkowski's avatar
      ice: Remove likely for napi_complete_done · a4e18669
      Maciej Fijalkowski authored
      
      
      Remove the likely before napi_complete_done as this is the unlikely case
      when busy-poll is used. Removing this has a positive performance impact
      for busy-poll and no negative impact to the regular case.
      
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
      Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Link: https://lore.kernel.org/bpf/20220125160446.78976-2-maciej.fijalkowski@intel.com
  2. Jan 27, 2022
    • Jakub Kicinski's avatar
      bpf: remove unused static inlines · 8033c6c2
      Jakub Kicinski authored
      
      
      Remove two dead stubs: sk_msg_clear_meta() was never
      used, and the use of xskq_cons_is_full() was replaced by
      xsk_tx_writeable() in v5.10.
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Link: https://lore.kernel.org/r/20220126185412.2776254-1-kuba@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Andrii Nakryiko's avatar
      selftests/bpf: fix uprobe offset calculation in selftests · ff943683
      Andrii Nakryiko authored
      
      
      Fix how selftests determine the relative offset of a function that is
      uprobed. Previously, there was an assumption that the uprobed function is
      always in the first executable region, which is not always the case
      (libbpf CI hits this case now), so the get_base_addr() approach in
      isolation doesn't work anymore. Teach get_uprobe_offset() to determine
      the correct memory mapping and calculate the uprobe offset correctly.
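A minimal sketch of the per-mapping calculation, written as standalone C over already-read maps lines. The function name `toy_uprobe_offset()` and the sample line layout used below are assumptions for illustration; the selftest reads /proc/self/maps directly.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Find the executable mapping that contains `addr` and translate the address
 * into a file offset. Returns -1 if no executable mapping contains it.
 * Each line is expected to look like "start-end perms offset ...".
 */
static long long toy_uprobe_offset(const char *const *maps, size_t n,
				   uintptr_t addr)
{
	size_t start = 0, end = 0, base = 0;
	char perms[8];
	bool found = false;	/* stays false if the loop never matches */

	for (size_t i = 0; i < n; i++) {
		if (sscanf(maps[i], "%zx-%zx %7s %zx",
			   &start, &end, perms, &base) != 4)
			continue;
		/* Only executable regions can hold the uprobed function. */
		if (strchr(perms, 'x') && addr >= start && addr < end) {
			found = true;
			break;
		}
	}

	if (!found)
		return -1;

	/* Offset inside the mapping plus the mapping's offset in the file. */
	return (long long)(addr - start + base);
}
```

When the file offsets of two executable regions are not laid out the same way as their addresses, subtracting a base derived from the first region gives a wrong result for an address in the second one, which is the case the per-mapping lookup handles.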
      
      While at it, I merged together two implementations of
      get_uprobe_offset() helper, moving powerpc64-specific logic inside (had
      to add extra {} block to avoid unused variable error for insn).
      
      Also ensured that uprobed functions are never inlined, but are still
      static (and thus local to each selftest), by using a no-op asm volatile
      block internally. I didn't want to keep them global __weak, because some
      tests use uprobe's ref counter offset (to test USDT-like logic) which is
      not compatible with non-refcounted uprobe. So it's nicer to have each
      test uprobe target local to the file and guaranteed to not be inlined or
      skipped by the compiler (which can happen with static functions,
      especially if compiling selftests with -O2).
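A minimal sketch of such a target, assuming GCC or Clang. The function name and body are hypothetical, and the explicit `noinline` attribute is an addition made for this example; the message above only mentions the no-op asm volatile block.

```c
/*
 * File-local probe target. The empty asm statement is a no-op at run time,
 * but the compiler treats it as having side effects, so calls to the
 * function are not optimized away. The noinline attribute (an assumption
 * of this sketch) additionally keeps the function out of line.
 */
static __attribute__((noinline)) int toy_uprobe_target(int x)
{
	__asm__ __volatile__("");
	return x + 1;
}
```

Because the function is `static`, each test file can define its own copy without symbol clashes, while the function still exists as a distinct address that a uprobe can be attached to.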
      
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20220126193058.3390292-1-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Yonghong Song's avatar
      selftests/bpf: Fix a clang compilation error · e5465a90
      Yonghong Song authored
      
      
      Compiling the kernel and selftests/bpf with the latest llvm like below:
        make -j LLVM=1
        make -C tools/testing/selftests/bpf -j LLVM=1
      I hit the following compilation error:
        /.../prog_tests/log_buf.c:215:6: error: variable 'log_buf' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized]
                if (!ASSERT_OK_PTR(raw_btf_data, "raw_btf_data_good"))
                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        /.../prog_tests/log_buf.c:264:7: note: uninitialized use occurs here
                free(log_buf);
                     ^~~~~~~
        /.../prog_tests/log_buf.c:215:2: note: remove the 'if' if its condition is always false
                if (!ASSERT_OK_PTR(raw_btf_data, "raw_btf_data_good"))
                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        /.../prog_tests/log_buf.c:205:15: note: initialize the variable 'log_buf' to silence this warning
                char *log_buf;
                             ^
                              = NULL
        1 error generated.
      
      The compiler rightfully detected that log_buf is uninitialized in one of
      the failure paths, as indicated above.

      Properly initializing the 'log_buf' variable fixes the issue.
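The pattern behind the fix can be shown in a few lines of standalone C. This is a generic sketch, not the selftest code: `toy_run()` and its `fail_early` parameter are hypothetical and only simulate an early failure that jumps to a shared cleanup label.

```c
#include <stdlib.h>
#include <string.h>

/* Returns 0 on success, -1 on failure; `fail_early` simulates a failed check. */
static int toy_run(int fail_early)
{
	char *raw_data = NULL;
	char *log_buf = NULL;	/* NULL so cleanup may free it unconditionally */
	int err = -1;

	raw_data = fail_early ? NULL : malloc(16);
	if (!raw_data)
		goto cleanup;	/* log_buf has not been allocated yet */

	log_buf = malloc(64);
	if (!log_buf)
		goto cleanup;

	strcpy(log_buf, "ok");
	err = 0;

cleanup:
	free(log_buf);		/* free(NULL) is defined to do nothing */
	free(raw_data);
	return err;
}
```

Without the `= NULL` initializer, the early `goto cleanup` would pass an indeterminate pointer to `free()`, which is exactly the path the -Wsometimes-uninitialized diagnostic points at.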
      
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20220126181940.4105997-1-yhs@fb.com
  3. Jan 26, 2022
  4. Jan 25, 2022