  1. Mar 18, 2022
  2. Mar 17, 2022
  3. Mar 16, 2022
    • bpftool: man: Add missing top level docs · 6585abea
      Daniel Xu authored
      
      
      The top-level (bpftool.8) man page was missing docs for a few
      subcommands and their respective sub-sub-commands.
      
      This commit brings the top-level man page up to date. Note that I've
      kept the ordering of the subcommands the same as in `bpftool help`.
      
      Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/3049ef5dc509c0d1832f0a8b2dba2ccaad0af688.1647213551.git.dxu@dxuuu.xyz
    • bpftool: Add bpf_cookie to link output · cbdaf71f
      Dmitrii Dolgov authored
      Commit 82e6b1ee ("bpf: Allow to specify user-provided bpf_cookie for
      BPF perf links") introduced the concept of a user-specified bpf_cookie,
      which BPF programs can access using bpf_get_attach_cookie().
      For troubleshooting purposes it is convenient to expose bpf_cookie via
      bpftool as well, so there is no need to meddle with the target BPF
      program itself.
      
      This is implemented by using the pid iterator BPF program to fetch the
      bpf_cookie values, which keeps the code changes confined to bpftool.
      
      $ bpftool link
      1: type 7  prog 5
              bpf_cookie 123
              pids bootstrap(81)
      
      Signed-off-by: Dmitrii Dolgov <9erthalion6@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Acked-by: Quentin Monnet <quentin@isovalent.com>
      Link: https://lore.kernel.org/bpf/20220309163112.24141-1-9erthalion6@gmail.com
    • selftests/bpf: Clean up array_size.cocci warnings · f98d6dd1
      Guo Zhengkui authored
      
      
      Clean up the array_size.cocci warnings under tools/testing/selftests/bpf/:
      
      Use `ARRAY_SIZE(arr)` instead of forms like `sizeof(arr)/sizeof(arr[0])`.
      
      tools/testing/selftests/bpf/test_cgroup_storage.c uses the ARRAY_SIZE()
      defined in tools/include/linux/kernel.h (sys/sysinfo.h -> linux/kernel.h),
      while the other files use the ARRAY_SIZE() defined in bpf_util.h.
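      A minimal illustration of the cleanup follows. It contrasts the
      open-coded element count with ARRAY_SIZE(). The macro is a plain local
      definition written for this example, and the in-tree versions may add
      compile-time array checks. sample_ports and both helper functions are
      hypothetical.

```c
#include <stddef.h>

/* Local definition for this example; the in-tree macros may add extra checks. */
#ifndef ARRAY_SIZE
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
#endif

/* Hypothetical sample data. */
static const int sample_ports[] = { 80, 443, 8080, 8443 };

/* Before: the open-coded form that array_size.cocci flags. */
static size_t count_ports_open_coded(void)
{
	return sizeof(sample_ports) / sizeof(sample_ports[0]);
}

/* After: the same value expressed with ARRAY_SIZE(). */
static size_t count_ports_macro(void)
{
	return ARRAY_SIZE(sample_ports);
}
```

      Both helpers yield 4. The macro form states the intent directly and
      cannot mismatch the array name between numerator and denominator.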
      
      Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20220315130143.2403-1-guozhengkui@vivo.com
  4. Mar 15, 2022
    • samples/bpf, xdpsock: Fix race when running for fix duration of time · 8fa42d78
      Niklas Söderlund authored
      When running xdpsock for a fixed duration of time before terminating
      (using --duration=<n>), there is a race condition that may cause xdpsock
      to terminate immediately.
      
      When running for a fixed duration of time, the check that determines
      when to terminate is in is_benchmark_done(), which executes in the
      context of the poller thread:
      
          if (opt_duration > 0) {
                  unsigned long dt = (get_nsecs() - start_time);
      
                  if (dt >= opt_duration)
                          benchmark_done = true;
          }
      
      However, start_time is only set after the poller thread has been
      created. This leaves a small window in which the poller thread is
      already running and calls is_benchmark_done() for the first time before
      start_time is set. In that case start_time still has its initial value
      of 0, so the elapsed time is measured from 0 rather than from the
      application's start time. The check then immediately sets
      benchmark_done, which in turn terminates the xdpsock application.
      
      Fix this by setting start_time before creating the poller thread.
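      The ordering fix can be sketched as a small standalone program. This is
      not the xdpsock source. start_time, opt_duration, benchmark_done,
      get_nsecs() and is_benchmark_done() mirror the names in the snippet
      above. poller() and run_with_fixed_ordering() are hypothetical, and
      CLOCK_MONOTONIC is an assumed clock source. pthread_create() orders the
      write to start_time before anything the new thread does, so the first
      is_benchmark_done() call sees a valid start time.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Globals named after the xdpsock snippet; values are illustrative. */
static uint64_t start_time;                    /* 0 until assigned */
static uint64_t opt_duration = 1000000000ULL;  /* 1 s in nanoseconds */
static volatile bool benchmark_done;

static uint64_t get_nsecs(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

static void is_benchmark_done(void)
{
	if (opt_duration > 0) {
		uint64_t dt = get_nsecs() - start_time;

		if (dt >= opt_duration)
			benchmark_done = true;
	}
}

/* Hypothetical poller: checks the deadline as soon as it starts. */
static void *poller(void *arg)
{
	(void)arg;
	is_benchmark_done();
	return NULL;
}

/* Fixed ordering: start_time is set before the poller thread exists. */
static int run_with_fixed_ordering(void)
{
	pthread_t pt;

	start_time = get_nsecs();
	if (pthread_create(&pt, NULL, poller, NULL) != 0)
		return -1;
	pthread_join(pt, NULL);
	return 0;
}
```

      Moving the start_time assignment below pthread_create() reopens the
      window. If the thread runs first, it reads 0, and with a clock that has
      been running for more than opt_duration it sets benchmark_done at once.
      Link with -pthread on older C libraries.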
      
      Fixes: d3f11b01 ("samples/bpf: xdpsock: Add duration option to specify how long to run")
      Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>
      Signed-off-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20220315102948.466436-1-niklas.soderlund@corigine.com
    • bpf, sockmap: Fix double uncharge the mem of sk_msg · 2486ab43
      Wang Yufen authored
      If tcp_bpf_sendmsg() is running during a teardown operation, psock may
      be freed:
      
      tcp_bpf_sendmsg()
       tcp_bpf_send_verdict()
        sk_msg_return()
        tcp_bpf_sendmsg_redir()
         if (unlikely(!psock))
           sk_msg_free()
      
      The memory of msg has already been uncharged in tcp_bpf_send_verdict()
      by sk_msg_return(), and would be uncharged again by sk_msg_free(). When
      psock is NULL, we can simply return an error code. This then triggers
      sk_msg_free_nocharge() in the error path of __SK_REDIRECT, with the side
      effect of propagating an error up to user space. This is a slight change
      in behavior from the user's side. It looks the same as the error user
      space would see if the redirect on the socket had failed.
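      The accounting argument can be made concrete with a toy model in plain
      C. It is not kernel code. struct acct_sock, struct redir_msg and every
      function here are hypothetical stand-ins. A single counter plays the
      role of the socket's memory accounting, and it must be back at zero at
      teardown (the analogue of the inet_sock_destruct() warning).

```c
#include <stdbool.h>

/* Toy model, not kernel code. 'charged' must be 0 at teardown. */
struct acct_sock {
	long charged;
};

struct redir_msg {
	long size;
};

static void acct_charge(struct acct_sock *sk, long n)   { sk->charged += n; }
static void acct_uncharge(struct acct_sock *sk, long n) { sk->charged -= n; }

/* Stand-in for sk_msg_return(): hand the bytes back to the socket. */
static void msg_return(struct acct_sock *sk, const struct redir_msg *m)
{
	acct_uncharge(sk, m->size);
}

/* Stand-in for sk_msg_free(): release the data and uncharge it. */
static void msg_free(struct acct_sock *sk, struct redir_msg *m)
{
	acct_uncharge(sk, m->size);
	m->size = 0;
}

/* Stand-in for sk_msg_free_nocharge(): release without touching accounting. */
static void msg_free_nocharge(struct redir_msg *m)
{
	m->size = 0;
}

/* Redirect when the target psock is gone; returns the final balance. */
static long redirect_without_psock(bool fixed)
{
	struct acct_sock sk = { 0 };
	struct redir_msg msg = { 4096 };

	acct_charge(&sk, msg.size);   /* sendmsg charged the payload */
	msg_return(&sk, &msg);        /* verdict path uncharged it before redirect */

	if (fixed)
		msg_free_nocharge(&msg);  /* error returned; free without uncharge */
	else
		msg_free(&sk, &msg);      /* old path: second uncharge */

	return sk.charged;
}
```

      In the old path the balance ends at -4096, a double uncharge. In the
      fixed path it ends at 0.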
      
      This issue can trigger the following warning:
      WARNING: CPU: 0 PID: 2136 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x13c/0x260
      Call Trace:
       <TASK>
       __sk_destruct+0x24/0x1f0
       sk_psock_destroy+0x19b/0x1c0
       process_one_work+0x1b3/0x3c0
       worker_thread+0x30/0x350
       ? process_one_work+0x3c0/0x3c0
       kthread+0xe6/0x110
       ? kthread_complete_and_exit+0x20/0x20
       ret_from_fork+0x22/0x30
       </TASK>
      
      Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
      Signed-off-by: Wang Yufen <wangyufen@huawei.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20220304081145.2037182-5-wangyufen@huawei.com
    • bpf, sockmap: Fix more uncharged while msg has more_data · 84472b43
      Wang Yufen authored
      In tcp_bpf_send_verdict(), if msg has more data after
      tcp_bpf_sendmsg_redir():
      
      tcp_bpf_send_verdict()
       tosend = msg->sg.size  //msg->sg.size = 22220
       case __SK_REDIRECT:
        sk_msg_return()  //uncharged msg->sg.size(22220) to sk->sk_forward_alloc
        tcp_bpf_sendmsg_redir() //after tcp_bpf_sendmsg_redir, msg->sg.size=11000
       goto more_data;
       tosend = msg->sg.size  //msg->sg.size = 11000
       case __SK_REDIRECT:
        sk_msg_return()  //uncharged msg->sg.size(11000) to sk->sk_forward_alloc
      
      The remaining msg->sg.size (11000) is thus uncharged twice. To fix this,
      charge the remaining msg->sg.size back before jumping to more_data.
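      The arithmetic from the trace above can be replayed with a toy loop.
      This is a hypothetical model, not tcp_bpf_send_verdict(). 'charged'
      stands for the bytes accounted against the socket. Each pass uncharges
      the whole remaining size, as sk_msg_return() does, and the redirect is
      assumed to consume at most 'chunk' bytes per pass.

```c
#include <stdbool.h>

/* Hypothetical model of the more_data loop; returns the final charged balance.
 * Precondition: chunk > 0, otherwise the loop never makes progress. */
static long verdict_loop(long total, long chunk, bool fixed)
{
	long charged = total;    /* sendmsg charged the whole message */
	long remaining = total;  /* plays the role of msg->sg.size */

	while (remaining > 0) {
		long tosend = remaining;
		long sent = tosend < chunk ? tosend : chunk;

		charged -= tosend;   /* sk_msg_return() uncharges all of tosend */
		remaining -= sent;   /* redirect consumed 'sent' bytes */

		/* The fix: re-charge what is still in msg before looping back. */
		if (fixed && remaining > 0)
			charged += remaining;
	}
	return charged;
}
```

      With total = 22220 and chunk = 11220, the unfixed loop ends at -11000,
      the double-uncharged remainder from the trace. The fixed loop ends
      balanced at 0.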
      
      This issue can trigger the following warnings:
      WARNING: CPU: 0 PID: 9860 at net/core/stream.c:208 sk_stream_kill_queues+0xd4/0x1a0
      Call Trace:
       <TASK>
       inet_csk_destroy_sock+0x55/0x110
       __tcp_close+0x279/0x470
       tcp_close+0x1f/0x60
       inet_release+0x3f/0x80
       __sock_release+0x3d/0xb0
       sock_close+0x11/0x20
       __fput+0x92/0x250
       task_work_run+0x6a/0xa0
       do_exit+0x33b/0xb60
       do_group_exit+0x2f/0xa0
       get_signal+0xb6/0x950
       arch_do_signal_or_restart+0xac/0x2a0
       ? vfs_write+0x237/0x290
       exit_to_user_mode_prepare+0xa9/0x200
       syscall_exit_to_user_mode+0x12/0x30
       do_syscall_64+0x46/0x80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
       </TASK>
      
      WARNING: CPU: 0 PID: 2136 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x13c/0x260
      Call Trace:
       <TASK>
       __sk_destruct+0x24/0x1f0
       sk_psock_destroy+0x19b/0x1c0
       process_one_work+0x1b3/0x3c0
       worker_thread+0x30/0x350
       ? process_one_work+0x3c0/0x3c0
       kthread+0xe6/0x110
       ? kthread_complete_and_exit+0x20/0x20
       ret_from_fork+0x22/0x30
       </TASK>
      
      Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
      Signed-off-by: Wang Yufen <wangyufen@huawei.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20220304081145.2037182-4-wangyufen@huawei.com
    • bpf, sockmap: Fix memleak in tcp_bpf_sendmsg while sk msg is full · 9c34e38c
      Wang Yufen authored
      If tcp_bpf_sendmsg() runs while the sk msg is full, sk_msg_alloc()
      returns -ENOMEM and tcp_bpf_sendmsg() jumps to wait_for_memory. If
      sk_msg_alloc() has already allocated part of the memory (that is,
      msg_tx->sg.size is greater than osize after sk_msg_alloc()), that memory
      leaks. To fix this, use sk_msg_trim() to release the allocated memory
      before waiting for memory.

      Other callers of sk_msg_alloc(), such as tls_sw_sendmsg(), have a
      similar issue, so the sk_msg_trim() logic is handled inside
      sk_msg_alloc() itself, as Cong Wang suggested.
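      A toy model of the partial-allocation leak follows. It assumes
      page-sized allocation steps and a fixed byte budget standing in for
      socket memory. struct toy_msg, TOY_PAGE and toy_msg_alloc() are
      hypothetical. Without the trim, a failed call leaves the partially grown
      message behind. With it, the message returns to its size on entry,
      which is what trimming back to osize achieves in the fix.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE 4096u   /* hypothetical allocation granularity */

struct toy_msg {
	size_t size;   /* bytes currently attached to the message */
};

/* Grow m to new_size one page at a time; 'budget' is how many bytes may
 * still be allocated. On failure, optionally trim back to the entry size. */
static int toy_msg_alloc(struct toy_msg *m, size_t new_size, size_t budget,
			 bool trim_on_error)
{
	size_t osize = m->size;

	while (m->size < new_size) {
		size_t left = new_size - m->size;
		size_t step = left < TOY_PAGE ? left : TOY_PAGE;

		if (step > budget) {
			if (trim_on_error)
				m->size = osize;   /* sk_msg_trim() analogue */
			return -ENOMEM;
		}
		budget -= step;
		m->size += step;
	}
	return 0;
}
```

      Requesting 16384 bytes against an 8192-byte budget fails either way.
      Only the trimming variant leaves the message at its original size
      instead of holding 8192 stranded bytes.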
      
      This issue can trigger the following warnings:
      WARNING: CPU: 3 PID: 7950 at net/core/stream.c:208 sk_stream_kill_queues+0xd4/0x1a0
      Call Trace:
       <TASK>
       inet_csk_destroy_sock+0x55/0x110
       __tcp_close+0x279/0x470
       tcp_close+0x1f/0x60
       inet_release+0x3f/0x80
       __sock_release+0x3d/0xb0
       sock_close+0x11/0x20
       __fput+0x92/0x250
       task_work_run+0x6a/0xa0
       do_exit+0x33b/0xb60
       do_group_exit+0x2f/0xa0
       get_signal+0xb6/0x950
       arch_do_signal_or_restart+0xac/0x2a0
       exit_to_user_mode_prepare+0xa9/0x200
       syscall_exit_to_user_mode+0x12/0x30
       do_syscall_64+0x46/0x80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
       </TASK>
      
      WARNING: CPU: 3 PID: 2094 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x13c/0x260
      Call Trace:
       <TASK>
       __sk_destruct+0x24/0x1f0
       sk_psock_destroy+0x19b/0x1c0
       process_one_work+0x1b3/0x3c0
       kthread+0xe6/0x110
       ret_from_fork+0x22/0x30
       </TASK>
      
      Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
      Signed-off-by: Wang Yufen <wangyufen@huawei.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20220304081145.2037182-3-wangyufen@huawei.com
    • bpf, sockmap: Fix memleak in sk_psock_queue_msg · 938d3480
      Wang Yufen authored
      If tcp_bpf_sendmsg() is running during a teardown operation, we may
      enqueue data on the ingress msg queue while teardown is trying to free it:
      
       sk1 (redirect sk2)                         sk2
       -------------------                      ---------------
      tcp_bpf_sendmsg()
       tcp_bpf_send_verdict()
        tcp_bpf_sendmsg_redir()
         bpf_tcp_ingress()
                                                sock_map_close()
                                                 lock_sock()
          lock_sock() ... blocking
                                                 sk_psock_stop
                                                  sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED);
                                                 release_sock(sk);
          lock_sock()
          sk_mem_charge()
          get_page()
          sk_psock_queue_msg()
           sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED);
            drop_sk_msg()
          release_sock()
      
      By the time drop_sk_msg() is reached, the msg has memory charged to sk by
      sk_mem_charge() and holds sg pages that need to be put. To fix this, use
      sk_msg_free() and then kfree() the msg.
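      The leak in the drop path can be modelled the same way. In this
      hypothetical sketch, struct ingress_sock counts charged bytes and page
      references. ingress_queue_msg() follows the order in the diagram above:
      charge, take a page reference, then either queue the message or drop
      it. Only the fixed drop path undoes both steps.

```c
#include <stdbool.h>

/* Hypothetical socket state for the sketch. */
struct ingress_sock {
	long charged;    /* bytes charged (sk_mem_charge() analogue) */
	int page_refs;   /* page references held (get_page() analogue) */
	int queued;      /* messages on the ingress queue */
};

static void ingress_queue_msg(struct ingress_sock *sk, bool tx_enabled,
			      long size, bool fixed)
{
	sk->charged += size;
	sk->page_refs++;

	if (tx_enabled) {
		sk->queued++;    /* normal case: consumed or drained later */
		return;
	}

	if (!fixed)
		return;          /* old drop path: msg struct freed only, both leak */

	/* sk_msg_free() analogue: uncharge and put the page references */
	sk->charged -= size;
	sk->page_refs--;
}
```

      After a drop, the fixed path leaves nothing charged and no page
      references held. The old path leaves both behind for
      sk_stream_kill_queues() and inet_sock_destruct() to warn about.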
      
      This issue can trigger the following warnings:
      WARNING: CPU: 0 PID: 9202 at net/core/stream.c:205 sk_stream_kill_queues+0xc8/0xe0
      Call Trace:
       <IRQ>
       inet_csk_destroy_sock+0x55/0x110
       tcp_rcv_state_process+0xe5f/0xe90
       ? sk_filter_trim_cap+0x10d/0x230
       ? tcp_v4_do_rcv+0x161/0x250
       tcp_v4_do_rcv+0x161/0x250
       tcp_v4_rcv+0xc3a/0xce0
       ip_protocol_deliver_rcu+0x3d/0x230
       ip_local_deliver_finish+0x54/0x60
       ip_local_deliver+0xfd/0x110
       ? ip_protocol_deliver_rcu+0x230/0x230
       ip_rcv+0xd6/0x100
       ? ip_local_deliver+0x110/0x110
       __netif_receive_skb_one_core+0x85/0xa0
       process_backlog+0xa4/0x160
       __napi_poll+0x29/0x1b0
       net_rx_action+0x287/0x300
       __do_softirq+0xff/0x2fc
       do_softirq+0x79/0x90
       </IRQ>
      
      WARNING: CPU: 0 PID: 531 at net/ipv4/af_inet.c:154 inet_sock_destruct+0x175/0x1b0
      Call Trace:
       <TASK>
       __sk_destruct+0x24/0x1f0
       sk_psock_destroy+0x19b/0x1c0
       process_one_work+0x1b3/0x3c0
       ? process_one_work+0x3c0/0x3c0
       worker_thread+0x30/0x350
       ? process_one_work+0x3c0/0x3c0
       kthread+0xe6/0x110
       ? kthread_complete_and_exit+0x20/0x20
       ret_from_fork+0x22/0x30
       </TASK>
      
      Fixes: 9635720b ("bpf, sockmap: Fix memleak on ingress msg enqueue")
      Signed-off-by: Wang Yufen <wangyufen@huawei.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20220304081145.2037182-2-wangyufen@huawei.com
  5. Mar 12, 2022
    • selftests/bpf: Fix a clang compilation error for send_signal.c · d3b351f6
      Yonghong Song authored
      
      
      While building selftests/bpf with the latest clang compiler (clang15
      built from source), I hit the following compilation error:
      
        /.../prog_tests/send_signal.c:43:16: error: variable 'j' set but not used [-Werror,-Wunused-but-set-variable]
                        volatile int j = 0;
                                     ^
        1 error generated.
      
      The problem also exists with clang13 and clang14. clang12 is okay.
      
      In send_signal.c, we have the following code ...
      
        volatile int j = 0;
        [...]
        for (int i = 0; i < 100000000 && !sigusr1_received; i++)
          j /= i + 1;
      
      ... to burn CPU cycles so bpf_send_signal() helper can be tested
      in NMI mode.
      
      Slightly changing 'j /= i + 1' to 'j /= i + j + 1' or 'j++' fixes the
      problem. Further investigation indicated that this is likely a clang bug
      ([1]), and an upstream fix will be proposed later. In the meantime, it
      is a good idea to work around the issue to unblock people who build the
      kernel and selftests with clang.
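      For reference, here is a standalone version of the busy loop with the
      workaround applied. burn_cycles() and its iteration parameter are
      hypothetical, and declaring the flag as sig_atomic_t is this sketch's
      own choice. The divisor 'i + j + 1' is the variant that, according to
      the commit message, avoids the warning.

```c
#include <signal.h>

/* Set by a SIGUSR1 handler in the real test; sig_atomic_t is this sketch's choice. */
static volatile sig_atomic_t sigusr1_received;

/* Hypothetical helper: burn CPU until 'iterations' or a signal arrives.
 * Returns the number of iterations completed. */
static int burn_cycles(int iterations)
{
	volatile int j = 0;
	int i;

	/* j stays 0 and the divisor is always >= 1, so this never divides by zero. */
	for (i = 0; i < iterations && !sigusr1_received; i++)
		j /= i + j + 1;   /* workaround form from the commit message */

	return i;
}
```

      Reading j in the divisor is what the commit reports as sidestepping the
      spurious -Wunused-but-set-variable diagnostic. The loop still does no
      useful work beyond burning cycles.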
      
        [1] https://discourse.llvm.org/t/strange-clang-unused-but-set-variable-error-with-volatile-variables/60841
      
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20220311003721.2177170-1-yhs@fb.com
    • selftests/bpf: Add a test for maximum packet size in xdp_do_redirect · c09df4bd
      Toke Høiland-Jørgensen authored
      
      
      This adds an extra test to the xdp_do_redirect selftest for XDP live
      packet mode. It verifies that the maximum permissible packet size is
      accepted without any errors and that a packet that is too big is
      correctly rejected.
      
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20220310225621.53374-2-toke@redhat.com
    • bpf, test_run: Fix packet size check for live packet mode · b6f1f780
      Toke Høiland-Jørgensen authored
      The live packet mode uses some extra space at the start of each page to
      cache data structures so they don't have to be rebuilt at every repetition.
      This space wasn't correctly accounted for in the size checking of the
      arguments supplied by userspace. In addition, the definition of the
      frame size should include the size of the skb_shared_info, since other
      logic subtracts that size.
      
      Together, these mistakes resulted in userspace being able to trip the
      XDP_WARN() in xdp_update_frame_from_buff(), which syzbot discovered in
      short order. Fix this by changing the frame size define and adding the
      extra headroom to the bpf_prog_test_run_xdp() function. Also drop the
      max_len parameter to the page_pool init, since this is related to DMA which
      is not used for the page pool instance in PROG_TEST_RUN.
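      The size check can be illustrated with simple arithmetic. All sizes in
      struct frame_layout and in the example values are illustrative
      assumptions, not the kernel's real constants. The helper names are
      hypothetical. The point is only that the per-page space reserved for
      cached structures must be subtracted along with headroom and tailroom.
      Otherwise, lengths that do not fit are accepted.

```c
#include <errno.h>
#include <stddef.h>

/* Illustrative layout of one frame in live packet mode; sizes are assumptions. */
struct frame_layout {
	size_t page_size;   /* whole frame */
	size_t cache_head;  /* per-page space reserved for cached structures */
	size_t headroom;    /* packet headroom in front of the data */
	size_t tailroom;    /* room reserved for skb_shared_info at the end */
};

/* Largest packet that fits once every reserved region is subtracted. */
static size_t max_packet_len(const struct frame_layout *l)
{
	return l->page_size - l->cache_head - l->headroom - l->tailroom;
}

/* The buggy variant forgets cache_head and so admits packets that overflow. */
static size_t max_packet_len_buggy(const struct frame_layout *l)
{
	return l->page_size - l->headroom - l->tailroom;
}

static int check_packet_len(const struct frame_layout *l, size_t len)
{
	return len <= max_packet_len(l) ? 0 : -EINVAL;
}
```

      With a hypothetical 4096-byte page, 64 bytes of cached state, 256 bytes
      of headroom and 320 bytes of tailroom, the correct limit is 3456 bytes.
      The buggy limit of 3520 would let 64 oversized bytes through.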
      
      Fixes: b530e9e1 ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN")
      Reported-by: <syzbot+0e91362d99386dc5de99@syzkaller.appspotmail.com>
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20220310225621.53374-1-toke@redhat.com
  6. Mar 11, 2022