  1. Jul 12, 2022
    • bpf, arm64: Implement bpf_arch_text_poke() for arm64 · b2ad54e1
      Xu Kuohai authored
      
      
      Implement bpf_arch_text_poke() for arm64, so that a bpf prog or bpf
      trampoline can be patched with it.
      
      When the target address is NULL, the original instruction is patched to
      a NOP.
      
      When the target address and the source address are within the branch
      range, the original instruction is patched to a bl instruction to the
      target address directly.
      
      To support attaching a bpf trampoline to both regular kernel functions
      and bpf progs, we follow the ftrace patchsite convention for bpf progs.
      That is, two instructions are inserted at the beginning of the bpf prog:
      the first one saves the return address to x9, and the second is a nop
      which will be patched to a bl instruction when a bpf trampoline is
      attached.
      
      However, when a bpf trampoline is attached to a bpf prog, the distance
      between the target address and the source address may exceed 128MB, the
      maximum branch range, because the bpf trampoline and the bpf prog are
      allocated separately with vmalloc. So long jumps must be handled.
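
      The 128MB limit comes from bl encoding a signed 26-bit word offset.
      A minimal sketch of the range test follows; in_branch_range() and
      SZ_128M are illustrative names, not the helpers used by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* bl holds a signed 26-bit word offset, which spans -128MB .. +128MB - 4. */
#define SZ_128M (128LL * 1024 * 1024)

/*
 * Hypothetical helper: returns true if a bl placed at 'pc' can reach
 * 'target' directly, false if a long jump through the plt is needed.
 */
static bool in_branch_range(uint64_t pc, uint64_t target)
{
	int64_t offset;

	/* A64 instructions are always 4-byte aligned. */
	assert((pc & 3) == 0 && (target & 3) == 0);

	offset = (int64_t)(target - pc);
	return offset >= -SZ_128M && offset < SZ_128M;
}
```

      When this test fails for a trampoline address, the patchsite has to
      branch to the plt instead, as described in the rest of this message.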
      
      When a bpf prog is constructed, a plt pointing to the empty trampoline
      dummy_tramp is placed at the end:
      
              bpf_prog:
                      mov x9, lr
                      nop // patchsite
                      ...
                      ret
      
              plt:
                      ldr x10, target
                      br x10
              target:
                      .quad dummy_tramp // plt target
      
      This is also the state when no trampoline is attached.
      
      When a short-jump bpf trampoline is attached, the patchsite is patched to
      a bl instruction to the trampoline directly:
      
              bpf_prog:
                      mov x9, lr
                      bl <short-jump bpf trampoline address> // patchsite
                      ...
                      ret
      
              plt:
                      ldr x10, target
                      br x10
              target:
                      .quad dummy_tramp // plt target
      
      When a long-jump bpf trampoline is attached, the plt target is filled with
      the trampoline address and the patchsite is patched to a bl instruction to
      the plt:
      
              bpf_prog:
                      mov x9, lr
                      bl plt // patchsite
                      ...
                      ret
      
              plt:
                      ldr x10, target
                      br x10
              target:
                      .quad <long-jump bpf trampoline address>
      
      dummy_tramp is used to prevent another CPU from jumping to an unknown
      location during the patching process, making the patching process easier.
      
      The patching process is as follows:
      
      1. when neither the old address nor the new address is a long jump, the
         patchsite is replaced with a bl to the new address, or a nop if the
         new address is NULL;
      
      2. when the old address is not a long jump but the new one is, the
         branch target address is written to the plt first, then the patchsite
         is replaced with a bl instruction to the plt;
      
      3. when the old address is a long jump but the new one is not, the
         address of dummy_tramp is written to the plt first, then the patchsite
         is replaced with a bl to the new address, or a nop if the new address
         is NULL;
      
      4. when both the old address and the new address are long jumps, the
         new address is written to the plt and the patchsite is not changed.
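
      The four cases above can be sketched as a small decision function.
      This is only an illustration of the ordering rules, not the code in
      the patch: struct patch_plan, enum patch_insn and plan_poke() are
      hypothetical names, and a new_addr of 0 stands for NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* What the patchsite instruction becomes (hypothetical names). */
enum patch_insn { PATCH_KEEP, PATCH_NOP, PATCH_BL_DIRECT, PATCH_BL_PLT };

struct patch_plan {
	bool write_plt;       /* store plt_target before touching the patchsite */
	uint64_t plt_target;  /* value for the plt target slot, if write_plt */
	enum patch_insn insn; /* new patchsite instruction */
};

static struct patch_plan plan_poke(bool old_is_long, bool new_is_long,
				   uint64_t new_addr, uint64_t dummy_tramp)
{
	struct patch_plan plan = { false, 0, PATCH_KEEP };

	/* A NULL target is never a long jump: it is patched to a nop. */
	assert(!(new_is_long && new_addr == 0));

	if (new_is_long) {
		/* Cases 2 and 4: the plt gets the new address first. */
		plan.write_plt = true;
		plan.plt_target = new_addr;
		plan.insn = old_is_long ? PATCH_KEEP : PATCH_BL_PLT;
		return plan;
	}

	if (old_is_long) {
		/* Case 3: park the plt on dummy_tramp before repatching. */
		plan.write_plt = true;
		plan.plt_target = dummy_tramp;
	}

	/* Cases 1 and 3: direct bl, or nop when detaching. */
	plan.insn = new_addr ? PATCH_BL_DIRECT : PATCH_NOP;
	return plan;
}
```

      In every case where the plt is written, the write comes before the
      patchsite update, which matches the order given in cases 2 and 3.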
      
      Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
      Reviewed-by: KP Singh <kpsingh@kernel.org>
      Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20220711150823.2128542-4-xukuohai@huawei.com
    • arm64: Add LDR (literal) instruction · f1e8a24e
      Xu Kuohai authored
      
      
      Add the LDR (literal) instruction, which loads data from a PC-relative
      address. This instruction will be used to implement long jumps from a
      bpf prog to a bpf trampoline in the follow-up patch.
      
      The instruction encoding:
      
          3       2   2     2                                     0        0
          0       7   6     4                                     5        0
      +-----+-------+---+-----+-------------------------------------+--------+
      | 0 x | 0 1 1 | 0 | 0 0 |                imm19                |   Rt   |
      +-----+-------+---+-----+-------------------------------------+--------+
      
      For the 32-bit variant, x == 0; for the 64-bit variant, x == 1.
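
      As a hedged illustration of the layout above, the sketch below packs
      the fields into a 32-bit instruction word. encode_ldr_literal() is a
      hypothetical name and not the kernel's generator function; imm19 is
      the byte offset divided by 4, giving a range of +/-1MB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical encoder for LDR (literal):
 *   bits 31:30 = 0x, bits 29:27 = 011, bit 26 = 0, bits 25:24 = 00,
 *   bits 23:5  = imm19 (signed word offset), bits 4:0 = Rt.
 */
static uint32_t encode_ldr_literal(unsigned int rt, int64_t offset,
				   bool is_64bit)
{
	uint32_t imm19;
	uint32_t insn = 0x18000000u; /* 32-bit variant, x == 0 */

	assert(rt < 32);
	assert((offset & 3) == 0);                           /* word aligned */
	assert(offset >= -(1 << 20) && offset < (1 << 20));  /* +/-1MB */

	if (is_64bit)
		insn |= 1u << 30;                            /* x == 1 */

	/* Keep the low 19 bits of the two's complement word offset. */
	imm19 = (uint32_t)(((uint64_t)offset >> 2) & 0x7ffff);

	return insn | (imm19 << 5) | rt;
}
```

      For example, encode_ldr_literal(10, 8, true) corresponds to the
      "ldr x10, target" used in the plt when the literal sits 8 bytes after
      the load.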
      
      branch_imm_common() checks the distance between the pc and the target
      address. Since this patch reuses it and LDR (literal) is not a branch
      instruction, rename it to label_imm_common().
      
      Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
      Acked-by: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/bpf/20220711150823.2128542-3-xukuohai@huawei.com
    • bpf: Remove is_valid_bpf_tramp_flags() · 535a57a7
      Xu Kuohai authored
      
      
      Before generating bpf trampoline, x86 calls is_valid_bpf_tramp_flags()
      to check the input flags. This check is architecture independent.
      So, to be consistent with x86, arm64 should also do this check
      before generating bpf trampoline.
      
      However, the BPF_TRAMP_F_XXX flags are not used by user code and the
      flags argument is almost constant at compile time, so this run time
      check is a bit redundant.
      
      Remove is_valid_bpf_tramp_flags() and add some comments to the usage of
      BPF_TRAMP_F_XXX flags, as suggested by Alexei.
      
      Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20220711150823.2128542-2-xukuohai@huawei.com
    • skmsg: Fix invalid last sg check in sk_msg_recvmsg() · 9974d37e
      Liu Jian authored
      In sk_psock_skb_ingress_enqueue(), if the linear area, nr_frags and
      frag_list of the skb add up to NR_MSG_FRAG_IDS blocks in total,
      skb_to_sgvec() returns NR_MSG_FRAG_IDS. msg->sg.end is then set to
      NR_MSG_FRAG_IDS, and index (NR_MSG_FRAG_IDS - 1) becomes the last SG of
      the msg. When the msg is received in sk_msg_recvmsg() and i is
      (NR_MSG_FRAG_IDS - 1), sk_msg_iter_var_next(i) changes i to 0 (not
      NR_MSG_FRAG_IDS), so the conditions "msg_rx->sg.start==msg_rx->sg.end"
      and "i != msg_rx->sg.end" cannot work.
      
      As a result, the processed msg cannot be deleted from the ingress_msg
      list, even though the length of every sge of the msg has been set to 0.
      The next recvmsg syscall then processes the same msg again and, because
      the sge length is 0, always returns -EFAULT.
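
      The wrap-around problem can be reproduced with a toy model of the
      ring iterator. This is a sketch under stated assumptions, not the
      kernel code: NR_FRAG_IDS stands in for NR_MSG_FRAG_IDS with an
      illustrative value, and ring_next() and iter_reaches_end() are
      hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_FRAG_IDS 18 /* illustrative stand-in for NR_MSG_FRAG_IDS */

/* Advance a ring index the way the message describes: wrap to 0. */
static unsigned int ring_next(unsigned int i)
{
	assert(i < NR_FRAG_IDS);
	return (i + 1 == NR_FRAG_IDS) ? 0 : i + 1;
}

/*
 * Walk one full lap from 'start' and report whether the index ever
 * equals 'end'. With end == NR_FRAG_IDS this can never happen, because
 * the index only takes values 0 .. NR_FRAG_IDS - 1.
 */
static bool iter_reaches_end(unsigned int start, unsigned int end)
{
	unsigned int i = start;
	unsigned int steps;

	for (steps = 0; steps < NR_FRAG_IDS; steps++) {
		i = ring_next(i);
		if (i == end)
			return true;
	}
	return false;
}
```

      With end set to NR_FRAG_IDS the walk never terminates on an
      "i == end" style comparison, which is the situation described above.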
      
      Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
      Signed-off-by: Liu Jian <liujian56@huawei.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20220628123616.186950-1-liujian56@huawei.com
  2. Jul 11, 2022
    • fddi/skfp: fix repeated words in comments · edb2c347
      Jilin Yuan authored
      
      
      Delete the redundant word 'test'.
      
      Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ethernet/via: fix repeated words in comments · 1377a5b2
      Jilin Yuan authored
      
      
      Delete the redundant word 'driver'.
      
      Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Find dst with sk's xfrm policy not ctl_sk · e22aa148
      sewookseo authored
      
      
      If an XFRM security policy is set by calling setsockopt with the option
      IPV6_XFRM_POLICY, the policy is stored in 'sock_policy' in the 'sock'
      struct. However, tcp_v6_send_response doesn't look up the dst_entry with
      the actual socket; it looks it up with the tcp control socket. As a
      result, a RST packet may be sent without ESP encryption, and the peer's
      TCP socket can't receive it.
      This patch makes the function look up the dst_entry with the actual
      socket if that socket has an XFRM policy (sock_policy), so that the TCP
      response packet sent via this function is encrypted and consistent with
      the encrypted TCP socket.
      
      Tested: We encountered this problem when a TCP socket encrypted in ESP
      transport mode received a challenge ACK in SYN_SENT state. After
      receiving the challenge ACK, TCP needs to send a RST so that the socket
      can be established on the next SYN attempt. But the RST was not
      encrypted, and the peer TCP socket remained in ESTABLISHED state.
      We verified the fix with the test steps below.
      [Test steps]
      1. Create a TCP state mismatch between client (IDLE) and server
         (ESTABLISHED).
      2. Client tries a new connection on the same TCP ports (src & dst).
      3. Server returns a challenge ACK instead of SYN-ACK.
      4. Client sends a RST to the server to clear the socket.
      5. Client retransmits SYN to the server on the same TCP ports.
      [Expected result]
      The TCP connection should be established.
      
      Cc: Maciej Żenczykowski <maze@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: Sehee Lee <seheele@google.com>
      Signed-off-by: Sewook Seo <sewookseo@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. Jul 10, 2022
    • Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 0076cad3
      Jakub Kicinski authored
      
      
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf-next 2022-07-09
      
      We've added 94 non-merge commits during the last 19 day(s) which contain
      a total of 125 files changed, 5141 insertions(+), 6701 deletions(-).
      
      The main changes are:
      
      1) Add new way for performing BTF type queries to BPF, from Daniel Müller.
      
      2) Add inlining of calls to bpf_loop() helper when its function callback is
         statically known, from Eduard Zingerman.
      
      3) Implement BPF TCP CC framework usability improvements, from Jörn-Thorben Hinz.
      
      4) Add LSM flavor for attaching per-cgroup BPF programs to existing LSM
         hooks, from Stanislav Fomichev.
      
      5) Remove all deprecated libbpf APIs in prep for 1.0 release, from Andrii Nakryiko.
      
      6) Add benchmarks around local_storage to BPF selftests, from Dave Marchevsky.
      
      7) AF_XDP sample removal (given move to libxdp) and various improvements around AF_XDP
         selftests, from Magnus Karlsson & Maciej Fijalkowski.
      
      8) Add bpftool improvements for memcg probing and bash completion, from Quentin Monnet.
      
      9) Add arm64 JIT support for BPF-2-BPF coupled with tail calls, from Jakub Sitnicki.
      
      10) Sockmap optimizations around throughput of UDP transmissions which have been
          improved by 61%, from Cong Wang.
      
      11) Rework perf's BPF prologue code to remove deprecated functions, from Jiri Olsa.
      
      12) Fix sockmap teardown path to avoid sleepable sk_psock_stop, from John Fastabend.
      
      13) Fix libbpf's cleanup around legacy kprobe/uprobe on error case, from Chuang Wang.
      
      14) Fix libbpf's bpf_helpers.h to work with gcc for the case of its sec/pragma
          macro, from James Hilliard.
      
      15) Fix libbpf's pt_regs macros for riscv to use a0 for RC register, from Yixun Lan.
      
      16) Fix bpftool to show the name of type BPF_OBJ_LINK, from Yafang Shao.
      
      * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (94 commits)
        selftests/bpf: Fix xdp_synproxy build failure if CONFIG_NF_CONNTRACK=m/n
        bpf: Correctly propagate errors up from bpf_core_composites_match
        libbpf: Disable SEC pragma macro on GCC
        bpf: Check attach_func_proto more carefully in check_return_code
        selftests/bpf: Add test involving restrict type qualifier
        bpftool: Add support for KIND_RESTRICT to gen min_core_btf command
        MAINTAINERS: Add entry for AF_XDP selftests files
        selftests, xsk: Rename AF_XDP testing app
        bpf, docs: Remove deprecated xsk libbpf APIs description
        selftests/bpf: Add benchmark for local_storage RCU Tasks Trace usage
        libbpf, riscv: Use a0 for RC register
        libbpf: Remove unnecessary usdt_rel_ip assignments
        selftests/bpf: Fix few more compiler warnings
        selftests/bpf: Fix bogus uninitialized variable warning
        bpftool: Remove zlib feature test from Makefile
        libbpf: Cleanup the legacy uprobe_event on failed add/attach_event()
        libbpf: Fix wrong variable used in perf_event_uprobe_open_legacy()
        libbpf: Cleanup the legacy kprobe_event on failed add/attach_event()
        selftests/bpf: Add type match test against kernel's task_struct
        selftests/bpf: Add nested type to type based tests
        ...
      ====================
      
      Link: https://lore.kernel.org/r/20220708233145.32365-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  4. Jul 09, 2022
  5. Jul 08, 2022