Skip to content
  1. Dec 07, 2022
  2. Dec 06, 2022
  3. Dec 05, 2022
    • James Hilliard's avatar
      selftests/bpf: Fix conflicts with built-in functions in bpf_iter_ksym · ab0350c7
      James Hilliard authored
      
      
      Both tolower and toupper are built in c functions, we should not
      redefine them as this can result in a build error.
      
      Fixes the following errors:
      progs/bpf_iter_ksym.c:10:20: error: conflicting types for built-in function 'tolower'; expected 'int(int)' [-Werror=builtin-declaration-mismatch]
         10 | static inline char tolower(char c)
            |                    ^~~~~~~
      progs/bpf_iter_ksym.c:5:1: note: 'tolower' is declared in header '<ctype.h>'
          4 | #include <bpf/bpf_helpers.h>
        +++ |+#include <ctype.h>
          5 |
      progs/bpf_iter_ksym.c:17:20: error: conflicting types for built-in function 'toupper'; expected 'int(int)' [-Werror=builtin-declaration-mismatch]
         17 | static inline char toupper(char c)
            |                    ^~~~~~~
      progs/bpf_iter_ksym.c:17:20: note: 'toupper' is declared in header '<ctype.h>'
      
      See background on this sort of issue:
      https://stackoverflow.com/a/20582607
      https://gcc.gnu.org/bugzilla/show_bug.cgi?id=12213
      
      (C99, 7.1.3p1) "All identifiers with external linkage in any of the
      following subclauses (including the future library directions) are
      always reserved for use as identifiers with external linkage."
      
      This is documented behavior in GCC:
      https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-std-2
      
      Signed-off-by: default avatarJames Hilliard <james.hilliard1@gmail.com>
      Acked-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20221203010847.2191265-1-james.hilliard1@gmail.com
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      ab0350c7
    • Eric Dumazet's avatar
      bpf, sockmap: fix race in sock_map_free() · 0a182f8d
      Eric Dumazet authored
      sock_map_free() calls release_sock(sk) without owning a reference
      on the socket. This can cause use-after-free as syzbot found [1]
      
      Jakub Sitnicki already took care of a similar issue
      in sock_hash_free() in commit 75e68e5b ("bpf, sockhash:
      Synchronize delete from bucket list on map free")
      
      [1]
      refcount_t: decrement hit 0; leaking memory.
      WARNING: CPU: 0 PID: 3785 at lib/refcount.c:31 refcount_warn_saturate+0x17c/0x1a0 lib/refcount.c:31
      Modules linked in:
      CPU: 0 PID: 3785 Comm: kworker/u4:6 Not tainted 6.1.0-rc7-syzkaller-00103-gef4d3ea40565 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
      Workqueue: events_unbound bpf_map_free_deferred
      RIP: 0010:refcount_warn_saturate+0x17c/0x1a0 lib/refcount.c:31
      Code: 68 8b 31 c0 e8 75 71 15 fd 0f 0b e9 64 ff ff ff e8 d9 6e 4e fd c6 05 62 9c 3d 0a 01 48 c7 c7 80 bb 68 8b 31 c0 e8 54 71 15 fd <0f> 0b e9 43 ff ff ff 89 d9 80 e1 07 80 c1 03 38 c1 0f 8c a2 fe ff
      RSP: 0018:ffffc9000456fb60 EFLAGS: 00010246
      RAX: eae59bab72dcd700 RBX: 0000000000000004 RCX: ffff8880207057c0
      RDX: 0000000000000000 RSI: 0000000000000201 RDI: 0000000000000000
      RBP: 0000000000000004 R08: ffffffff816fdabd R09: fffff520008adee5
      R10: fffff520008adee5 R11: 1ffff920008adee4 R12: 0000000000000004
      R13: dffffc0000000000 R14: ffff88807b1c6c00 R15: 1ffff1100f638dcf
      FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000001b30c30000 CR3: 000000000d08e000 CR4: 00000000003506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
      <TASK>
      __refcount_dec include/linux/refcount.h:344 [inline]
      refcount_dec include/linux/refcount.h:359 [inline]
      __sock_put include/net/sock.h:779 [inline]
      tcp_release_cb+0x2d0/0x360 net/ipv4/tcp_output.c:1092
      release_sock+0xaf/0x1c0 net/core/sock.c:3468
      sock_map_free+0x219/0x2c0 net/core/sock_map.c:356
      process_one_work+0x81c/0xd10 kernel/workqueue.c:2289
      worker_thread+0xb14/0x1330 kernel/workqueue.c:2436
      kthread+0x266/0x300 kernel/kthread.c:376
      ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
      </TASK>
      
      Fixes: 7e81a353
      
       ("bpf: Sockmap, ensure sock lock held during tear down")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Cc: Jakub Sitnicki <jakub@cloudflare.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Song Liu <songliubraving@fb.com>
      Acked-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/r/20221202111640.2745533-1-edumazet@google.com
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      0a182f8d
    • Toke Høiland-Jørgensen's avatar
      bpf: Add dummy type reference to nf_conn___init to fix type deduplication · 578ce69f
      Toke Høiland-Jørgensen authored
      The bpf_ct_set_nat_info() kfunc is defined in the nf_nat.ko module, and
      takes as a parameter the nf_conn___init struct, which is allocated through
      the bpf_xdp_ct_alloc() helper defined in the nf_conntrack.ko module.
      However, because kernel modules can't deduplicate BTF types between each
      other, and the nf_conn___init struct is not referenced anywhere in vmlinux
      BTF, this leads to two distinct BTF IDs for the same type (one in each
      module). This confuses the verifier, as described here:
      
      https://lore.kernel.org/all/87leoh372s.fsf@toke.dk/
      
      As a workaround, add an explicit BTF_TYPE_EMIT for the type in
      net/filter.c, so the type definition gets included in vmlinux BTF. This
      way, both modules can refer to the same type ID (as they both build on top
      of vmlinux BTF), and the verifier is no longer confused.
      
      v2:
      
      - Use BTF_TYPE_EMIT (which is a statement so it has to be inside a function
        definition; use xdp_func_proto() for this, since this is mostly
        xdp-related).
      
      Fixes: 820dc052
      
       ("net: netfilter: move bpf_ct_set_nat_info kfunc in nf_nat_bpf.c")
      Signed-off-by: default avatarToke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/r/20221201123939.696558-1-toke@redhat.com
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      578ce69f
    • Yonghong Song's avatar
      bpf: Add sleepable prog tests for cgrp local storage · 41d76c72
      Yonghong Song authored
      
      
      Add three tests for cgrp local storage support for sleepable progs.
      Two tests can load and run properly, one for cgroup_iter, another
      for passing current->cgroups->dfl_cgrp to bpf_cgrp_storage_get()
      helper. One test has bpf_rcu_read_lock() and failed to load.
      
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20221201050449.2785613-1-yhs@fb.com
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      41d76c72
    • Yonghong Song's avatar
      bpf: Enable sleeptable support for cgrp local storage · 2c40d97d
      Yonghong Song authored
      
      
      Similar to sk/inode/task local storage, enable sleepable support for
      cgrp local storage.
      
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20221201050444.2785007-1-yhs@fb.com
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      2c40d97d
    • Sreevani Sreejith's avatar
      bpf, docs: BPF Iterator Document · 8972e18a
      Sreevani Sreejith authored
      
      
      Document that describes how BPF iterators work, how to use iterators,
      and how to pass parameters in BPF iterators.
      
      Acked-by: default avatarDavid Vernet <void@manifault.com>
      Signed-off-by: default avatarSreevani Sreejith <psreep@gmail.com>
      Acked-by: default avatarYonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20221202221710.320810-2-ssreevani@meta.com
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      8972e18a
    • Yonghong Song's avatar
      bpf: Do not mark certain LSM hook arguments as trusted · c0c852dd
      Yonghong Song authored
      Martin mentioned that the verifier cannot assume arguments from
      LSM hook sk_alloc_security being trusted since after the hook
      is called, the sk ref_count is set to 1. This will overwrite
      the ref_count changed by the bpf program and may cause ref_count
      underflow later on.
      
      I then further checked some other hooks. For example,
      for bpf_lsm_file_alloc() hook in fs/file_table.c,
      
              f->f_cred = get_cred(cred);
              error = security_file_alloc(f);
              if (unlikely(error)) {
                      file_free_rcu(&f->f_rcuhead);
                      return ERR_PTR(error);
              }
      
              atomic_long_set(&f->f_count, 1);
      
      The input parameter 'f' to security_file_alloc() cannot be trusted
      as well.
      
      Specifically, I investiaged bpf_map/bpf_prog/file/sk/task alloc/free
      lsm hooks. Except bpf_map_alloc and task_alloc, arguments for all other
      hooks should not be considered as trusted. This may not be a complete
      list, but it covers common usage for sk and task.
      
      Fixes: 3f00c523
      
       ("bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs")
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20221203204954.2043348-1-yhs@fb.com
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      c0c852dd
    • Alexei Starovoitov's avatar
      Merge branch 'bpf: Handle MEM_RCU type properly' · 1910676c
      Alexei Starovoitov authored
      
      
      Yonghong Song says:
      
      ====================
      
      Patch set [1] added rcu support for bpf programs. In [1], a rcu
      pointer is considered to be trusted and not null. This is actually
      not true in some cases. The rcu pointer could be null, and for non-null
      rcu pointer, it may have reference count of 0. This small patch set
      fixed this problem. Patch 1 is the kernel fix. Patch 2 adjusted
      selftests properly. Patch 3 added documentation for newly-introduced
      KF_RCU flag.
      
        [1] https://lore.kernel.org/all/20221124053201.2372298-1-yhs@fb.com/
      
      Changelogs:
        v1 -> v2:
          - rcu ptr could be NULL.
          - non_null_rcu_ptr->rcu_field can be marked as MEM_RCU as well.
          - Adjust the code to avoid existing error message change.
      ====================
      
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      1910676c
    • Yonghong Song's avatar
      docs/bpf: Add KF_RCU documentation · f5362564
      Yonghong Song authored
      
      
      Add proper KF_RCU documentation in kfuncs.rst.
      
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20221203184613.478967-1-yhs@fb.com
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      f5362564
    • Yonghong Song's avatar
      selftests/bpf: Fix rcu_read_lock test with new MEM_RCU semantics · 8723ec22
      Yonghong Song authored
      
      
      Add MEM_RCU pointer null checking for related tests. Also
      modified task_acquire test so it takes a rcu ptr 'ptr' where
      'ptr = rcu_ptr->rcu_field'.
      
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20221203184607.478314-1-yhs@fb.com
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      8723ec22
    • Yonghong Song's avatar
      bpf: Handle MEM_RCU type properly · fca1aa75
      Yonghong Song authored
      Commit 9bb00b28 ("bpf: Add kfunc bpf_rcu_read_lock/unlock()")
      introduced MEM_RCU and bpf_rcu_read_lock/unlock() support. In that
      commit, a rcu pointer is tagged with both MEM_RCU and PTR_TRUSTED
      so that it can be passed into kfuncs or helpers as an argument.
      
      Martin raised a good question in [1] such that the rcu pointer,
      although being able to accessing the object, might have reference
      count of 0. This might cause a problem if the rcu pointer is passed
      to a kfunc which expects trusted arguments where ref count should
      be greater than 0.
      
      This patch makes the following changes related to MEM_RCU pointer:
        - MEM_RCU pointer might be NULL (PTR_MAYBE_NULL).
        - Introduce KF_RCU so MEM_RCU ptr can be acquired with
          a KF_RCU tagged kfunc which assumes ref count of rcu ptr
          could be zero.
        - For mem access 'b = ptr->a', say 'ptr' is a MEM_RCU ptr, and
          'a' is tagged with __rcu as well. Let us mark 'b' as
          MEM_RCU | PTR_MAYBE_NULL.
      
       [1] https://lore.kernel.org/bpf/ac70f574-4023-664e-b711-e0d3b18117fd@linux.dev/
      
      Fixes: 9bb00b28
      
       ("bpf: Add kfunc bpf_rcu_read_lock/unlock()")
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20221203184602.477272-1-yhs@fb.com
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      fca1aa75
  4. Dec 03, 2022
  5. Dec 02, 2022
    • Dave Marchevsky's avatar
      selftests/bpf: Validate multiple ref release_on_unlock logic · 78b037bd
      Dave Marchevsky authored
      
      
      Modify list_push_pop_multiple to alloc and insert nodes 2-at-a-time.
      Without the previous patch's fix, this block of code:
      
        bpf_spin_lock(lock);
        bpf_list_push_front(head, &f[i]->node);
        bpf_list_push_front(head, &f[i + 1]->node);
        bpf_spin_unlock(lock);
      
      would fail check_reference_leak check as release_on_unlock logic would miss
      a ref that should've been released.
      
      Signed-off-by: default avatarDave Marchevsky <davemarchevsky@fb.com>
      cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Acked-by: default avatarYonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20221201183406.1203621-2-davemarchevsky@fb.com
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      78b037bd
    • Dave Marchevsky's avatar
      bpf: Fix release_on_unlock release logic for multiple refs · 1f82dffc
      Dave Marchevsky authored
      
      
      Consider a verifier state with three acquired references, all with
      release_on_unlock = true:
      
                  idx  0 1 2
        state->refs = [2 4 6]
      
      (with 2, 4, and 6 being the ref ids).
      
      When bpf_spin_unlock is called, process_spin_lock will loop through all
      acquired_refs and, for each ref, if it's release_on_unlock, calls
      release_reference on it. That function in turn calls
      release_reference_state, which removes the reference from state->refs by
      swapping the reference state with the last reference state in
      refs array and decrements acquired_refs count.
      
      process_spin_lock's loop logic, which is essentially:
      
        for (i = 0; i < state->acquired_refs; i++) {
          if (!state->refs[i].release_on_unlock)
            continue;
          release_reference(state->refs[i].id);
        }
      
      will fail to release release_on_unlock references which are swapped from
      the end. Running this logic on our example demonstrates:
      
        state->refs = [2 4 6] (start of idx=0 iter)
          release state->refs[0] by swapping w/ state->refs[2]
      
        state->refs = [6 4]   (start of idx=1)
          release state->refs[1], no need to swap as it's the last idx
      
        state->refs = [6]     (start of idx=2, loop terminates)
      
      ref_id 6 should have been removed but was skipped.
      
      Fix this by looping from back-to-front, which results in refs that are
      candidates for removal being swapped with refs which have already been
      examined and kept.
      
      If we modify our initial example such that ref 6 is replaced with ref 7,
      which is _not_ release_on_unlock, and loop from the back, we'd see:
      
        state->refs = [2 4 7] (start of idx=2)
      
        state->refs = [2 4 7] (start of idx=1)
      
        state->refs = [2 7]   (start of idx=0, refs 7 and 4 swapped)
      
        state->refs = [7]     (after idx=0, 7 and 2 swapped, loop terminates)
      
      Signed-off-by: default avatarDave Marchevsky <davemarchevsky@fb.com>
      Acked-by: default avatarYonghong Song <yhs@fb.com>
      cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Fixes: 534e86bc
      
       ("bpf: Add 'release on unlock' logic for bpf_list_push_{front,back}")
      Link: https://lore.kernel.org/r/20221201183406.1203621-1-davemarchevsky@fb.com
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      1f82dffc
  6. Dec 01, 2022