Skip to content
  1. Dec 05, 2022
    • Toke Høiland-Jørgensen's avatar
      bpf: Add dummy type reference to nf_conn___init to fix type deduplication · 578ce69f
      Toke Høiland-Jørgensen authored
      The bpf_ct_set_nat_info() kfunc is defined in the nf_nat.ko module, and
      takes as a parameter the nf_conn___init struct, which is allocated through
      the bpf_xdp_ct_alloc() helper defined in the nf_conntrack.ko module.
      However, because kernel modules can't deduplicate BTF types between each
      other, and the nf_conn___init struct is not referenced anywhere in vmlinux
      BTF, this leads to two distinct BTF IDs for the same type (one in each
      module). This confuses the verifier, as described here:
      
      https://lore.kernel.org/all/87leoh372s.fsf@toke.dk/
      
      As a workaround, add an explicit BTF_TYPE_EMIT for the type in
      net/filter.c, so the type definition gets included in vmlinux BTF. This
      way, both modules can refer to the same type ID (as they both build on top
      of vmlinux BTF), and the verifier is no longer confused.
      
      v2:
      
      - Use BTF_TYPE_EMIT (which is a statement so it has to be inside a function
        definition; use xdp_func_proto() for this, since this is mostly
        xdp-related).
      
      Fixes: 820dc052
      
       ("net: netfilter: move bpf_ct_set_nat_info kfunc in nf_nat_bpf.c")
      Signed-off-by: default avatarToke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/r/20221201123939.696558-1-toke@redhat.com
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      578ce69f
    • Yonghong Song's avatar
      bpf: Add sleepable prog tests for cgrp local storage · 41d76c72
      Yonghong Song authored
      
      
      Add three tests for cgrp local storage support for sleepable progs.
      Two tests can load and run properly, one for cgroup_iter, another
      for passing current->cgroups->dfl_cgrp to bpf_cgrp_storage_get()
      helper. One test has bpf_rcu_read_lock() and failed to load.
      
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20221201050449.2785613-1-yhs@fb.com
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      41d76c72
    • Yonghong Song's avatar
      bpf: Enable sleeptable support for cgrp local storage · 2c40d97d
      Yonghong Song authored
      
      
      Similar to sk/inode/task local storage, enable sleepable support for
      cgrp local storage.
      
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20221201050444.2785007-1-yhs@fb.com
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      2c40d97d
    • Sreevani Sreejith's avatar
      bpf, docs: BPF Iterator Document · 8972e18a
      Sreevani Sreejith authored
      
      
      Document that describes how BPF iterators work, how to use iterators,
      and how to pass parameters in BPF iterators.
      
      Acked-by: default avatarDavid Vernet <void@manifault.com>
      Signed-off-by: default avatarSreevani Sreejith <psreep@gmail.com>
      Acked-by: default avatarYonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20221202221710.320810-2-ssreevani@meta.com
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      8972e18a
    • Yonghong Song's avatar
      bpf: Do not mark certain LSM hook arguments as trusted · c0c852dd
      Yonghong Song authored
      Martin mentioned that the verifier cannot assume arguments from
      LSM hook sk_alloc_security being trusted since after the hook
      is called, the sk ref_count is set to 1. This will overwrite
      the ref_count changed by the bpf program and may cause ref_count
      underflow later on.
      
      I then further checked some other hooks. For example,
      for bpf_lsm_file_alloc() hook in fs/file_table.c,
      
              f->f_cred = get_cred(cred);
              error = security_file_alloc(f);
              if (unlikely(error)) {
                      file_free_rcu(&f->f_rcuhead);
                      return ERR_PTR(error);
              }
      
              atomic_long_set(&f->f_count, 1);
      
      The input parameter 'f' to security_file_alloc() cannot be trusted
      as well.
      
      Specifically, I investiaged bpf_map/bpf_prog/file/sk/task alloc/free
      lsm hooks. Except bpf_map_alloc and task_alloc, arguments for all other
      hooks should not be considered as trusted. This may not be a complete
      list, but it covers common usage for sk and task.
      
      Fixes: 3f00c523
      
       ("bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs")
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20221203204954.2043348-1-yhs@fb.com
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      c0c852dd
    • Alexei Starovoitov's avatar
      Merge branch 'bpf: Handle MEM_RCU type properly' · 1910676c
      Alexei Starovoitov authored
      
      
      Yonghong Song says:
      
      ====================
      
      Patch set [1] added rcu support for bpf programs. In [1], a rcu
      pointer is considered to be trusted and not null. This is actually
      not true in some cases. The rcu pointer could be null, and for non-null
      rcu pointer, it may have reference count of 0. This small patch set
      fixed this problem. Patch 1 is the kernel fix. Patch 2 adjusted
      selftests properly. Patch 3 added documentation for newly-introduced
      KF_RCU flag.
      
        [1] https://lore.kernel.org/all/20221124053201.2372298-1-yhs@fb.com/
      
      Changelogs:
        v1 -> v2:
          - rcu ptr could be NULL.
          - non_null_rcu_ptr->rcu_field can be marked as MEM_RCU as well.
          - Adjust the code to avoid existing error message change.
      ====================
      
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      1910676c
    • Yonghong Song's avatar
      docs/bpf: Add KF_RCU documentation · f5362564
      Yonghong Song authored
      
      
      Add proper KF_RCU documentation in kfuncs.rst.
      
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20221203184613.478967-1-yhs@fb.com
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      f5362564
    • Yonghong Song's avatar
      selftests/bpf: Fix rcu_read_lock test with new MEM_RCU semantics · 8723ec22
      Yonghong Song authored
      
      
      Add MEM_RCU pointer null checking for related tests. Also
      modified task_acquire test so it takes a rcu ptr 'ptr' where
      'ptr = rcu_ptr->rcu_field'.
      
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20221203184607.478314-1-yhs@fb.com
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      8723ec22
    • Yonghong Song's avatar
      bpf: Handle MEM_RCU type properly · fca1aa75
      Yonghong Song authored
      Commit 9bb00b28 ("bpf: Add kfunc bpf_rcu_read_lock/unlock()")
      introduced MEM_RCU and bpf_rcu_read_lock/unlock() support. In that
      commit, a rcu pointer is tagged with both MEM_RCU and PTR_TRUSTED
      so that it can be passed into kfuncs or helpers as an argument.
      
      Martin raised a good question in [1] such that the rcu pointer,
      although being able to accessing the object, might have reference
      count of 0. This might cause a problem if the rcu pointer is passed
      to a kfunc which expects trusted arguments where ref count should
      be greater than 0.
      
      This patch makes the following changes related to MEM_RCU pointer:
        - MEM_RCU pointer might be NULL (PTR_MAYBE_NULL).
        - Introduce KF_RCU so MEM_RCU ptr can be acquired with
          a KF_RCU tagged kfunc which assumes ref count of rcu ptr
          could be zero.
        - For mem access 'b = ptr->a', say 'ptr' is a MEM_RCU ptr, and
          'a' is tagged with __rcu as well. Let us mark 'b' as
          MEM_RCU | PTR_MAYBE_NULL.
      
       [1] https://lore.kernel.org/bpf/ac70f574-4023-664e-b711-e0d3b18117fd@linux.dev/
      
      Fixes: 9bb00b28
      
       ("bpf: Add kfunc bpf_rcu_read_lock/unlock()")
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20221203184602.477272-1-yhs@fb.com
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      fca1aa75
  2. Dec 03, 2022
  3. Dec 02, 2022
    • Dave Marchevsky's avatar
      selftests/bpf: Validate multiple ref release_on_unlock logic · 78b037bd
      Dave Marchevsky authored
      
      
      Modify list_push_pop_multiple to alloc and insert nodes 2-at-a-time.
      Without the previous patch's fix, this block of code:
      
        bpf_spin_lock(lock);
        bpf_list_push_front(head, &f[i]->node);
        bpf_list_push_front(head, &f[i + 1]->node);
        bpf_spin_unlock(lock);
      
      would fail check_reference_leak check as release_on_unlock logic would miss
      a ref that should've been released.
      
      Signed-off-by: default avatarDave Marchevsky <davemarchevsky@fb.com>
      cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Acked-by: default avatarYonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20221201183406.1203621-2-davemarchevsky@fb.com
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      78b037bd
    • Dave Marchevsky's avatar
      bpf: Fix release_on_unlock release logic for multiple refs · 1f82dffc
      Dave Marchevsky authored
      
      
      Consider a verifier state with three acquired references, all with
      release_on_unlock = true:
      
                  idx  0 1 2
        state->refs = [2 4 6]
      
      (with 2, 4, and 6 being the ref ids).
      
      When bpf_spin_unlock is called, process_spin_lock will loop through all
      acquired_refs and, for each ref, if it's release_on_unlock, calls
      release_reference on it. That function in turn calls
      release_reference_state, which removes the reference from state->refs by
      swapping the reference state with the last reference state in
      refs array and decrements acquired_refs count.
      
      process_spin_lock's loop logic, which is essentially:
      
        for (i = 0; i < state->acquired_refs; i++) {
          if (!state->refs[i].release_on_unlock)
            continue;
          release_reference(state->refs[i].id);
        }
      
      will fail to release release_on_unlock references which are swapped from
      the end. Running this logic on our example demonstrates:
      
        state->refs = [2 4 6] (start of idx=0 iter)
          release state->refs[0] by swapping w/ state->refs[2]
      
        state->refs = [6 4]   (start of idx=1)
          release state->refs[1], no need to swap as it's the last idx
      
        state->refs = [6]     (start of idx=2, loop terminates)
      
      ref_id 6 should have been removed but was skipped.
      
      Fix this by looping from back-to-front, which results in refs that are
      candidates for removal being swapped with refs which have already been
      examined and kept.
      
      If we modify our initial example such that ref 6 is replaced with ref 7,
      which is _not_ release_on_unlock, and loop from the back, we'd see:
      
        state->refs = [2 4 7] (start of idx=2)
      
        state->refs = [2 4 7] (start of idx=1)
      
        state->refs = [2 7]   (start of idx=0, refs 7 and 4 swapped)
      
        state->refs = [7]     (after idx=0, 7 and 2 swapped, loop terminates)
      
      Signed-off-by: default avatarDave Marchevsky <davemarchevsky@fb.com>
      Acked-by: default avatarYonghong Song <yhs@fb.com>
      cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Fixes: 534e86bc
      
       ("bpf: Add 'release on unlock' logic for bpf_list_push_{front,back}")
      Link: https://lore.kernel.org/r/20221201183406.1203621-1-davemarchevsky@fb.com
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      1f82dffc
  4. Dec 01, 2022
  5. Nov 30, 2022