  1. Nov 26, 2016
    • samples: bpf: add userspace example for attaching eBPF programs to cgroups · d8c5b17f
      Daniel Mack authored
      
      
      Add a simple userspace program to demonstrate the new API to attach eBPF
      programs to cgroups. This is what it does:
      
       * Create arraymap in kernel with 4 byte keys and 8 byte values
      
       * Load eBPF program
      
         The eBPF program accesses the map passed in to store two pieces of
         information. The number of invocations of the program, which maps
         to the number of packets received, is stored to key 0. Key 1 is
         incremented on each iteration by the number of bytes stored in
         the skb.
      
       * Detach any eBPF program previously attached to the cgroup
      
       * Attach the new program to the cgroup using BPF_PROG_ATTACH
      
       * Once a second, read map[0] and map[1] to see how many packets and
         bytes were seen on any socket of tasks in the given cgroup.
      
      The program takes a cgroup path as 1st argument, and either "ingress"
      or "egress" as 2nd. Optionally, "drop" can be passed as 3rd argument,
      which will make the generated eBPF program return 0 instead of 1, so
      the kernel will drop the packet.
      
      libbpf gained two new wrappers for the new syscall commands.
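
The accounting done by the sample's eBPF program can be modelled in plain userspace C. This is only a sketch of the logic, not the eBPF bytecode the sample generates: `struct counters`, `account_packet` and the `drop` parameter are hypothetical names, and a two-element array stands in for the kernel array map.

```c
#include <stdint.h>

/* Stand-in for the kernel array map: keys 0 and 1, 8-byte values. */
struct counters {
    uint64_t value[2];
};

/*
 * Model of one program invocation for a packet of pkt_len bytes.
 * Key 0 counts packets, key 1 accumulates bytes. The return value is
 * the verdict: 1 lets the packet pass, 0 makes the kernel drop it.
 */
static int account_packet(struct counters *map, uint32_t pkt_len, int drop)
{
    map->value[0] += 1;
    map->value[1] += pkt_len;
    return drop ? 0 : 1;
}
```

In this model, passing a non-zero `drop` corresponds to giving "drop" as the 3rd argument: the counters are still updated, only the verdict changes.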
      
      Signed-off-by: Daniel Mack <daniel@zonque.org>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv4, ipv6: run cgroup eBPF egress programs · 33b48679
      Daniel Mack authored
      If the cgroup associated with the sending socket has eBPF programs
      installed, run them from ip_output(), ip6_output() and
      ip_mc_output(). In these functions we have two socket contexts, as
      per 7026b1dd ("netfilter: Pass socket pointer down through
      okfn()."). We explicitly need to use sk instead of skb->sk here,
      since otherwise the same program would run multiple times on egress
      when encap devices are involved, which is not desired in our case.
      
      eBPF programs used in this context are expected to either return 1 to
      let the packet pass, or != 1 to drop it. The programs have access to
      the skb through bpf_skb_load_bytes(), and the payload starts at the
      network headers (L3).
      
      Note that cgroup_bpf_run_filter() is stubbed out as static inline nop
      for !CONFIG_CGROUP_BPF, and is otherwise guarded by a static key if
      the feature is unused.
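
The stub-plus-guard arrangement can be sketched in userspace C. Everything here is an assumption made for illustration: `SKETCH_CGROUP_BPF` stands in for CONFIG_CGROUP_BPF, `struct packet` for the skb, and a plain bool for the static key (a real static key patches the branch at runtime, which ordinary C cannot show).

```c
#include <stdbool.h>
#include <stddef.h>

struct packet { size_t len; };  /* hypothetical stand-in for struct sk_buff */

#ifdef SKETCH_CGROUP_BPF
/* Stand-in for the static key: false until a first program is attached. */
static bool sketch_bpf_enabled;

/* Hypothetical slow path; a real kernel would run the attached program. */
static int sketch_run_programs(const struct packet *pkt)
{
    return pkt->len > 0 ? 0 : -1;
}

static inline int sketch_run_filter(const struct packet *pkt)
{
    if (!sketch_bpf_enabled)    /* feature unused: skip the slow path */
        return 0;
    return sketch_run_programs(pkt);
}
#else
/* Feature compiled out: the hook collapses to a no-op that succeeds. */
static inline int sketch_run_filter(const struct packet *pkt)
{
    (void)pkt;
    return 0;
}
#endif
```

Callers invoke `sketch_run_filter()` unconditionally in both configurations, so the call sites need no #ifdef of their own.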
      
      Signed-off-by: Daniel Mack <daniel@zonque.org>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: filter: run cgroup eBPF ingress programs · c11cd3a6
      Daniel Mack authored
      
      
      If the cgroup associated with the receiving socket has eBPF
      programs installed, run them from sk_filter_trim_cap().
      
      eBPF programs used in this context are expected to either return 1 to
      let the packet pass, or != 1 to drop it. The programs have access to
      the skb through bpf_skb_load_bytes(), and the payload starts at the
      network headers (L3).
      
      Note that cgroup_bpf_run_filter() is stubbed out as static inline nop
      for !CONFIG_CGROUP_BPF, and is otherwise guarded by a static key if
      the feature is unused.
      
      Signed-off-by: Daniel Mack <daniel@zonque.org>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: add BPF_PROG_ATTACH and BPF_PROG_DETACH commands · f4324551
      Daniel Mack authored
      
      
      Extend the bpf(2) syscall by two new commands, BPF_PROG_ATTACH and
      BPF_PROG_DETACH which allow attaching and detaching eBPF programs
      to a target.
      
      On the API level, the target could be anything that has an fd in
      userspace, hence the field in union bpf_attr is called
      'target_fd'.
      
      When called with BPF_ATTACH_TYPE_CGROUP_INET_{E,IN}GRESS, the target is
      expected to be a valid file descriptor of a cgroup v2 directory which
      has the bpf controller enabled. These are the only use-cases
      implemented by this patch at this point, but more can be added.
      
      If a program of the given type already exists in the given cgroup,
      the program is swapped atomically, so userspace does not have to drop
      an existing program first before installing a new one, which would
      otherwise leave a gap in which no program is attached.
      
      For more information on the propagation logic to subcgroups, please
      refer to the bpf cgroup controller implementation.
      
      The API is guarded by CAP_NET_ADMIN.
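
A minimal sketch of an attach call through the raw syscall, assuming Linux uapi headers that already carry this command. `attach_cgroup_prog` is a hypothetical helper name; the attach type constants are the ones spelled BPF_CGROUP_INET_INGRESS and BPF_CGROUP_INET_EGRESS in `<linux/bpf.h>`.

```c
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Attach the already loaded program prog_fd to the cgroup directory
 * referred to by cgroup_fd. Returns 0 on success, -1 with errno set
 * on failure. attach_cgroup_prog() is a hypothetical helper name.
 */
static int attach_cgroup_prog(int cgroup_fd, int prog_fd,
                              enum bpf_attach_type type)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));   /* unused fields must be zero */
    attr.target_fd = cgroup_fd;
    attr.attach_bpf_fd = prog_fd;
    attr.attach_type = type;

    return (int)syscall(__NR_bpf, BPF_PROG_ATTACH, &attr, sizeof(attr));
}
```

A caller would typically obtain `cgroup_fd` by opening the cgroup v2 directory with O_DIRECTORY | O_RDONLY. Without CAP_NET_ADMIN, or with invalid descriptors, the call fails and returns -1.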
      
      Signed-off-by: Daniel Mack <daniel@zonque.org>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cgroup: add support for eBPF programs · 30070984
      Daniel Mack authored
      
      
      This patch adds two sets of eBPF program pointers to struct cgroup:
      one for programs that are directly pinned to a cgroup, and one for
      programs that are effective for it.
      
      To illustrate the logic behind that, assume the following example
      cgroup hierarchy.
      
        A - B - C
              \ D - E
      
      If only B has a program attached, it will be effective for B, C, D
      and E. If D then attaches a program itself, that will be effective for
      both D and E, and the program in B will only affect B and C. Only one
      program of a given type is effective for a cgroup.
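
The propagation rule in this example can be modelled with a short C sketch. It is a simplification with hypothetical names (`struct cg`, `effective_prog`): it walks up the parent chain at lookup time, whereas the patch keeps a separate set of effective program pointers in each cgroup.

```c
#include <stddef.h>

/* Hypothetical model of a cgroup: a parent link and one pinned program. */
struct cg {
    struct cg *parent;
    int pinned;          /* program id pinned here, 0 if none */
};

/* The effective program is the one pinned on the nearest ancestor-or-self. */
static int effective_prog(const struct cg *c)
{
    for (; c != NULL; c = c->parent)
        if (c->pinned != 0)
            return c->pinned;
    return 0;            /* nothing attached anywhere up the chain */
}
```

With the hierarchy above, pinning a program on B makes it effective for B, C, D and E, while A has none. Pinning a second program on D changes the result for D and E only.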
      
      Attaching and detaching programs will be done through the bpf(2)
      syscall. For now, ingress and egress inet socket filtering are the
      only supported use-cases.
      
      Signed-off-by: Daniel Mack <daniel@zonque.org>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: add new prog type for cgroup socket filtering · 0e33661d
      Daniel Mack authored
      
      
      This program type is similar to BPF_PROG_TYPE_SOCKET_FILTER, except that
      it does not allow BPF_LD_[ABS|IND] instructions and hooks up the
      bpf_skb_load_bytes() helper.
      
      Programs of this type will be attached to cgroups for network filtering
      and accounting.
      
      Signed-off-by: Daniel Mack <daniel@zonque.org>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4: fix memory leak on txq_info · 619228d8
      Colin Ian King authored
      
      
      Currently if txq_info->uldtxq cannot be allocated then
      txq_info->txq is being kfree'd (which is redundant because it
      is NULL) instead of txq_info. Fix this by instead kfree'ing
      txq_info.
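
The error-path pattern behind this fix can be illustrated in userspace C. This is not the driver code: `struct txq_info_sketch`, the counting allocator and the `fail_inner` switch are hypothetical, with malloc/free standing in for the kernel allocators.

```c
#include <stdlib.h>

/* Hypothetical mirror of the structure named in the commit message. */
struct txq_info_sketch {
    void *uldtxq;
};

/* Counts live allocations so a leak shows up as a non-zero balance. */
static int live_allocs;

static void *counted_alloc(size_t size, int fail)
{
    void *p = fail ? NULL : malloc(size);

    if (p)
        live_allocs++;
    return p;
}

static void counted_free(void *p)
{
    if (p)
        live_allocs--;
    free(p);
}

/* fail_inner simulates the uldtxq allocation failing. */
static struct txq_info_sketch *setup_txq_info(int fail_inner)
{
    struct txq_info_sketch *info = counted_alloc(sizeof(*info), 0);

    if (!info)
        return NULL;
    info->uldtxq = counted_alloc(64, fail_inner);
    if (!info->uldtxq) {
        counted_free(info);   /* free the container, not a NULL member */
        return NULL;
    }
    return info;
}
```

Freeing the NULL member instead of `info` in the failure branch would leave `live_allocs` at 1, which is the leak the commit describes.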
      
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. Nov 25, 2016
  3. Nov 23, 2016
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · f9aa9dc7
      David S. Miller authored
      
      
      All conflicts were simple overlapping changes except perhaps
      for the Thunder driver.
      
      That driver has a change_mtu method explicitly for sending
      a message to the hardware.  If that fails it returns an
      error.
      
      Normally a driver doesn't need an ndo_change_mtu method because those
      are usually just range changes, which are now handled generically.
      But since this extra operation is needed in the Thunder driver, it has
      to stay.
      
      However, if the message send fails we have to restore the original
      MTU before the change because the entire call chain expects that if
      an error is thrown by ndo_change_mtu then the MTU did not change.
      Therefore code is added to nicvf_change_mtu to remember the original
      MTU, and to restore it upon nicvf_update_hw_max_frs() failure.
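
The remember-and-restore logic can be sketched as follows. This is a userspace model, not the nicvf code: `struct netdev_sketch`, `change_mtu_sketch` and the two fake hardware callbacks are hypothetical names.

```c
/* Hypothetical stand-in for the net device. */
struct netdev_sketch {
    int mtu;
};

/* Hypothetical stand-in for the message sent to the hardware. */
typedef int (*hw_update_fn)(struct netdev_sketch *dev, int mtu);

/*
 * Set the new MTU, then tell the hardware. If the hardware update
 * fails, put the original MTU back so callers see an unchanged device
 * whenever an error is returned.
 */
static int change_mtu_sketch(struct netdev_sketch *dev, int new_mtu,
                             hw_update_fn update_hw)
{
    int orig_mtu = dev->mtu;
    int err;

    dev->mtu = new_mtu;
    err = update_hw(dev, new_mtu);
    if (err) {
        dev->mtu = orig_mtu;
        return err;
    }
    return 0;
}

/* Two fake hardware back ends for exercising both paths. */
static int hw_ok(struct netdev_sketch *dev, int mtu)
{
    (void)dev;
    (void)mtu;
    return 0;
}

static int hw_fail(struct netdev_sketch *dev, int mtu)
{
    (void)dev;
    (void)mtu;
    return -1;
}
```

Calling `change_mtu_sketch()` with `hw_fail` returns an error and leaves `dev->mtu` at its previous value, which is the invariant the call chain relies on.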
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. Nov 22, 2016