Skip to content
  1. May 19, 2021
  2. May 18, 2021
  3. May 17, 2021
    • Kumar Kartikeya Dwivedi's avatar
    • Kumar Kartikeya Dwivedi's avatar
      libbpf: Add low level TC-BPF management API · 715c5ce4
      Kumar Kartikeya Dwivedi authored
      
      
      This adds functions that wrap the netlink API used for adding, manipulating,
      and removing traffic control filters.
      
      The API summary:
      
      A bpf_tc_hook represents a location where a TC-BPF filter can be attached.
      This means that creating a hook leads to creation of the backing qdisc,
      while destruction either removes all filters attached to a hook, or destroys
      qdisc if requested explicitly (as discussed below).
      
      The TC-BPF API functions operate on this bpf_tc_hook to attach, replace,
      query, and detach tc filters. All functions return 0 on success, and a
      negative error code on failure.
      
      bpf_tc_hook_create - Create a hook
      Parameters:
      	@hook - Cannot be NULL, ifindex > 0, attach_point must be set to
      		proper enum constant. Note that parent must be unset when
      		attach_point is one of BPF_TC_INGRESS or BPF_TC_EGRESS. Note
      		that as an exception BPF_TC_INGRESS|BPF_TC_EGRESS is also a
      		valid value for attach_point.
      
      		Returns -EOPNOTSUPP when hook has attach_point as BPF_TC_CUSTOM.
      
      bpf_tc_hook_destroy - Destroy a hook
      Parameters:
      	@hook - Cannot be NULL. The behaviour depends on value of
      		attach_point. If BPF_TC_INGRESS, all filters attached to
      		the ingress hook will be detached. If BPF_TC_EGRESS, all
      		filters attached to the egress hook will be detached. If
      		BPF_TC_INGRESS|BPF_TC_EGRESS, the clsact qdisc will be
      		deleted, also detaching all filters. As before, parent must
      		be unset for these attach_points, and set for BPF_TC_CUSTOM.
      
      		It is advised that if the qdisc is operated on by many programs,
      		then the program at least check that there are no other existing
      		filters before deleting the clsact qdisc. An example is shown
      		below:
      
      		DECLARE_LIBBPF_OPTS(bpf_tc_hook, .ifindex = if_nametoindex("lo"),
      				    .attach_point = BPF_TC_INGRESS);
      		/* set opts as NULL, as we're not really interested in
      		 * getting any info for a particular filter, but just
      	 	 * detecting its presence.
      		 */
      		r = bpf_tc_query(&hook, NULL);
      		if (r == -ENOENT) {
      			/* no filters */
      			hook.attach_point = BPF_TC_INGRESS|BPF_TC_EGREESS;
      			return bpf_tc_hook_destroy(&hook);
      		} else {
      			/* failed or r == 0, the latter means filters do exist */
      			return r;
      		}
      
      		Note that there is a small race between checking for no
      		filters and deleting the qdisc. This is currently unavoidable.
      
      		Returns -EOPNOTSUPP when hook has attach_point as BPF_TC_CUSTOM.
      
      bpf_tc_attach - Attach a filter to a hook
      Parameters:
      	@hook - Cannot be NULL. Represents the hook the filter will be
      		attached to. Requirements for ifindex and attach_point are
      		same as described in bpf_tc_hook_create, but BPF_TC_CUSTOM
      		is also supported.  In that case, parent must be set to the
      		handle where the filter will be attached (using BPF_TC_PARENT).
      		E.g. to set parent to 1:16 like in tc command line, the
      		equivalent would be BPF_TC_PARENT(1, 16).
      
      	@opts - Cannot be NULL. The following opts are optional:
      		* handle   - The handle of the filter
      		* priority - The priority of the filter
      			     Must be >= 0 and <= UINT16_MAX
      		Note that when left unset, they will be auto-allocated by
      		the kernel. The following opts must be set:
      		* prog_fd - The fd of the loaded SCHED_CLS prog
      		The following opts must be unset:
      		* prog_id - The ID of the BPF prog
      		The following opts are optional:
      		* flags - Currently only BPF_TC_F_REPLACE is allowed. It
      			  allows replacing an existing filter instead of
      			  failing with -EEXIST.
      		The following opts will be filled by bpf_tc_attach on a
      		successful attach operation if they are unset:
      		* handle   - The handle of the attached filter
      		* priority - The priority of the attached filter
      		* prog_id  - The ID of the attached SCHED_CLS prog
      		This way, the user can know what the auto allocated values
      		for optional opts like handle and priority are for the newly
      		attached filter, if they were unset.
      
      		Note that some other attributes are set to fixed default
      		values listed below (this holds for all bpf_tc_* APIs):
      		protocol as ETH_P_ALL, direct action mode, chain index of 0,
      		and class ID of 0 (this can be set by writing to the
      		skb->tc_classid field from the BPF program).
      
      bpf_tc_detach
      Parameters:
      	@hook - Cannot be NULL. Represents the hook the filter will be
      		detached from. Requirements are same as described above
      		in bpf_tc_attach.
      
      	@opts - Cannot be NULL. The following opts must be set:
      		* handle, priority
      		The following opts must be unset:
      		* prog_fd, prog_id, flags
      
      bpf_tc_query
      Parameters:
      	@hook - Cannot be NULL. Represents the hook where the filter lookup will
      		be performed. Requirements are same as described above in
      		bpf_tc_attach().
      
      	@opts - Cannot be NULL. The following opts must be set:
      		* handle, priority
      		The following opts must be unset:
      		* prog_fd, prog_id, flags
      		The following fields will be filled by bpf_tc_query upon a
      		successful lookup:
      		* prog_id
      
      Some usage examples (using BPF skeleton infrastructure):
      
      BPF program (test_tc_bpf.c):
      
      	#include <linux/bpf.h>
      	#include <bpf/bpf_helpers.h>
      
      	SEC("classifier")
      	int cls(struct __sk_buff *skb)
      	{
      		return 0;
      	}
      
      Userspace loader:
      
      	struct test_tc_bpf *skel = NULL;
      	int fd, r;
      
      	skel = test_tc_bpf__open_and_load();
      	if (!skel)
      		return -ENOMEM;
      
      	fd = bpf_program__fd(skel->progs.cls);
      
      	DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex =
      			    if_nametoindex("lo"), .attach_point =
      			    BPF_TC_INGRESS);
      	/* Create clsact qdisc */
      	r = bpf_tc_hook_create(&hook);
      	if (r < 0)
      		goto end;
      
      	DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .prog_fd = fd);
      	r = bpf_tc_attach(&hook, &opts);
      	if (r < 0)
      		goto end;
      	/* Print the auto allocated handle and priority */
      	printf("Handle=%u", opts.handle);
      	printf("Priority=%u", opts.priority);
      
      	opts.prog_fd = opts.prog_id = 0;
      	bpf_tc_detach(&hook, &opts);
      end:
      	test_tc_bpf__destroy(skel);
      
      This is equivalent to doing the following using tc command line:
        # tc qdisc add dev lo clsact
        # tc filter add dev lo ingress bpf obj foo.o sec classifier da
        # tc filter del dev lo ingress handle <h> prio <p> bpf
      ... where the handle and priority can be found using:
        # tc filter show dev lo ingress
      
      Another example replacing a filter (extending prior example):
      
      	/* We can also choose both (or one), let's try replacing an
      	 * existing filter.
      	 */
      	DECLARE_LIBBPF_OPTS(bpf_tc_opts, replace_opts, .handle =
      			    opts.handle, .priority = opts.priority,
      			    .prog_fd = fd);
      	r = bpf_tc_attach(&hook, &replace_opts);
      	if (r == -EEXIST) {
      		/* Expected, now use BPF_TC_F_REPLACE to replace it */
      		replace_opts.flags = BPF_TC_F_REPLACE;
      		return bpf_tc_attach(&hook, &replace_opts);
      	} else if (r < 0) {
      		return r;
      	}
      	/* There must be no existing filter with these
      	 * attributes, so cleanup and return an error.
      	 */
      	replace_opts.prog_fd = replace_opts.prog_id = 0;
      	bpf_tc_detach(&hook, &replace_opts);
      	return -1;
      
      To obtain info of a particular filter:
      
      	/* Find info for filter with handle 1 and priority 50 */
      	DECLARE_LIBBPF_OPTS(bpf_tc_opts, info_opts, .handle = 1,
      			    .priority = 50);
      	r = bpf_tc_query(&hook, &info_opts);
      	if (r == -ENOENT)
      		printf("Filter not found");
      	else if (r < 0)
      		return r;
      	printf("Prog ID: %u", info_opts.prog_id);
      	return 0;
      
      Signed-off-by: default avatarKumar Kartikeya Dwivedi <memxor@gmail.com>
      Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> # libbpf API design
      [ Daniel: also did major patch cleanup ]
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: default avatarToke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/bpf/20210512103451.989420-3-memxor@gmail.com
      715c5ce4
    • Kumar Kartikeya Dwivedi's avatar
      libbpf: Add various netlink helpers · 8bbb77b7
      Kumar Kartikeya Dwivedi authored
      
      
      This change introduces a few helpers to wrap open coded attribute
      preparation in netlink.c. It also adds a libbpf_netlink_send_recv() that
      is useful to wrap send + recv handling in a generic way. Subsequent patch
      will also use this function for sending and receiving a netlink response.
      The libbpf_nl_get_link() helper has been removed instead, moving socket
      creation into the newly named libbpf_netlink_send_recv().
      
      Every nested attribute's closure must happen using the helper
      nlattr_end_nested(), which sets its length properly. NLA_F_NESTED is
      enforced using nlattr_begin_nested() helper. Other simple attributes
      can be added directly.
      
      The maxsz parameter corresponds to the size of the request structure
      which is being filled in, so for instance with req being:
      
        struct {
      	struct nlmsghdr nh;
      	struct tcmsg t;
      	char buf[4096];
        } req;
      
      Then, maxsz should be sizeof(req).
      
      This change also converts the open coded attribute preparation with these
      helpers. Note that the only failure the internal call to nlattr_add()
      could result in the nested helper would be -EMSGSIZE, hence that is what
      we return to our caller.
      
      The libbpf_netlink_send_recv() call takes care of opening the socket,
      sending the netlink message, receiving the response, potentially invoking
      callbacks, and return errors if any, and then finally close the socket.
      This allows users to avoid identical socket setup code in different places.
      The only user of libbpf_nl_get_link() has been converted to make use of it.
      __bpf_set_link_xdp_fd_replace() has also been refactored to use it.
      
      Signed-off-by: default avatarKumar Kartikeya Dwivedi <memxor@gmail.com>
      [ Daniel: major patch cleanup ]
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: default avatarToke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/bpf/20210512103451.989420-2-memxor@gmail.com
      8bbb77b7
  4. May 15, 2021
  5. May 14, 2021
  6. May 13, 2021
  7. May 12, 2021
  8. May 11, 2021
  9. May 09, 2021
    • Linus Torvalds's avatar
      Merge tag 'riscv-for-linus-5.13-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · b7415964
      Linus Torvalds authored
      Pull RISC-V fixes from Palmer Dabbelt:
      
       - A fix to avoid over-allocating the kernel's mapping on !MMU systems,
         which could lead to up to 2MiB of lost memory
      
       - The SiFive address extension errata only manifest on rv64, they are
         now disabled on rv32 where they are unnecessary
      
       - A pair of late-landing cleanups
      
      * tag 'riscv-for-linus-5.13-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: remove unused handle_exception symbol
        riscv: Consistify protect_kernel_linear_mapping_text_rodata() use
        riscv: enable SiFive errata CIP-453 and CIP-1200 Kconfig only if CONFIG_64BIT=y
        riscv: Only extend kernel reservation if mapped read-only
      b7415964
    • Linus Torvalds's avatar
      drm/i915/display: fix compiler warning about array overrun · fec4d427
      Linus Torvalds authored
      
      
      intel_dp_check_mst_status() uses a 14-byte array to read the DPRX Event
      Status Indicator data, but then passes that buffer at offset 10 off as
      an argument to drm_dp_channel_eq_ok().
      
      End result: there are only 4 bytes remaining of the buffer, yet
      drm_dp_channel_eq_ok() wants a 6-byte buffer.  gcc-11 correctly warns
      about this case:
      
        drivers/gpu/drm/i915/display/intel_dp.c: In function ‘intel_dp_check_mst_status’:
        drivers/gpu/drm/i915/display/intel_dp.c:3491:22: warning: ‘drm_dp_channel_eq_ok’ reading 6 bytes from a region of size 4 [-Wstringop-overread]
         3491 |                     !drm_dp_channel_eq_ok(&esi[10], intel_dp->lane_count)) {
              |                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        drivers/gpu/drm/i915/display/intel_dp.c:3491:22: note: referencing argument 1 of type ‘const u8 *’ {aka ‘const unsigned char *’}
        In file included from drivers/gpu/drm/i915/display/intel_dp.c:38:
        include/drm/drm_dp_helper.h:1466:6: note: in a call to function ‘drm_dp_channel_eq_ok’
         1466 | bool drm_dp_channel_eq_ok(const u8 link_status[DP_LINK_STATUS_SIZE],
              |      ^~~~~~~~~~~~~~~~~~~~
             6:14 elapsed
      
      This commit just extends the original array by 2 zero-initialized bytes,
      avoiding the warning.
      
      There may be some underlying bug in here that caused this confusion, but
      this is at least no worse than the existing situation that could use
      random data off the stack.
      
      Cc: Jani Nikula <jani.nikula@intel.com>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Dave Airlie <airlied@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      fec4d427