Commit 5b7c4cab authored by Linus Torvalds
Pull networking updates from Jakub Kicinski:
 "Core:

   - Add dedicated kmem_cache for typical/small skb->head, avoid having
     to access struct page at kfree time, and improve memory use.

   - Introduce sysctl to set default RPS configuration for new netdevs.

   - Define Netlink protocol specification format which can be used to
     describe messages used by each family and auto-generate parsers.
     Add tools for generating kernel data structures and uAPI headers.

   - Expose all net/core sysctls inside netns.

   - Remove 4s sleep in netpoll if carrier is instantly detected on
     boot.

   - Add configurable limit of MDB entries per port, and port-vlan.

   - Continue populating drop reasons throughout the stack.

   - Retire a handful of legacy Qdiscs and classifiers.

  Protocols:

   - Support IPv4 big TCP (TSO frames larger than 64kB).

   - Add IP_LOCAL_PORT_RANGE socket option, to control local port range
     on socket by socket basis.

   - Track and report in procfs number of MPTCP sockets used.

   - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path
     manager.

   - IPv6: don't check net.ipv6.route.max_size and rely on garbage
     collection to free memory (similarly to IPv4).

   - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).

   - ICMP: add per-rate limit counters.

   - Add support for user scanning requests in ieee802154.

   - Remove static WEP support.

   - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
     reporting.

   - WiFi 7 EHT channel puncturing support (client & AP).

  BPF:

   - Add a rbtree data structure following the "next-gen data structure"
     precedent set by recently added linked list, that is, by using
     kfunc + kptr instead of adding a new BPF map type.

   - Expose XDP hints via kfuncs with initial support for RX hash and
     timestamp metadata.

   - Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to
     better support decap on GRE tunnel devices not operating in collect
     metadata.

   - Improve x86 JIT's codegen for PROBE_MEM runtime error checks.

   - Remove the need for trace_printk_lock for bpf_trace_printk and
     bpf_trace_vprintk helpers.

   - Extend libbpf's bpf_tracing.h support for tracing arguments of
     kprobes/uprobes and syscall as a special case.

   - Significantly reduce the search time for module symbols by
     livepatch and BPF.

   - Enable cpumasks to be used as kptrs, which is useful for tracing
     programs tracking which tasks end up running on which CPUs in
     different time intervals.

   - Add support for BPF trampoline on s390x and riscv64.

   - Add capability to export the XDP features supported by the NIC.

   - Add __bpf_kfunc tag for marking kernel functions as kfuncs.

   - Add cgroup.memory=nobpf kernel parameter option to disable BPF
     memory accounting for container environments.

  Netfilter:

   - Remove the CLUSTERIP target. It has been marked as obsolete for
     years, and we still have WARN splats wrt races of the out-of-band
     /proc interface installed by this target.

   - Add 'destroy' commands to nf_tables. They are identical to the
     existing 'delete' commands, but do not return an error if the
     referenced object (set, chain, rule...) did not exist.

  Driver API:

   - Improve cpumask_local_spread() locality to help NICs set the right
     IRQ affinity on AMD platforms.

   - Separate C22 and C45 MDIO bus transactions more clearly.

   - Introduce new DCB table to control DSCP rewrite on egress.

   - Support configuration of Physical Layer Collision Avoidance (PLCA)
     Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
     shared medium Ethernet.

   - Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
     preemption of low priority frames by high priority frames.

   - Add support for controlling MACSec offload using netlink SET.

   - Rework devlink instance refcounts to allow registration and
     de-registration under the instance lock. Split the code into
     multiple files, drop some of the unnecessarily granular locks and
     factor out common parts of netlink operation handling.

   - Add TX frame aggregation parameters (for USB drivers).

   - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
     messages with notifications for debug.

   - Allow offloading of UDP NEW connections via act_ct.

   - Add support for per action HW stats in TC.

   - Support hardware miss to TC action (continue processing in SW from
     a specific point in the action chain).

   - Warn if old Wireless Extension user space interface is used with
     modern cfg80211/mac80211 drivers. Do not support Wireless
     Extensions for Wi-Fi 7 devices at all. Everyone should switch to
     using nl80211 interface instead.

   - Improve the CAN bit timing configuration. Use extack to return
     error messages directly to user space, update the SJW handling,
     including the definition of a new default value that will benefit
     CAN-FD controllers, by increasing their oscillator tolerance.

  New hardware / drivers:

   - Ethernet:
      - nVidia BlueField-3 support (control traffic driver)
      - Ethernet support for imx93 SoCs
      - Motorcomm yt8531 gigabit Ethernet PHY
      - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
      - Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
      - Amlogic gxl MDIO mux

   - WiFi:
      - RealTek RTL8188EU (rtl8xxxu)
      - Qualcomm Wi-Fi 7 devices (ath12k)

   - CAN:
      - Renesas R-Car V4H

  Drivers:

   - Bluetooth:
      - Set Per Platform Antenna Gain (PPAG) for Intel controllers.

   - Ethernet NICs:
      - Intel (1G, igc):
         - support TSN / Qbv / packet scheduling features of i226 model
      - Intel (100G, ice):
         - use GNSS subsystem instead of TTY
         - multi-buffer XDP support
         - extend support for GPIO pins to E823 devices
      - nVidia/Mellanox:
         - update the shared buffer configuration on PFC commands
         - implement PTP adjphase function for HW offset control
         - TC support for Geneve and GRE with VF tunnel offload
         - more efficient crypto key management method
         - multi-port eswitch support
      - Netronome/Corigine:
         - add DCB IEEE support
         - support IPsec offloading for NFP3800
      - Freescale/NXP (enetc):
         - support XDP_REDIRECT for XDP non-linear buffers
         - improve reconfig, avoid link flap and waiting for idle
         - support MAC Merge layer
      - Other NICs:
         - sfc/ef100: add basic devlink support for ef100
         - ionic: rx_push mode operation (writing descriptors via MMIO)
         - bnxt: use the auxiliary bus abstraction for RDMA
         - r8169: disable ASPM and reset bus in case of tx timeout
         - cpsw: support QSGMII mode for J721e CPSW9G
         - cpts: support pulse-per-second output
         - ngbe: add an mdio bus driver
         - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
         - r8152: handle devices with FW with NCM support
         - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
         - virtio-net: support multi buffer XDP
         - virtio/vsock: replace virtio_vsock_pkt with sk_buff
         - tsnep: XDP support

   - Ethernet high-speed switches:
      - nVidia/Mellanox (mlxsw):
         - add support for latency TLV (in FW control messages)
      - Microchip (sparx5):
         - separate explicit and implicit traffic forwarding rules, make
           the implicit rules always active
         - add support for egress DSCP rewrite
         - IS0 VCAP support (Ingress Classification)
         - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS
           etc.)
         - ES2 VCAP support (Egress Access Control)
         - support for Per-Stream Filtering and Policing (802.1Q,
           8.6.5.1)

   - Ethernet embedded switches:
      - Marvell (mv88e6xxx):
         - add MAB (port auth) offload support
         - enable PTP receive for mv88e6390
      - NXP (ocelot):
         - support MAC Merge layer
         - support for the vsc7512 internal copper phys
      - Microchip:
         - lan9303: convert to PHYLINK
         - lan966x: support TC flower filter statistics
         - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
         - lan937x: support Credit Based Shaper configuration
         - ksz9477: support Energy Efficient Ethernet
      - other:
         - qca8k: convert to regmap read/write API, use bulk operations
         - rswitch: Improve TX timestamp accuracy

   - Intel WiFi (iwlwifi):
      - EHT (Wi-Fi 7) rate reporting
      - STEP equalizer support: transfer some STEP (connection to radio
        on platforms with integrated wifi) related parameters from the
        BIOS to the firmware.

   - Qualcomm 802.11ax WiFi (ath11k):
      - IPQ5018 support
      - Fine Timing Measurement (FTM) responder role support
      - channel 177 support

   - MediaTek WiFi (mt76):
      - per-PHY LED support
      - mt7996: EHT (Wi-Fi 7) support
      - Wireless Ethernet Dispatch (WED) reset support
      - switch to using page pool allocator

   - RealTek WiFi (rtw89):
      - support new version of Bluetooth coexistence

   - Mobile:
      - rmnet: support TX aggregation"

* tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits)
  page_pool: add a comment explaining the fragment counter usage
  net: ethtool: fix __ethtool_dev_mm_supported() implementation
  ethtool: pse-pd: Fix double word in comments
  xsk: add linux/vmalloc.h to xsk.c
  sefltests: netdevsim: wait for devlink instance after netns removal
  selftest: fib_tests: Always cleanup before exit
  net/mlx5e: Align IPsec ASO result memory to be as required by hardware
  net/mlx5e: TC, Set CT miss to the specific ct action instance
  net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
  net/mlx5: Refactor tc miss handling to a single function
  net/mlx5: Kconfig: Make tc offload depend on tc skb extension
  net/sched: flower: Support hardware miss to tc action
  net/sched: flower: Move filter handle initialization earlier
  net/sched: cls_api: Support hardware miss to tc action
  net/sched: Rename user cookie and act cookie
  sfc: fix builds without CONFIG_RTC_LIB
  sfc: clean up some inconsistent indentings
  net/mlx4_en: Introduce flexible array to silence overflow warning
  net: lan966x: Fix possible deadlock inside PTP
  net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
  ...
parents 36289a03 d1fabc68
+19 −0

What:		/sys/class/net/<iface>/peak_usb/can_channel_id
Date:		November 2022
KernelVersion:	6.2
Contact:	Stephane Grosjean <s.grosjean@peak-system.com>
Description:
		PEAK PCAN-USB devices support user-configurable CAN channel
		identifiers. Contrary to a USB serial number, these identifiers
		are writable and can be set per CAN interface. This means that
		if a USB device exports multiple CAN interfaces, each of them
		can be assigned a unique channel ID.
		This attribute provides read-only access to the currently
		configured value of the channel identifier. Depending on the
		device type, the identifier has a length of 8 or 32 bit. The
		value read from this attribute is always an 8 digit 32 bit
		hexadecimal value in big endian format. If the device only
		supports an 8 bit identifier, the upper 24 bit of the value are
		set to zero.
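
The value format described above can be illustrated with a small userspace
sketch. It assumes the attribute text is exactly 8 hexadecimal digits without
a ``0x`` prefix, optionally followed by a newline; that detail is an
assumption, not something the description states. The function name
``parse_can_channel_id`` is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse the text read from can_channel_id into a 32-bit value.
 * Assumption: 8 hex digits, most significant digit first, optional '\n'.
 * Returns 0 on success or -EINVAL if the text is malformed.
 */
static int parse_can_channel_id(const char *text, uint32_t *id)
{
        char *end;
        unsigned long val;

        assert(text != NULL && id != NULL);

        /* Require exactly 8 leading hex digits. */
        if (strspn(text, "0123456789abcdefABCDEF") != 8)
                return -EINVAL;

        errno = 0;
        val = strtoul(text, &end, 16);
        if (errno != 0 || (*end != '\0' && *end != '\n'))
                return -EINVAL;

        /* For devices with an 8 bit identifier the upper 24 bits are zero. */
        *id = (uint32_t)val;
        return 0;
}
```

A caller would read the attribute file into a buffer first and then pass the
buffer to this function.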
+1 −0
@@ -557,6 +557,7 @@
			Format: <string>
			nosocket -- Disable socket memory accounting.
			nokmem -- Disable kernel memory accounting.
			nobpf -- Disable BPF memory accounting.

	checkreqprot=	[SELINUX] Set initial checkreqprot flag value.
			Format: { "0" | "1" }
+6 −0
@@ -215,6 +215,12 @@ rmem_max

The maximum receive socket buffer size in bytes.

rps_default_mask
----------------

The default RPS CPU mask used on newly created network devices. An empty
mask means RPS disabled by default.

tstamp_allow_data
-----------------
Allow processes to receive tx timestamps looped together with the original
+18 −7
@@ -208,6 +208,10 @@ data structures and compile with kernel internal headers. Both of these
kernel internals are subject to change and can break with newer kernels
such that the program needs to be adapted accordingly.

New BPF functionality is generally added through the use of kfuncs instead of
new helpers. Kfuncs are not considered part of the stable API, and have their own
lifecycle expectations as described in :ref:`BPF_kfunc_lifecycle_expectations`.

Q: Are tracepoints part of the stable ABI?
------------------------------------------
A: NO. Tracepoints are tied to internal implementation details hence they are
@@ -236,8 +240,8 @@ A: NO. Classic BPF programs are converted into extend BPF instructions.

Q: Can BPF call arbitrary kernel functions?
-------------------------------------------
A: NO. BPF programs can only call a set of helper functions which
is defined for every program type.
A: NO. BPF programs can only call specific functions exposed as BPF helpers or
kfuncs. The set of available functions is defined for every program type.

Q: Can BPF overwrite arbitrary kernel memory?
---------------------------------------------
@@ -263,7 +267,12 @@ Q: New functionality via kernel modules?
Q: Can BPF functionality such as new program or map types, new
helpers, etc be added out of kernel module code?

A: NO.
A: Yes, through kfuncs and kptrs

The core BPF functionality such as program types, maps and helpers cannot be
added to by modules. However, modules can expose functionality to BPF programs
by exporting kfuncs (which may return pointers to module-internal data
structures as kptrs).

Q: Directly calling kernel function is an ABI?
----------------------------------------------
@@ -278,7 +287,8 @@ kernel functions have already been used by other kernel tcp
cc (congestion-control) implementations.  If any of these kernel
functions has changed, both the in-tree and out-of-tree kernel tcp cc
implementations have to be changed.  The same goes for the bpf
programs and they have to be adjusted accordingly.
programs and they have to be adjusted accordingly. See
:ref:`BPF_kfunc_lifecycle_expectations` for details.

Q: Attaching to arbitrary kernel functions is an ABI?
-----------------------------------------------------
@@ -340,6 +350,7 @@ compatibility for these features?

A: NO.

Unlike map value types, there are no stability guarantees for this case. The
whole API to work with allocated objects and any support for special fields
inside them is unstable (since it is exposed through kfuncs).
Unlike map value types, the API to work with allocated objects and any support
for special fields inside them is exposed through kfuncs, and thus has the same
lifecycle expectations as the kfuncs themselves. See
:ref:`BPF_kfunc_lifecycle_expectations` for details.
+393 −0
.. SPDX-License-Identifier: GPL-2.0

.. _cpumasks-header-label:

==================
BPF cpumask kfuncs
==================

1. Introduction
===============

``struct cpumask`` is a bitmap data structure in the kernel whose indices
reflect the CPUs on the system. Commonly, cpumasks are used to track which CPUs
a task is affinitized to, but they can also be used to e.g. track which cores
are associated with a scheduling domain, which cores on a machine are idle,
etc.

BPF provides programs with a set of :ref:`kfuncs-header-label` that can be
used to allocate, mutate, query, and free cpumasks.

2. BPF cpumask objects
======================

There are two different types of cpumasks that can be used by BPF programs.

2.1 ``struct bpf_cpumask *``
----------------------------

``struct bpf_cpumask *`` is a cpumask that is allocated by BPF, on behalf of a
BPF program, and whose lifecycle is entirely controlled by BPF. These cpumasks
are RCU-protected, can be mutated, can be used as kptrs, and can be safely cast
to a ``struct cpumask *``.

2.1.1 ``struct bpf_cpumask *`` lifecycle
----------------------------------------

A ``struct bpf_cpumask *`` is allocated, acquired, and released, using the
following functions:

.. kernel-doc:: kernel/bpf/cpumask.c
  :identifiers: bpf_cpumask_create

.. kernel-doc:: kernel/bpf/cpumask.c
  :identifiers: bpf_cpumask_acquire

.. kernel-doc:: kernel/bpf/cpumask.c
  :identifiers: bpf_cpumask_release

For example:

.. code-block:: c

        struct cpumask_map_value {
                struct bpf_cpumask __kptr_ref * cpumask;
        };

        struct array_map {
                __uint(type, BPF_MAP_TYPE_ARRAY);
                __type(key, int);
                __type(value, struct cpumask_map_value);
                __uint(max_entries, 65536);
        } cpumask_map SEC(".maps");

        static int cpumask_map_insert(struct bpf_cpumask *mask, u32 pid)
        {
                struct cpumask_map_value local, *v;
                long status;
                struct bpf_cpumask *old;
                u32 key = pid;

                local.cpumask = NULL;
                status = bpf_map_update_elem(&cpumask_map, &key, &local, 0);
                if (status) {
                        bpf_cpumask_release(mask);
                        return status;
                }

                v = bpf_map_lookup_elem(&cpumask_map, &key);
                if (!v) {
                        bpf_cpumask_release(mask);
                        return -ENOENT;
                }

                old = bpf_kptr_xchg(&v->cpumask, mask);
                if (old)
                        bpf_cpumask_release(old);

                return 0;
        }

        /**
         * A sample tracepoint showing how a task's cpumask can be queried and
         * recorded as a kptr.
         */
        SEC("tp_btf/task_newtask")
        int BPF_PROG(record_task_cpumask, struct task_struct *task, u64 clone_flags)
        {
                struct bpf_cpumask *cpumask;
                int ret;

                cpumask = bpf_cpumask_create();
                if (!cpumask)
                        return -ENOMEM;

                if (!bpf_cpumask_full(task->cpus_ptr))
                        bpf_printk("task %s has CPU affinity", task->comm);

                bpf_cpumask_copy(cpumask, task->cpus_ptr);
                return cpumask_map_insert(cpumask, task->pid);
        }

----

2.1.2 ``struct bpf_cpumask *`` as kptrs
---------------------------------------

As mentioned and illustrated above, these ``struct bpf_cpumask *`` objects can
also be stored in a map and used as kptrs. If a ``struct bpf_cpumask *`` is in
a map, the reference can be removed from the map with bpf_kptr_xchg(), or
opportunistically acquired with bpf_cpumask_kptr_get():

.. kernel-doc:: kernel/bpf/cpumask.c
  :identifiers: bpf_cpumask_kptr_get

Here is an example of a ``struct bpf_cpumask *`` being retrieved from a map:

.. code-block:: c

	/* struct containing the struct bpf_cpumask kptr which is stored in the map. */
	struct cpumasks_kfunc_map_value {
		struct bpf_cpumask __kptr_ref * bpf_cpumask;
	};

	/* The map containing struct cpumasks_kfunc_map_value entries. */
	struct {
		__uint(type, BPF_MAP_TYPE_ARRAY);
		__type(key, int);
		__type(value, struct cpumasks_kfunc_map_value);
		__uint(max_entries, 1);
	} cpumasks_kfunc_map SEC(".maps");

	/* ... */

	/**
	 * A simple example tracepoint program showing how a
	 * struct bpf_cpumask * kptr that is stored in a map can
	 * be acquired using the bpf_cpumask_kptr_get() kfunc.
	 */
	SEC("tp_btf/cgroup_mkdir")
	int BPF_PROG(cgrp_ancestor_example, struct cgroup *cgrp, const char *path)
	{
		struct bpf_cpumask *kptr;
		struct cpumasks_kfunc_map_value *v;
		u32 key = 0;

		/* Assume a bpf_cpumask * kptr was previously stored in the map. */
		v = bpf_map_lookup_elem(&cpumasks_kfunc_map, &key);
		if (!v)
			return -ENOENT;

		/* Acquire a reference to the bpf_cpumask * kptr that's already stored in the map. */
		kptr = bpf_cpumask_kptr_get(&v->bpf_cpumask);
		if (!kptr)
			/* If no bpf_cpumask was present in the map, it's because
			 * we're racing with another CPU that removed it with
			 * bpf_kptr_xchg() between the bpf_map_lookup_elem()
			 * above, and our call to bpf_cpumask_kptr_get().
			 * bpf_cpumask_kptr_get() internally safely handles this
			 * race, and will return NULL if the cpumask is no longer
			 * present in the map by the time we invoke the kfunc.
			 */
			return -EBUSY;

		/* Free the reference we just took above. Note that the
		 * original struct bpf_cpumask * kptr is still in the map. It will
		 * be freed either at a later time if another context deletes
		 * it from the map, or automatically by the BPF subsystem if
		 * it's still present when the map is destroyed.
		 */
		bpf_cpumask_release(kptr);

		return 0;
	}

----

2.2 ``struct cpumask``
----------------------

``struct cpumask`` is the object that actually contains the cpumask bitmap
being queried, mutated, etc. A ``struct bpf_cpumask`` wraps a ``struct
cpumask``, which is why it's safe to cast it as such (note however that it is
**not** safe to cast a ``struct cpumask *`` to a ``struct bpf_cpumask *``, and
the verifier will reject any program that tries to do so).

As we'll see below, any kfunc that mutates its cpumask argument will take a
``struct bpf_cpumask *`` as that argument. Any argument that simply queries the
cpumask will instead take a ``struct cpumask *``.
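
The one-way cast can be pictured with a userspace model. This is only a
sketch: every ``model_`` name is hypothetical, the 64-bit bitmap stands in for
the real cpumask, and the plain counter stands in for whatever reference
tracking the kernel uses. The point it illustrates is that a wrapper whose
first member is the plain mask can always be viewed as that mask, while a
plain mask carries no wrapper state to recover.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for struct cpumask: one bit per CPU, at most 64 CPUs here. */
struct model_cpumask {
        uint64_t bits;
};

/* Stand-in for struct bpf_cpumask: the plain mask plus ownership state. */
struct model_bpf_cpumask {
        struct model_cpumask cpumask;   /* first member, so the cast is safe */
        unsigned int usage;             /* assumed reference counter */
};

/* Allocate an empty mask holding one reference; NULL on failure. */
static struct model_bpf_cpumask *model_create(void)
{
        struct model_bpf_cpumask *m = calloc(1, sizeof(*m));

        if (m)
                m->usage = 1;
        return m;
}

/* Take an additional reference. */
static struct model_bpf_cpumask *model_acquire(struct model_bpf_cpumask *m)
{
        m->usage++;
        return m;
}

/* Drop one reference; returns 1 if this freed the object, 0 otherwise. */
static int model_release(struct model_bpf_cpumask *m)
{
        assert(m->usage > 0);
        if (--m->usage == 0) {
                free(m);
                return 1;
        }
        return 0;
}

/* Wrapper to read-only view: always valid. There is no safe inverse. */
static const struct model_cpumask *
model_cast(const struct model_bpf_cpumask *m)
{
        return &m->cpumask;
}
```

In this model, a function that only queries a mask would take a
``const struct model_cpumask *``, and a function that mutates one would take a
``struct model_bpf_cpumask *``, mirroring the convention described above.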

3. cpumask kfuncs
=================

Above, we described the kfuncs that can be used to allocate, acquire, release,
etc. a ``struct bpf_cpumask *``. This section of the document will describe the
kfuncs for mutating and querying cpumasks.

3.1 Mutating cpumasks
---------------------

Some cpumask kfuncs are "read-only" in that they don't mutate any of their
arguments, whereas others mutate at least one argument (which means that the
argument must be a ``struct bpf_cpumask *``, as described above).

This section will describe all of the cpumask kfuncs which mutate at least one
argument. :ref:`cpumasks-querying-label` below describes the read-only kfuncs.

3.1.1 Setting and clearing CPUs
-------------------------------

bpf_cpumask_set_cpu() and bpf_cpumask_clear_cpu() can be used to set and clear
a CPU in a ``struct bpf_cpumask`` respectively:

.. kernel-doc:: kernel/bpf/cpumask.c
   :identifiers: bpf_cpumask_set_cpu bpf_cpumask_clear_cpu

These kfuncs are pretty straightforward, and can be used, for example, as
follows:

.. code-block:: c

        /**
         * A sample tracepoint showing how CPUs can be set, cleared, and
         * tested in a cpumask.
         */
        SEC("tp_btf/task_newtask")
        int BPF_PROG(test_set_clear_cpu, struct task_struct *task, u64 clone_flags)
        {
                struct bpf_cpumask *cpumask;

                cpumask = bpf_cpumask_create();
                if (!cpumask)
                        return -ENOMEM;

                bpf_cpumask_set_cpu(0, cpumask);
                if (!bpf_cpumask_test_cpu(0, (const struct cpumask *)cpumask))
                        /* Should never happen. */
                        goto release_exit;

                bpf_cpumask_clear_cpu(0, cpumask);
                if (bpf_cpumask_test_cpu(0, (const struct cpumask *)cpumask))
                        /* Should never happen. */
                        goto release_exit;

                /* struct cpumask * pointers such as task->cpus_ptr can also be queried. */
                if (bpf_cpumask_test_cpu(0, task->cpus_ptr))
                        bpf_printk("task %s can use CPU %d", task->comm, 0);

        release_exit:
                bpf_cpumask_release(cpumask);
                return 0;
        }

----

bpf_cpumask_test_and_set_cpu() and bpf_cpumask_test_and_clear_cpu() are
complementary kfuncs that allow callers to atomically test and set (or clear)
CPUs:

.. kernel-doc:: kernel/bpf/cpumask.c
   :identifiers: bpf_cpumask_test_and_set_cpu bpf_cpumask_test_and_clear_cpu

----

We can also set and clear entire ``struct bpf_cpumask *`` objects in one
operation using bpf_cpumask_setall() and bpf_cpumask_clear():

.. kernel-doc:: kernel/bpf/cpumask.c
   :identifiers: bpf_cpumask_setall bpf_cpumask_clear
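
The return values of the test-and-modify operations are easy to get
backwards, so here is a userspace sketch of the semantics described in this
section. It models a mask as a single ``uint64_t``; all ``model_`` names are
hypothetical and none of this is the kernel implementation. In particular,
the model is single-threaded, whereas the kfuncs above are described as
atomic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS 64u       /* assumption: this model covers 64 CPUs */

static bool model_test_cpu(unsigned int cpu, uint64_t mask)
{
        assert(cpu < MODEL_NR_CPUS);
        return (mask >> cpu) & 1u;
}

static void model_set_cpu(unsigned int cpu, uint64_t *mask)
{
        assert(cpu < MODEL_NR_CPUS);
        *mask |= UINT64_C(1) << cpu;
}

static void model_clear_cpu(unsigned int cpu, uint64_t *mask)
{
        assert(cpu < MODEL_NR_CPUS);
        *mask &= ~(UINT64_C(1) << cpu);
}

/* Set the CPU and report whether it was already set beforehand. */
static bool model_test_and_set_cpu(unsigned int cpu, uint64_t *mask)
{
        bool was_set = model_test_cpu(cpu, *mask);

        model_set_cpu(cpu, mask);
        return was_set;
}

/* Clear the CPU and report whether it was set beforehand. */
static bool model_test_and_clear_cpu(unsigned int cpu, uint64_t *mask)
{
        bool was_set = model_test_cpu(cpu, *mask);

        model_clear_cpu(cpu, mask);
        return was_set;
}

/* Whole-mask operations. */
static void model_setall(uint64_t *mask) { *mask = UINT64_MAX; }
static void model_clear(uint64_t *mask)  { *mask = 0; }
```

The test-and-modify helpers return the previous state of the bit, which lets
a caller detect whether it was the one that changed the mask.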

3.1.2 Operations between cpumasks
---------------------------------

In addition to setting and clearing individual CPUs in a single cpumask,
callers can also perform bitwise operations between multiple cpumasks using
bpf_cpumask_and(), bpf_cpumask_or(), and bpf_cpumask_xor():

.. kernel-doc:: kernel/bpf/cpumask.c
   :identifiers: bpf_cpumask_and bpf_cpumask_or bpf_cpumask_xor

The following is an example of how they may be used. Note that some of the
kfuncs shown in this example will be covered in more detail below.

.. code-block:: c

        /**
         * A sample tracepoint showing how a cpumask can be mutated using
         * bitwise operators (and queried).
         */
        SEC("tp_btf/task_newtask")
        int BPF_PROG(test_and_or_xor, struct task_struct *task, u64 clone_flags)
        {
                struct bpf_cpumask *mask1, *mask2, *dst1, *dst2;

                mask1 = bpf_cpumask_create();
                if (!mask1)
                        return -ENOMEM;

                mask2 = bpf_cpumask_create();
                if (!mask2) {
                        bpf_cpumask_release(mask1);
                        return -ENOMEM;
                }

                /* ...Safely create the other two masks... */

                bpf_cpumask_set_cpu(0, mask1);
                bpf_cpumask_set_cpu(1, mask2);
                bpf_cpumask_and(dst1, (const struct cpumask *)mask1, (const struct cpumask *)mask2);
                if (!bpf_cpumask_empty((const struct cpumask *)dst1))
                        /* Should never happen. */
                        goto release_exit;

                bpf_cpumask_or(dst1, (const struct cpumask *)mask1, (const struct cpumask *)mask2);
                if (!bpf_cpumask_test_cpu(0, (const struct cpumask *)dst1))
                        /* Should never happen. */
                        goto release_exit;

                if (!bpf_cpumask_test_cpu(1, (const struct cpumask *)dst1))
                        /* Should never happen. */
                        goto release_exit;

                bpf_cpumask_xor(dst2, (const struct cpumask *)mask1, (const struct cpumask *)mask2);
                if (!bpf_cpumask_equal((const struct cpumask *)dst1,
                                       (const struct cpumask *)dst2))
                        /* Should never happen. */
                        goto release_exit;

         release_exit:
                bpf_cpumask_release(mask1);
                bpf_cpumask_release(mask2);
                bpf_cpumask_release(dst1);
                bpf_cpumask_release(dst2);
                return 0;
        }

----
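
The checks in the and/or/xor example above rely on the two source masks being
disjoint. The following userspace sketch replays the same steps on plain
64-bit integers to make that reasoning explicit. The ``model_`` names are
hypothetical and the integers are only a stand-in for real cpumasks.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t model_and(uint64_t a, uint64_t b) { return a & b; }
static uint64_t model_or(uint64_t a, uint64_t b)  { return a | b; }
static uint64_t model_xor(uint64_t a, uint64_t b) { return a ^ b; }

/*
 * Replay the example with mask1 = {CPU 0} and mask2 = {CPU 1}.
 * Returns 0 once every expectation has been checked.
 */
static int model_replay_and_or_xor(void)
{
        const uint64_t mask1 = UINT64_C(1) << 0;
        const uint64_t mask2 = UINT64_C(1) << 1;
        uint64_t dst1, dst2;

        dst1 = model_and(mask1, mask2);
        assert(dst1 == 0);      /* no CPU is in both masks */

        dst1 = model_or(mask1, mask2);
        assert(dst1 == 0x3);    /* CPUs 0 and 1 */

        dst2 = model_xor(mask1, mask2);
        assert(dst2 == dst1);   /* for disjoint masks, XOR equals OR */

        return 0;
}
```

If the two source masks shared a CPU, the AND result would be non-empty and
the XOR result would no longer equal the OR result.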

The contents of an entire cpumask may be copied to another using
bpf_cpumask_copy():

.. kernel-doc:: kernel/bpf/cpumask.c
   :identifiers: bpf_cpumask_copy

----

.. _cpumasks-querying-label:

3.2 Querying cpumasks
---------------------

In addition to the above kfuncs, there is also a set of read-only kfuncs that
can be used to query the contents of cpumasks.

.. kernel-doc:: kernel/bpf/cpumask.c
   :identifiers: bpf_cpumask_first bpf_cpumask_first_zero bpf_cpumask_test_cpu

.. kernel-doc:: kernel/bpf/cpumask.c
   :identifiers: bpf_cpumask_equal bpf_cpumask_intersects bpf_cpumask_subset
                 bpf_cpumask_empty bpf_cpumask_full

.. kernel-doc:: kernel/bpf/cpumask.c
   :identifiers: bpf_cpumask_any bpf_cpumask_any_and

----

Some example usages of these querying kfuncs were shown above. We will not
replicate those examples here. Note, however, that all of the aforementioned
kfuncs are tested in `tools/testing/selftests/bpf/progs/cpumask_success.c`_, so
please take a look there if you're looking for more examples of how they can be
used.
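
For readers who want a quick mental model of the query operations before
opening the selftests, here is a userspace sketch on 64-bit integers. All
``model_`` names are hypothetical. The choice to return ``MODEL_NR_CPUS`` when
no matching bit exists, and the argument order of ``model_subset``, are
assumptions of this sketch rather than statements about the kfuncs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS 64u       /* assumption: this model covers 64 CPUs */

static bool model_test_cpu(unsigned int cpu, uint64_t mask)
{
        assert(cpu < MODEL_NR_CPUS);
        return (mask >> cpu) & 1u;
}

/* Lowest set CPU, or MODEL_NR_CPUS if the mask is empty. */
static unsigned int model_first(uint64_t mask)
{
        for (unsigned int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
                if (model_test_cpu(cpu, mask))
                        return cpu;
        return MODEL_NR_CPUS;
}

/* Lowest clear CPU, or MODEL_NR_CPUS if the mask is full. */
static unsigned int model_first_zero(uint64_t mask)
{
        return model_first(~mask);
}

static bool model_empty(uint64_t mask) { return mask == 0; }
static bool model_full(uint64_t mask)  { return mask == UINT64_MAX; }

static bool model_equal(uint64_t a, uint64_t b)      { return a == b; }
static bool model_intersects(uint64_t a, uint64_t b) { return (a & b) != 0; }

/* True if every CPU in a is also in b. */
static bool model_subset(uint64_t a, uint64_t b)     { return (a & ~b) == 0; }
```

None of these helpers modify their arguments, which matches the read-only
nature of the querying kfuncs listed in this section.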

.. _tools/testing/selftests/bpf/progs/cpumask_success.c:
   https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/tools/testing/selftests/bpf/progs/cpumask_success.c


4. Adding BPF cpumask kfuncs
============================

The set of supported BPF cpumask kfuncs is not (yet) a 1-1 match with the
cpumask operations in include/linux/cpumask.h. Any of those cpumask operations
could easily be encapsulated in a new kfunc if and when required. If you'd like
to support a new cpumask operation, please feel free to submit a patch. If you
do add a new cpumask kfunc, please document it here, and add any relevant
selftest testcases to the cpumask selftest suite.