Commit dbe69e43 authored by Linus Torvalds
Pull networking updates from Jakub Kicinski:
 "Core:

   - BPF:
      - add syscall program type and libbpf support for generating
        instructions and bindings for in-kernel BPF loaders (BPF loaders
        for BPF), this is a stepping stone for signed BPF programs
      - infrastructure to migrate TCP child sockets from one listener to
        another in the same reuseport group/map to improve flexibility
        of service hand-off/restart
      - add broadcast support to XDP redirect

   - allow bypass of the lockless qdisc to improve performance (for
     pktgen: +23% with one thread, +44% with 2 threads)

   - add a simpler version of "DO_ONCE()" which does not require jump
     labels, intended for slow-path usage

   - virtio/vsock: introduce SOCK_SEQPACKET support

   - add getsockopt to retrieve netns cookie

   - ip: treat lowest address of an IPv4 subnet as ordinary unicast
     address allowing reclaiming of precious IPv4 addresses

   - ipv6: use prandom_u32() for ID generation

   - ip: add support for more flexible field selection for hashing
     across multi-path routes (w/ offload to mlxsw)

   - icmp: add support for extended RFC 8335 PROBE (ping)

   - seg6: add support for SRv6 End.DT46 behavior

   - mptcp:
      - DSS checksum support (RFC 8684) to detect middlebox meddling
      - support Connection-time 'C' flag
      - time stamping support

   - sctp: Packetization Layer Path MTU Discovery (RFC 8899)

   - xfrm: speed up state addition with seq set

   - WiFi:
      - hidden AP discovery on 6 GHz and other HE 6 GHz improvements
      - aggregation handling improvements for some drivers
      - minstrel improvements for no-ack frames
      - deferred rate control for TXQs to improve reaction times
      - switch from round robin to virtual time-based airtime scheduler

   - add trace points:
      - tcp checksum errors
      - openvswitch - action execution, upcalls
      - socket errors via sk_error_report

  Device APIs:

   - devlink: add rate API for hierarchical control of max egress rate
     of virtual devices (VFs, SFs etc.)

   - don't require RCU read lock to be held around BPF hooks in NAPI
     context

   - page_pool: generic buffer recycling

  New hardware/drivers:

   - mobile:
      - iosm: PCIe Driver for Intel M.2 Modem
      - support for Qualcomm MSM8998 (ipa)

   - WiFi: Qualcomm QCN9074 and WCN6855 PCI devices

   - sparx5: Microchip SparX-5 family of Enterprise Ethernet switches

   - Mellanox BlueField Gigabit Ethernet (control NIC of the DPU)

   - NXP SJA1110 Automotive Ethernet 10-port switch

   - Qualcomm QCA8327 switch support (qca8k)

   - Mikrotik 10/25G NIC (atl1c)

  Driver changes:

   - ACPI support for some MDIO, MAC and PHY devices from Marvell and
     NXP (our first foray into MAC/PHY description via ACPI)

   - HW timestamping (PTP) support: bnxt_en, ice, sja1105, hns3, tja11xx

   - Mellanox/Nvidia NIC (mlx5)
      - NIC VF offload of L2 bridging
      - support IRQ distribution to Sub-functions

   - Marvell (prestera):
      - add flower and match all
      - devlink trap
      - link aggregation

   - Netronome (nfp): connection tracking offload

   - Intel 1GE (igc): add AF_XDP support

   - Marvell DPU (octeontx2): ingress ratelimit offload

   - Google vNIC (gve): new ring/descriptor format support

   - Qualcomm mobile (rmnet & ipa): inline checksum offload support

   - MediaTek WiFi (mt76)
      - mt7915 MSI support
      - mt7915 Tx status reporting
      - mt7915 thermal sensors support
      - mt7921 decapsulation offload
      - mt7921 enable runtime pm and deep sleep

   - Realtek WiFi (rtw88)
      - beacon filter support
      - Tx antenna path diversity support
      - firmware crash information via devcoredump

   - Qualcomm WiFi (wcn36xx)
      - Wake-on-WLAN support with magic packets and GTK rekeying

   - Micrel PHY (ksz886x/ksz8081): add cable test support"

* tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2168 commits)
  tcp: change ICSK_CA_PRIV_SIZE definition
  tcp_yeah: check struct yeah size at compile time
  gve: DQO: Fix off by one in gve_rx_dqo()
  stmmac: intel: set PCI_D3hot in suspend
  stmmac: intel: Enable PHY WOL option in EHL
  net: stmmac: option to enable PHY WOL with PMT enabled
  net: say "local" instead of "static" addresses in ndo_dflt_fdb_{add,del}
  net: use netdev_info in ndo_dflt_fdb_{add,del}
  ptp: Set lookup cookie when creating a PTP PPS source.
  net: sock: add trace for socket errors
  net: sock: introduce sk_error_report
  net: dsa: replay the local bridge FDB entries pointing to the bridge dev too
  net: dsa: ensure during dsa_fdb_offload_notify that dev_hold and dev_put are on the same dev
  net: dsa: include fdb entries pointing to bridge in the host fdb list
  net: dsa: include bridge addresses which are local in the host fdb list
  net: dsa: sync static FDB entries on foreign interfaces to hardware
  net: dsa: install the host MDB and FDB entries in the master's RX filter
  net: dsa: reference count the FDB addresses at the cross-chip notifier level
  net: dsa: introduce a separate cross-chip notifier type for host FDBs
  net: dsa: reference count the MDB entries at the cross-chip notifier level
  ...
parents a6eaf385 b6df0078
+78 −0
What:		/sys/devices/platform/soc@X/XXXXXXX.ipa/
Date:		June 2021
KernelVersion:	v5.14
Contact:	Alex Elder <elder@kernel.org>
Description:
		The /sys/devices/platform/soc@X/XXXXXXX.ipa/ directory
		contains read-only attributes exposing information about
		an IPA device.  The X values could vary, but are typically
		"soc@0/1e40000.ipa".

What:		.../XXXXXXX.ipa/version
Date:		June 2021
KernelVersion:	v5.14
Contact:	Alex Elder <elder@kernel.org>
Description:
		The .../XXXXXXX.ipa/version file contains the IPA hardware
		version, as a period-separated set of two or three integers
		(e.g., "3.5.1" or "4.2").

What:		.../XXXXXXX.ipa/feature/
Date:		June 2021
KernelVersion:	v5.14
Contact:	Alex Elder <elder@kernel.org>
Description:
		The .../XXXXXXX.ipa/feature/ directory contains a set of
		attributes describing features implemented by the IPA
		hardware.

What:		.../XXXXXXX.ipa/feature/rx_offload
Date:		June 2021
KernelVersion:	v5.14
Contact:	Alex Elder <elder@kernel.org>
Description:
		The .../XXXXXXX.ipa/feature/rx_offload file contains a
		string indicating the type of receive checksum offload
		that is supported by the hardware.  The possible values
		are "MAPv4" or "MAPv5".

What:		.../XXXXXXX.ipa/feature/tx_offload
Date:		June 2021
KernelVersion:	v5.14
Contact:	Alex Elder <elder@kernel.org>
Description:
		The .../XXXXXXX.ipa/feature/tx_offload file contains a
		string indicating the type of transmit checksum offload
		that is supported by the hardware.  The possible values
		are "MAPv4" or "MAPv5".

What:		.../XXXXXXX.ipa/modem/
Date:		June 2021
KernelVersion:	v5.14
Contact:	Alex Elder <elder@kernel.org>
Description:
		The .../XXXXXXX.ipa/modem/ directory contains a set of
		attributes describing properties of the modem execution
		environment reachable by the IPA hardware.

What:		.../XXXXXXX.ipa/modem/rx_endpoint_id
Date:		June 2021
KernelVersion:	v5.14
Contact:	Alex Elder <elder@kernel.org>
Description:
		The .../XXXXXXX.ipa/modem/rx_endpoint_id file contains
		the AP endpoint ID that receives packets originating from
		the modem execution environment.  The "rx" is from the
		perspective of the AP; this endpoint is considered an "IPA
		producer".  An endpoint ID is a small unsigned integer.

What:		.../XXXXXXX.ipa/modem/tx_endpoint_id
Date:		June 2021
KernelVersion:	v5.14
Contact:	Alex Elder <elder@kernel.org>
Description:
		The .../XXXXXXX.ipa/modem/tx_endpoint_id file contains
		the AP endpoint ID used to transmit packets destined for
		the modem execution environment.  The "tx" is from the
		perspective of the AP; this endpoint is considered an "IPA
		consumer".  An endpoint ID is a small unsigned integer.
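
The attributes described above can be read from user space with ordinary
file I/O. The following is a minimal sketch, not part of the driver or of
this ABI: the helper names parse_ipa_version() and read_ipa_attributes()
are hypothetical, and it assumes that endpoint IDs are printed in decimal
and that each file holds a single value followed by a newline.

```python
from pathlib import Path

# Values documented above for feature/rx_offload and feature/tx_offload.
OFFLOAD_TYPES = ("MAPv4", "MAPv5")


def parse_ipa_version(text):
    """Parse a version string such as "3.5.1" or "4.2" into a tuple of ints."""
    parts = text.strip().split(".")
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected IPA version string: {text!r}")
    return tuple(int(part) for part in parts)


def read_ipa_attributes(ipa_dir):
    """Read the documented read-only attributes below one ...ipa/ directory."""
    base = Path(ipa_dir)

    def read(relative):
        return (base / relative).read_text().strip()

    attrs = {
        "version": parse_ipa_version(read("version")),
        "rx_offload": read("feature/rx_offload"),
        "tx_offload": read("feature/tx_offload"),
        # Assumption: endpoint IDs are small unsigned integers in decimal.
        "rx_endpoint_id": int(read("modem/rx_endpoint_id")),
        "tx_endpoint_id": int(read("modem/tx_endpoint_id")),
    }
    for key in ("rx_offload", "tx_offload"):
        if attrs[key] not in OFFLOAD_TYPES:
            raise ValueError(f"unexpected {key} value: {attrs[key]!r}")
    return attrs
```

On a system with an IPA device, the directory passed in would typically be
something like /sys/devices/platform/soc@0/1e40000.ipa, although the exact
path varies by platform.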
+34 −21
@@ -211,27 +211,40 @@ over a rather long period of time, but improvements are always welcome!
	of the system, especially to real-time workloads running on
	the rest of the system.

7.	As of v4.20, a given kernel implements only one RCU flavor,
	which is RCU-sched for PREEMPTION=n and RCU-preempt for PREEMPTION=y.
	If the updater uses call_rcu() or synchronize_rcu(),
	then the corresponding readers may use rcu_read_lock() and
	rcu_read_unlock(), rcu_read_lock_bh() and rcu_read_unlock_bh(),
	or any pair of primitives that disables and re-enables preemption,
	for example, rcu_read_lock_sched() and rcu_read_unlock_sched().
	If the updater uses synchronize_srcu() or call_srcu(),
	then the corresponding readers must use srcu_read_lock() and
	srcu_read_unlock(), and with the same srcu_struct.  The rules for
	the expedited primitives are the same as for their non-expedited
	counterparts.  Mixing things up will result in confusion and
	broken kernels, and has even resulted in an exploitable security
	issue.

	One exception to this rule: rcu_read_lock() and rcu_read_unlock()
	may be substituted for rcu_read_lock_bh() and rcu_read_unlock_bh()
	in cases where local bottom halves are already known to be
	disabled, for example, in irq or softirq context.  Commenting
	such cases is a must, of course!  And the jury is still out on
	whether the increased speed is worth it.
7.	As of v4.20, a given kernel implements only one RCU flavor, which
	is RCU-sched for PREEMPTION=n and RCU-preempt for PREEMPTION=y.
	If the updater uses call_rcu() or synchronize_rcu(), then
	the corresponding readers may use:  (1) rcu_read_lock() and
	rcu_read_unlock(), (2) any pair of primitives that disables
	and re-enables softirq, for example, rcu_read_lock_bh() and
	rcu_read_unlock_bh(), or (3) any pair of primitives that disables
	and re-enables preemption, for example, rcu_read_lock_sched() and
	rcu_read_unlock_sched().  If the updater uses synchronize_srcu()
	or call_srcu(), then the corresponding readers must use
	srcu_read_lock() and srcu_read_unlock(), and with the same
	srcu_struct.  The rules for the expedited RCU grace-period-wait
	primitives are the same as for their non-expedited counterparts.

	If the updater uses call_rcu_tasks() or synchronize_rcu_tasks(),
	then the readers must refrain from executing voluntary
	context switches, that is, from blocking.  If the updater uses
	call_rcu_tasks_trace() or synchronize_rcu_tasks_trace(), then
	the corresponding readers must use rcu_read_lock_trace() and
	rcu_read_unlock_trace().  If an updater uses call_rcu_tasks_rude()
	or synchronize_rcu_tasks_rude(), then the corresponding readers
	must use anything that disables interrupts.

	Mixing things up will result in confusion and broken kernels, and
	has even resulted in an exploitable security issue.  Therefore,
	when using non-obvious pairs of primitives, commenting is
	of course a must.  One example of non-obvious pairing is
	the XDP feature in networking, which calls BPF programs from
	network-driver NAPI (softirq) context.  BPF relies heavily on RCU
	protection for its data structures, but because the BPF program
	invocation happens entirely within a single local_bh_disable()
	section in a NAPI poll cycle, this usage is safe.  The reason
	that this usage is safe is that readers can use anything that
	disables BH when updaters use call_rcu() or synchronize_rcu().

8.	Although synchronize_rcu() is slower than is call_rcu(), it
	usually results in simpler code.  So, unless update performance is
+14 −0
@@ -12,6 +12,19 @@ BPF instruction-set.
The Cilium project also maintains a `BPF and XDP Reference Guide`_
that goes into great technical depth about the BPF Architecture.

libbpf
======

Libbpf is a userspace library for loading and interacting with bpf programs.

.. toctree::
   :maxdepth: 1

   libbpf/libbpf
   libbpf/libbpf_api
   libbpf/libbpf_build
   libbpf/libbpf_naming_convention

BPF Type Format (BTF)
=====================

@@ -84,6 +97,7 @@ Other
   :maxdepth: 1

   ringbuf
   llvm_reloc

.. Links:
.. _networking-filter: ../networking/filter.rst
+14 −0
.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)

libbpf
======

This is documentation for libbpf, a userspace library for loading and
interacting with bpf programs.

All general BPF questions, including kernel functionality, libbpf APIs and
their application, should be sent to the bpf@vger.kernel.org mailing list.
You can `subscribe <http://vger.kernel.org/vger-lists.html#bpf>`_ to the
mailing list and search its `archive <https://lore.kernel.org/bpf/>`_.
Please search the archive before asking new questions. It very well might
be that this was already addressed or answered before.
+27 −0
.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)

API
===

This documentation is autogenerated from the libbpf header files in tools/lib/bpf.

.. kernel-doc:: tools/lib/bpf/libbpf.h
   :internal:

.. kernel-doc:: tools/lib/bpf/bpf.h
   :internal:

.. kernel-doc:: tools/lib/bpf/btf.h
   :internal:

.. kernel-doc:: tools/lib/bpf/xsk.h
   :internal:

.. kernel-doc:: tools/lib/bpf/bpf_tracing.h
   :internal:

.. kernel-doc:: tools/lib/bpf/bpf_core_read.h
   :internal:

.. kernel-doc:: tools/lib/bpf/bpf_endian.h
   :internal: