  1. Aug 27, 2023
  2. Aug 26, 2023
    • Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue · b32add2d
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2023-08-24 (igc, e1000e)
      
      This series contains updates to igc and e1000e drivers.
      
      Vinicius adds support for utilizing multiple PTP registers on igc.
      
      Sasha reduces interval time for PTM on igc and adds new device support
      on e1000e.
      
      * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
        e1000e: Add support for the next LOM generation
        igc: Decrease PTM short interval from 10 us to 1 us
        igc: Add support for multiple in-flight TX timestamps
      ====================
      
      Link: https://lore.kernel.org/r/20230824204418.1551093-1-anthony.l.nguyen@intel.com
      
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • doc/netlink: Add delete operation to ovs_vport spec · 52d08fda
      Donald Hunter authored
      
      
      Add del operation to the spec to help with testing.
      
      Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
      Link: https://lore.kernel.org/r/20230824142221.71339-1-donald.hunter@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tools: ynl-gen: fix uAPI generation after tempfile changes · a02430c0
      Jakub Kicinski authored
      We use a tempfile for code generation, to avoid wiping the target
      file out if the code generator crashes. File contents are copied
      from tempfile to actual destination at the end of main().
      
      uAPI generation is relatively simple, so when generating the uAPI
      header we return from main() early and never reach the "copy code
      over" stage. Since the commit under Fixes, uAPI headers are not
      updated by ynl-gen.
      
      Move the copy/commit of the code into CodeWriter, to make it
      easier to call at any point in time. Hook it into the destructor
      to make sure we don't miss calling it.
      
      Fixes: f65f305a ("tools: ynl-gen: use temporary file for rendering")
      Link: https://lore.kernel.org/r/20230824212431.1683612-1-kuba@kernel.org
      
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'stmmac-cleanups' · f5e17b47
      Jakub Kicinski authored
      Russell King says:
      
      ====================
      stmmac cleanups
      
      One of the comments I had on Feiyang Chen's series was concerning the
      initialisation of phylink... and so I've decided to do something about
      it, cleaning it up a bit.
      
      This series:
      
      1) adds a new phylink function to limit the MAC capabilities according
         to a maximum speed. This allows us to greatly simplify stmmac's
         initialisation of phylink's mac capabilities.
      
      2) everywhere that uses priv->plat->phylink_node first converts this
         to a fwnode before doing anything with it. This is silly. Let's
         instead store it as a fwnode to eliminate these conversions in
         multiple places.
      
      3) clean up passing the fwnode to phylink - it might as well happen
         at the phylink_create() callsite, rather than being scattered
         throughout the entire function.
      
      4) same for mdio_bus_data
      
      5) use phylink_limit_mac_speed() to handle the priv->plat->max_speed
         restriction.
      
      6) add a method to get the MAC-specific capabilities from the code
         dealing with the MACs, and arrange to call it at an appropriate
         time.
      
      7) convert the gmac4 users to use the MAC specific method.
      
      8) same for xgmac.
      
      9) group all the simple phylink_config initialisations together.
      
      10) convert half-duplex logic to being positive logic.
      
      While looking into all of this, this raised eyebrows:
      
              if (priv->plat->tx_queues_to_use > 1)
                      priv->phylink_config.mac_capabilities &=
                              ~(MAC_10HD | MAC_100HD | MAC_1000HD);
      
      priv->plat->tx_queues_to_use is initialised by platforms to either 1,
      4 or 8, and can be controlled from userspace via the --set-channels
      ethtool op. The implementation of this op in this driver limits the
      number of channels to priv->dma_cap.number_tx_queues, which is derived
      from the DMA hwcap.
      
      So, the obvious questions are:
      
      1) what guarantees that the static initialisation of tx_queues_to_use
      will always be less than or equal to number_tx_queues from the DMA hw
      cap?
      
      2) if tx_queues_to_use starts off as 1, but number_tx_queues is larger,
      we will leave the half-duplex capabilities in place, but userspace can
      increase tx_queues_to_use above 1. Does that mean half-duplex is then
      not supported?
      
      3) Should we be basing the decision whether half-duplex is supported
      off the DMA capabilities?
      
      4) What about priv->dma_cap.half_duplex? Doesn't that get a say in
      whether half-duplex is supported or not? Why isn't this used? Why is
      it only reported via debugfs? If it's not being used by the driver,
      what's the point of reporting it via debugfs?
      ====================
      
      Link: https://lore.kernel.org/r/ZOddFH22PWmOmbT5@shell.armlinux.org.uk
      
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: stmmac: convert half-duplex support to positive logic · 76649fc9
      Russell King (Oracle) authored
      
      
      Rather than detecting when half-duplex is not supported, and clearing
      the MAC capabilities, reverse the if() condition and use it to set the
      capabilities instead.
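
      The inversion described above can be sketched in plain C. This is a
      minimal userspace illustration, not the driver code: the bit values of
      the MAC_* macros and the helper names caps_negative() and
      caps_positive() are assumptions made for this sketch.

```c
#include <stdint.h>

/* Hypothetical capability bits, chosen for this sketch only. */
#define MAC_10HD   (1u << 0)
#define MAC_10FD   (1u << 1)
#define MAC_100HD  (1u << 2)
#define MAC_100FD  (1u << 3)
#define MAC_1000HD (1u << 4)
#define MAC_1000FD (1u << 5)

#define MAC_ALL_HD (MAC_10HD | MAC_100HD | MAC_1000HD)
#define MAC_ALL_FD (MAC_10FD | MAC_100FD | MAC_1000FD)

/* Negative logic: advertise everything, then clear half duplex. */
uint32_t caps_negative(unsigned int tx_queues_to_use)
{
	uint32_t caps = MAC_ALL_FD | MAC_ALL_HD;

	if (tx_queues_to_use > 1)
		caps &= ~MAC_ALL_HD;
	return caps;
}

/* Positive logic: start from full duplex, add half duplex when allowed. */
uint32_t caps_positive(unsigned int tx_queues_to_use)
{
	uint32_t caps = MAC_ALL_FD;

	if (tx_queues_to_use <= 1)
		caps |= MAC_ALL_HD;
	return caps;
}
```

      Both helpers return the same mask for any queue count; the positive
      form simply never advertises a capability it later has to take back.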
      
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Link: https://lore.kernel.org/r/E1qZAXn-005pUb-SP@rmk-PC.armlinux.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: stmmac: move priv->phylink_config.mac_managed_pm · 64961f1b
      Russell King (Oracle) authored
      
      
      Move priv->phylink_config.mac_managed_pm to be alongside the other
      phylink initialisations.
      
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Link: https://lore.kernel.org/r/E1qZAXi-005pUV-Nq@rmk-PC.armlinux.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: stmmac: move xgmac specific phylink caps to dwxgmac2 core · bedf9b81
      Russell King (Oracle) authored
      
      
      Move the xgmac specific phylink capabilities to the dwxgmac2 support
      core.
      
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Link: https://lore.kernel.org/r/E1qZAXd-005pUP-JL@rmk-PC.armlinux.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: stmmac: move gmac4 specific phylink capabilities to gmac4 · f1dae3d2
      Russell King (Oracle) authored
      
      
      Move the setup of gmac4 specific phylink capabilities into gmac4 code.
      
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Link: https://lore.kernel.org/r/E1qZAXY-005pUJ-Ez@rmk-PC.armlinux.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: stmmac: provide stmmac_mac_phylink_get_caps() · d42ca04e
      Russell King (Oracle) authored
      
      
      Allow MACs to provide their own capabilities via the MAC operations
      struct.
      
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Link: https://lore.kernel.org/r/E1qZAXT-005pUD-Aj@rmk-PC.armlinux.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: stmmac: use phylink_limit_mac_speed() · a4ac612b
      Russell King (Oracle) authored
      
      
      Use phylink_limit_mac_speed() to limit the MAC capabilities rather
      than coding this for each speed.
      
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Link: https://lore.kernel.org/r/E1qZAXO-005pU7-61@rmk-PC.armlinux.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: stmmac: use "mdio_bus_data" local variable · 2b070cdd
      Russell King (Oracle) authored
      
      
      We have a local variable for priv->plat->mdio_bus_data, which we use
      later in the conditional if() block, but we evaluate the above within
      the conditional expression. Use mdio_bus_data instead. Since these
      will be the only two users of this local variable, move its assignment
      just before the if().
      
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Link: https://lore.kernel.org/r/E1qZAXJ-005pU1-1z@rmk-PC.armlinux.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: stmmac: clean up passing fwnode to phylink · 1a37c1c1
      Russell King (Oracle) authored
      
      
      Move the initialisation of the fwnode variable closer to its use
      site, rather than scattered throughout stmmac_phy_setup().
      
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Link: https://lore.kernel.org/r/E1qZAXD-005pTv-TN@rmk-PC.armlinux.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: stmmac: convert plat->phylink_node to fwnode · e80af2ac
      Russell King (Oracle) authored
      
      
      All users of plat->phylink_node first convert it to a fwnode. Rather
      than repeatedly convert to a fwnode, store it as a fwnode. To reflect
      this change, call it plat->port_node instead - it is used for more
      than just phylink.
      
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Link: https://lore.kernel.org/r/E1qZAX8-005pTo-OT@rmk-PC.armlinux.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: phylink: add phylink_limit_mac_speed() · 70934c7c
      Russell King (Oracle) authored
      
      
      Add a function which can be used to limit the phylink MAC capabilities
      to an upper speed limit.
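
      The idea of capping a capability mask at a maximum speed can be
      sketched as follows. This is a userspace illustration under stated
      assumptions, not the phylink implementation: the CAP_* bit values, the
      cap_params table and the limit_mac_speed() helper are all hypothetical
      names introduced for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bits; the kernel's real values differ. */
#define CAP_10    (1u << 0)
#define CAP_100   (1u << 1)
#define CAP_1000  (1u << 2)
#define CAP_2500  (1u << 3)
#define CAP_10000 (1u << 4)

/* Maps each capability bit to the link speed (in Mbps) it represents. */
struct cap_param {
	uint32_t mask;
	int speed;
};

static const struct cap_param cap_params[] = {
	{ CAP_10000, 10000 },
	{ CAP_2500,  2500 },
	{ CAP_1000,  1000 },
	{ CAP_100,   100 },
	{ CAP_10,    10 },
};

/* Clear every capability whose speed exceeds max_speed. */
uint32_t limit_mac_speed(uint32_t caps, int max_speed)
{
	size_t i;

	for (i = 0; i < sizeof(cap_params) / sizeof(cap_params[0]); i++)
		if (cap_params[i].speed > max_speed)
			caps &= ~cap_params[i].mask;

	return caps;
}
```

      A table-driven helper like this replaces a chain of per-speed if()
      statements in each caller with a single call.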
      
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Link: https://lore.kernel.org/r/E1qZAX3-005pTi-K1@rmk-PC.armlinux.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • veth: Avoid NAPI scheduling on failed SKB forwarding · 215eb9f9
      Liang Chen authored
      
      
      When an skb fails to be forwarded to the peer (e.g., skb data buffer
      length exceeds MTU), it will not be added to the peer's receive queue.
      Therefore, we should schedule the peer's NAPI poll function only when
      skb forwarding is successful to avoid unnecessary overhead.
      
      Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
      Link: https://lore.kernel.org/r/20230824123131.7673-1-liangchen.linux@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · bebfbf07
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf-next 2023-08-25
      
      We've added 87 non-merge commits during the last 8 day(s) which contain
      a total of 104 files changed, 3719 insertions(+), 4212 deletions(-).
      
      The main changes are:
      
      1) Add multi uprobe BPF links for attaching multiple uprobes
         and usdt probes, which is significantly faster and saves extra fds,
         from Jiri Olsa.
      
      2) Add support for BPF cpu v4 instructions for arm64 JIT compiler,
         from Xu Kuohai.
      
      3) Add support for BPF cpu v4 instructions for riscv64 JIT compiler,
         from Pu Lehui.
      
      4) Fix LWT BPF xmit hooks wrt their return values where propagating
         the result from skb_do_redirect() would trigger a use-after-free,
         from Yan Zhai.
      
      5) Fix a BPF verifier issue related to bpf_kptr_xchg() with local kptr
         where the map's value kptr type and locally allocated obj type
         mismatch, from Yonghong Song.
      
      6) Fix BPF verifier's check_func_arg_reg_off() function wrt graph
         root/node which bypassed reg->off == 0 enforcement,
         from Kumar Kartikeya Dwivedi.
      
      7) Lift BPF verifier restriction in networking BPF programs to treat
         comparison of packet pointers not as a pointer leak,
         from Yafang Shao.
      
      8) Remove unmaintained XDP BPF samples as they are maintained
         in xdp-tools repository out of tree, from Toke Høiland-Jørgensen.
      
      9) Batch of fixes for the tracing programs from BPF samples in order
         to make them more libbpf-aware, from Daniel T. Lee.
      
      10) Fix a libbpf signedness determination bug in the CO-RE relocation
          handling logic, from Andrii Nakryiko.
      
      11) Extend libbpf to support CO-RE kfunc relocations. Also follow-up
          fixes for bpf_refcount shared ownership implementation,
          both from Dave Marchevsky.
      
      12) Add a new bpf_object__unpin() API function to libbpf,
          from Daniel Xu.
      
      13) Fix a memory leak in libbpf to also free btf_vmlinux
          when the bpf_object gets closed, from Hao Luo.
      
      14) Small error output improvements to test_bpf module, from Helge Deller.
      
      * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (87 commits)
        selftests/bpf: Add tests for rbtree API interaction in sleepable progs
        bpf: Allow bpf_spin_{lock,unlock} in sleepable progs
        bpf: Consider non-owning refs to refcounted nodes RCU protected
        bpf: Reenable bpf_refcount_acquire
        bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes
        bpf: Consider non-owning refs trusted
        bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire
        selftests/bpf: Enable cpu v4 tests for RV64
        riscv, bpf: Support unconditional bswap insn
        riscv, bpf: Support signed div/mod insns
        riscv, bpf: Support 32-bit offset jmp insn
        riscv, bpf: Support sign-extension mov insns
        riscv, bpf: Support sign-extension load insns
        riscv, bpf: Fix missing exception handling and redundant zext for LDX_B/H/W
        samples/bpf: Add note to README about the XDP utilities moved to xdp-tools
        samples/bpf: Cleanup .gitignore
        samples/bpf: Remove the xdp_sample_pkts utility
        samples/bpf: Remove the xdp1 and xdp2 utilities
        samples/bpf: Remove the xdp_rxq_info utility
        samples/bpf: Remove the xdp_redirect* utilities
        ...
      ====================
      
      Link: https://lore.kernel.org/r/20230825194319.12727-1-daniel@iogearbox.net
      
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'wireless-next-2023-08-25' of... · 1fa6ffad
      Jakub Kicinski authored
      Merge tag 'wireless-next-2023-08-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
      
      Kalle Valo says:
      
      ====================
      wireless-next patches for v6.6
      
      The second pull request for v6.6, this time with both stack and driver
      changes. Unusually we have only one major new feature but lots of
      small cleanups all over; I guess this is because people have been on
      vacation for the last month.
      
      Major changes:
      
      rtw89
       - Introduce Time Averaged SAR (TAS) support
      
      * tag 'wireless-next-2023-08-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (114 commits)
        wifi: rtlwifi: rtl8723: Remove unused function rtl8723_cmd_send_packet()
        wifi: rtw88: usb: kill and free rx urbs on probe failure
        wifi: rtw89: Fix clang -Wimplicit-fallthrough in rtw89_query_sar()
        wifi: rtw89: phy: modify register setting of ENV_MNTR, PHYSTS and DIG
        wifi: rtw89: phy: add phy_gen_def::cr_base to support WiFi 7 chips
        wifi: rtw89: mac: define register address of rx_filter to generalize code
        wifi: rtw89: mac: define internal memory address for WiFi 7 chip
        wifi: rtw89: mac: generalize code to indirectly access WiFi internal memory
        wifi: rtw89: mac: add mac_gen_def::band1_offset to map MAC band1 register address
        wifi: wlcore: sdio: Use module_sdio_driver macro to simplify the code
        wifi: rtw89: initialize multi-channel handling
        wifi: rtw89: provide functions to configure NoA for beacon update
        wifi: rtw89: call rtw89_chan_get() by vif chanctx if aware of vif
        wifi: rtw89: sar: let caller decide the center frequency to query
        wifi: rtw89: refine rtw89_correct_cck_chan() by rtw89_hw_to_nl80211_band()
        wifi: rtw89: add function prototype for coex request duration
        Fix nomenclature for USB and PCI wireless devices
        wifi: ath: Use is_multicast_ether_addr() to check multicast Ether address
        wifi: ath12k: Remove unused declarations
        wifi: ath12k: add check max message length while scanning with extraie
        ...
      ====================
      
      Link: https://lore.kernel.org/r/20230825132230.A0833C433C8@smtp.kernel.org
      
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'for-net-next-2023-08-24' of... · 3db34747
      Jakub Kicinski authored
      Merge tag 'for-net-next-2023-08-24' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
      
      Luiz Augusto von Dentz says:
      
      ====================
      bluetooth-next pull request for net-next:
      
       - Introduce HCI_QUIRK_BROKEN_LE_CODED
       - Add support for PA/BIG sync
       - Add support for NXP IW624 chipset
       - Add support for Qualcomm WCN7850
      
      * tag 'for-net-next-2023-08-24' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next:
        Bluetooth: btusb: Do not call kfree_skb() under spin_lock_irqsave()
        Bluetooth: btusb: Fix quirks table naming
        Bluetooth: HCI: Introduce HCI_QUIRK_BROKEN_LE_CODED
        Bluetooth: btintel: Send new command for PPAG
        Bluetooth: ISO: Add support for periodic adv reports processing
        Bluetooth: hci_conn: fail SCO/ISO via hci_conn_failed if ACL gone early
        Bluetooth: hci_core: Fix missing instances using HCI_MAX_AD_LENGTH
        Bluetooth: ISO: Use defer setup to separate PA sync and BIG sync
        Bluetooth: qca: add support for WCN7850
        Bluetooth: qca: use switch case for soc type behavior
        dt-bindings: net: bluetooth: qualcomm: document WCN7850 chipset
        Bluetooth: hci_conn: Fix sending BT_HCI_CMD_LE_CREATE_CONN_CANCEL
        Bluetooth: hci_sync: Fix UAF in hci_disconnect_all_sync
        Bluetooth: btnxpuart: Improve inband Independent Reset handling
        Bluetooth: btnxpuart: Add support for IW624 chipset
        Bluetooth: btnxpuart: Remove check for CTS low after FW download
      ====================
      
      Link: https://lore.kernel.org/r/20230824201458.2577-1-luiz.dentz@gmail.com
      
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'bpf-refcount-followups-3-bpf_mem_free_rcu-refcounted-nodes' · ec0ded2e
      Alexei Starovoitov authored
      Dave Marchevsky says:
      
      ====================
      BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes
      
      This series is the third of three (or more) followups to address issues
      in the bpf_refcount shared ownership implementation discovered by Kumar.
      This series addresses the use-after-free scenario described in [0]. The
      first followup series ([1]) also attempted to address the same
      use-after-free, but only got rid of the splat without addressing the
      underlying issue. After this series the underlying issue is fixed and
      bpf_refcount_acquire can be re-enabled.
      
      The main fix here is migration of bpf_obj_drop to use
      bpf_mem_free_rcu. To understand why this fixes the issue, let us consider
      the example interleaving provided by Kumar in [0]:
      
      CPU 0                                   CPU 1
      n = bpf_obj_new
      lock(lock1)
      bpf_rbtree_add(rbtree1, n)
      m = bpf_refcount_acquire(n)
      unlock(lock1)
      
      kptr_xchg(map, m) // move to map
      // at this point, refcount = 2
      					m = kptr_xchg(map, NULL)
      					lock(lock2)
      lock(lock1)				bpf_rbtree_add(rbtree2, m)
      p = bpf_rbtree_first(rbtree1)			if (!RB_EMPTY_NODE) bpf_obj_drop_impl(m) // A
      bpf_rbtree_remove(rbtree1, p)
      unlock(lock1)
      bpf_obj_drop(p) // B
      					bpf_refcount_acquire(m) // use-after-free
      					...
      
      Before this series, bpf_obj_drop returns memory to the allocator using
      bpf_mem_free. At this point (B in the example) there might be some
      non-owning references to that memory which the verifier believes are valid,
      but where the underlying memory was reused for some other allocation.
      Commit 7793fc3b ("bpf: Make bpf_refcount_acquire fallible for
      non-owning refs") attempted to fix this by doing refcount_inc_not_zero
      on refcount_acquire instead of refcount_inc, under the assumption that
      preventing erroneous incr-on-0 would be sufficient. This isn't true,
      though: refcount_inc_not_zero must *check* if the refcount is zero, and
      the memory it's checking could have been reused, so the check may look
      at and incr random reused bytes.
      
      If we wait to reuse this memory until all non-owning refs that could
      point to it are gone, there is no possibility of this scenario
      happening. Migrating bpf_obj_drop to use bpf_mem_free_rcu for refcounted
      nodes accomplishes this.
      
      For such nodes, the validity of their underlying memory is now tied to
      RCU critical section. This matches MEM_RCU trustedness
      expectations, so the series takes the opportunity to more explicitly
      mark this trustedness state.
      
      The functional effects of trustedness changes here are rather small.
      This is largely due to local kptrs having separate verifier handling -
      with implicit trustedness assumptions - than arbitrary kptrs.
      Regardless, let's take the opportunity to move towards a world where
      trustedness is more explicitly handled.
      
      Changelog:
      
      v1 -> v2: https://lore.kernel.org/bpf/20230801203630.3581291-1-davemarchevsky@fb.com/
      
      Patch 1 ("bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire")
        * Spent some time experimenting with a better approach as per convo w/
          Yonghong on v1's patch. It started getting too complex, so left unchanged
          for now. Yonghong was fine with this approach being shipped.
      
      Patch 2 ("bpf: Consider non-owning refs trusted")
        * Add Yonghong ack
      Patch 3 ("bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes")
        * Add Yonghong ack
      Patch 4 ("bpf: Reenable bpf_refcount_acquire")
        * Add Yonghong ack
      
      Patch 5 ("bpf: Consider non-owning refs to refcounted nodes RCU protected")
        * Undo a nonfunctional whitespace change that shouldn't have been included
          (Yonghong)
        * Better logging message when complaining about rcu_read_{lock,unlock} in
          rbtree cb (Alexei)
        * Don't invalidate_non_owning_refs when processing bpf_rcu_read_unlock
          (Yonghong, Alexei)
      
      Patch 6 ("[RFC] bpf: Allow bpf_spin_{lock,unlock} in sleepable prog's RCU CS")
        * preempt_{disable,enable} in __bpf_spin_{lock,unlock} (Alexei)
          * Due to this we can consider spin_lock CS an RCU-sched read-side CS (per
            RCU/Design/Requirements/Requirements.rst). Modify in_rcu_cs accordingly.
        * no need to check for !in_rcu_cs before allowing bpf_spin_{lock,unlock}
          (Alexei)
        * RFC tag removed and renamed to "bpf: Allow bpf_spin_{lock,unlock} in
          sleepable progs"
      
      Patch 7 ("selftests/bpf: Add tests for rbtree API interaction in sleepable progs")
        * Remove "no explicit bpf_rcu_read_lock" failure test, add similar success
          test (Alexei)
      
      Summary of patch contents, with sub-bullets being leading questions and
      comments I think are worth reviewer attention:
      
        * Patches 1 and 2 are more so documentation - and
          enforcement, in patch 1's case - of existing semantics / expectations
      
        * Patch 3 changes bpf_obj_drop behavior for refcounted nodes such that
          their underlying memory is not reused until RCU grace period elapses
          * Perhaps it makes sense to move to mem_free_rcu for _all_
            non-owning refs in the future, not just refcounted. This might
            allow custom non-owning ref lifetime + invalidation logic to be
            entirely subsumed by MEM_RCU handling. IMO this needs a bit more
            thought and should be tackled outside of a fix series, so it's not
            attempted here.
      
        * Patch 4 re-enables bpf_refcount_acquire as changes in patch 3 fix
          the remaining use-after-free
          * One might expect this patch to be last in the series, or last
            before selftest changes. Patches 5 and 6 don't change
            verification or runtime behavior for existing BPF progs, though.
      
        * Patch 5 brings the verifier's understanding of refcounted node
          trustedness in line with Patch 4's changes
      
        * Patch 6 allows some bpf_spin_{lock, unlock} calls in sleepable
          progs. Marked RFC for a few reasons:
          * bpf_spin_{lock,unlock} haven't been usable in sleepable progs
            since before the introduction of bpf linked list and rbtree. As
            such this feels more like a new feature that may not belong in
            this fixes series.
      
        * Patch 7 adds tests
      
        [0]: https://lore.kernel.org/bpf/atfviesiidev4hu53hzravmtlau3wdodm2vqs7rd7tnwft34e3@xktodqeqevir/
        [1]: https://lore.kernel.org/bpf/20230602022647.1571784-1-davemarchevsky@fb.com/
      ====================
      
      Link: https://lore.kernel.org/r/20230821193311.3290257-1-davemarchevsky@fb.com
      
      
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Add tests for rbtree API interaction in sleepable progs · 312aa5bd
      Dave Marchevsky authored
      
      
      Confirm that the following sleepable prog states fail verification:
        * bpf_rcu_read_unlock before bpf_spin_unlock
           * RCU CS will last at least as long as spin_lock CS
      
      Also confirm that correct usage passes verification, specifically:
        * Explicit use of bpf_rcu_read_{lock, unlock} in sleepable test prog
        * Implied RCU CS due to spin_lock CS
      
      None of the selftest progs actually attach to bpf_testmod's
      bpf_testmod_test_read.
      
      Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
      Link: https://lore.kernel.org/r/20230821193311.3290257-8-davemarchevsky@fb.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Allow bpf_spin_{lock,unlock} in sleepable progs · 5861d1e8
      Dave Marchevsky authored
      Commit 9e7a4d98 ("bpf: Allow LSM programs to use bpf spin locks")
      disabled bpf_spin_lock usage in sleepable progs, stating:
      
       Sleepable LSM programs can be preempted which means that allowing spin
       locks will need more work (disabling preemption and the verifier
       ensuring that no sleepable helpers are called when a spin lock is
       held).
      
      This patch disables preemption before grabbing bpf_spin_lock. The second
      requirement above "no sleepable helpers are called when a spin lock is
      held" is implicitly enforced by current verifier logic due to helper
      calls in spin_lock CS being disabled except for a few exceptions, none
      of which sleep.
      
      Due to above preemption changes, bpf_spin_lock CS can also be considered
      a RCU CS, so verifier's in_rcu_cs check is modified to account for this.
      
      Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
      Link: https://lore.kernel.org/r/20230821193311.3290257-7-davemarchevsky@fb.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Consider non-owning refs to refcounted nodes RCU protected · 0816b8c6
      Dave Marchevsky authored
      
      
      An earlier patch in the series ensures that the underlying memory of
      nodes with bpf_refcount - which can have multiple owners - is not reused
      until RCU grace period has elapsed. This prevents
      use-after-free with non-owning references that may point to
      recently-freed memory. While RCU read lock is held, it's safe to
      dereference such a non-owning ref, as by definition RCU GP couldn't have
      elapsed and therefore underlying memory couldn't have been reused.
      
      From the perspective of verifier "trustedness" non-owning refs to
      refcounted nodes are now trusted only in RCU CS and therefore should no
      longer pass is_trusted_reg, but rather is_rcu_reg. Let's mark them
      MEM_RCU in order to reflect this new state.
      
      Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
      Link: https://lore.kernel.org/r/20230821193311.3290257-6-davemarchevsky@fb.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Reenable bpf_refcount_acquire · ba2464c8
      Dave Marchevsky authored
      Now that all reported issues are fixed, bpf_refcount_acquire can be
      turned back on. Also reenable all bpf_refcount-related tests which were
      disabled.
      
      This is a revert of:
       * commit f3514a5d ("selftests/bpf: Disable newly-added 'owner' field test until refcount re-enabled")
       * commit 7deca5ea ("bpf: Disable bpf_refcount_acquire kfunc calls until race conditions are fixed")
      
      Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
      Acked-by: Yonghong Song <yonghong.song@linux.dev>
      Link: https://lore.kernel.org/r/20230821193311.3290257-5-davemarchevsky@fb.com
      
      
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      ba2464c8
    • Dave Marchevsky's avatar
      bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes · 7e26cd12
      Dave Marchevsky authored
      This is the final fix for the use-after-free scenario described in
      commit 7793fc3b ("bpf: Make bpf_refcount_acquire fallible for
      non-owning refs"). That commit, by virtue of changing
      bpf_refcount_acquire's refcount_inc to a refcount_inc_not_zero, fixed
      the "refcount incr on 0" splat. The not_zero check in
      refcount_inc_not_zero, though, still occurs on memory that could have
      been free'd and reused, so the commit didn't properly fix the root
      cause.
      
      This patch actually fixes the issue by free'ing using the recently-added
      bpf_mem_free_rcu, which ensures that the memory is not reused until
      an RCU grace period has elapsed. Once that has happened,
      there are no non-owning references alive that point to the
      recently-free'd memory, so it can be safely reused.
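      
      The "increment only if nonzero" pattern discussed above can be
      sketched in userspace C11. This is an illustration only, not the
      kernel's refcount_t implementation: struct node,
      node_ref_inc_not_zero and node_ref_put are hypothetical names. The
      sketch also shows why the check alone was insufficient: the load of
      the counter is only meaningful while the node's memory has not been
      freed and reused, which is what deferring reuse past an RCU grace
      period provides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a refcounted node (not a kernel type). */
struct node {
    atomic_int refcount;
};

/*
 * Take a reference only if the count is still nonzero.
 * The caller must guarantee that *n is still valid memory; in the kernel
 * series this guarantee comes from RCU-deferred reuse, which this sketch
 * does not model.
 */
static bool node_ref_inc_not_zero(struct node *n)
{
    int old = atomic_load(&n->refcount);

    while (old != 0) {
        /* On failure, old is refreshed with the current value and we retry. */
        if (atomic_compare_exchange_weak(&n->refcount, &old, old + 1))
            return true;
    }
    return false; /* count already hit zero: the object is being torn down */
}

/* Drop a reference; returns true when the caller released the last one. */
static bool node_ref_put(struct node *n)
{
    int old = atomic_fetch_sub(&n->refcount, 1);

    assert(old > 0); /* a put without a matching reference is a bug */
    return old == 1;
}
```

      A caller holding only a non-owning pointer would call
      node_ref_inc_not_zero and treat a false return as "node already
      released", rather than incrementing unconditionally.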
      
      Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
      Acked-by: Yonghong Song <yonghong.song@linux.dev>
      Link: https://lore.kernel.org/r/20230821193311.3290257-4-davemarchevsky@fb.com
      
      
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      7e26cd12
    • Dave Marchevsky's avatar
      bpf: Consider non-owning refs trusted · 2a6d50b5
      Dave Marchevsky authored
      Recent discussions around default kptr "trustedness" led to changes such
      as commit 6fcd486b ("bpf: Refactor RCU enforcement in the
      verifier."). One of the conclusions of those discussions, as expressed
      in code and comments in that patch, is that we'd like to move away from
      'raw' PTR_TO_BTF_ID without some type flag or other register state
      indicating trustedness. Although PTR_TRUSTED and PTR_UNTRUSTED flags mark
      this state explicitly, the verifier currently considers trustedness
      implied by other register state. For example, owning refs to graph
      collection nodes must have a nonzero ref_obj_id, so they pass the
      is_trusted_reg check despite having no explicit PTR_{UN}TRUSTED flag.
      This patch makes trustedness of non-owning refs to graph collection
      nodes explicit as well.
      
      By definition, non-owning refs are currently trusted. Although the ref
      has no control over pointee lifetime, due to non-owning ref clobbering
      rules (see invalidate_non_owning_refs) dereferencing a non-owning ref is
      safe in the critical section controlled by bpf_spin_lock associated with
      its owning collection.
      
      Note that the previous statement does not hold true for nodes with shared
      ownership due to the use-after-free issue that this series is
      addressing. True shared ownership was disabled by commit 7deca5ea
      ("bpf: Disable bpf_refcount_acquire kfunc calls until race conditions are fixed"),
      though, so the statement holds for now. Further patches in the series will change
      the trustedness state of non-owning refs before re-enabling
      bpf_refcount_acquire.
      
      Let's add NON_OWN_REF type flag to BPF_REG_TRUSTED_MODIFIERS such that a
      non-owning ref reg state would pass is_trusted_reg check. Somewhat
      surprisingly, this doesn't result in any change to user-visible
      functionality elsewhere in the verifier: graph collection nodes are all
      marked MEM_ALLOC, which tends to be handled in separate codepaths from
      "raw" PTR_TO_BTF_ID. Regardless, let's be explicit here and document the
      current state of things before changing it elsewhere in the series.
      
      Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
      Acked-by: Yonghong Song <yonghong.song@linux.dev>
      Link: https://lore.kernel.org/r/20230821193311.3290257-3-davemarchevsky@fb.com
      
      
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      2a6d50b5
    • Dave Marchevsky's avatar
      bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire · f0d991a0
      Dave Marchevsky authored
      It's straightforward to prove that kptr_struct_meta must be non-NULL for
      any valid call to these kfuncs:
      
        * btf_parse_struct_metas in btf.c creates a btf_struct_meta for any
          struct in user BTF with a special field (e.g. bpf_refcount,
          {rb,list}_node). These are stored in that BTF's struct_meta_tab.
      
        * __process_kf_arg_ptr_to_graph_node in verifier.c ensures that nodes
          have {rb,list}_node field and that it's at the correct offset.
          Similarly, check_kfunc_args ensures bpf_refcount field existence for
          node param to bpf_refcount_acquire.
      
        * So a btf_struct_meta must have been created for the struct type of
          the node param to these kfuncs.
      
        * That BTF and its struct_meta_tab are guaranteed to still be around.
          Any arbitrary {rb,list} node the BPF program interacts with either:
          came from bpf_obj_new or a collection removal kfunc in the same
          program, in which case the BTF is associated with the program and
          still around; or came from bpf_kptr_xchg, in which case the BTF was
          associated with the map and is still around.
      
      Instead of silently continuing with NULL struct_meta, which caused
      confusing bugs such as those addressed by commit 2140a6e3 ("bpf: Set
      kptr_struct_meta for node param to list and rbtree insert funcs"), let's
      error out. Then, at runtime, we can confidently say that the
      implementations of these kfuncs were given a non-NULL kptr_struct_meta,
      meaning that special-field-specific functionality like
      bpf_obj_free_fields and the bpf_obj_drop change introduced later in this
      series is guaranteed to execute.
      
      This patch doesn't change functionality, just makes it easier to reason
      about existing functionality.
      
      Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
      Acked-by: Yonghong Song <yonghong.song@linux.dev>
      Link: https://lore.kernel.org/r/20230821193311.3290257-2-davemarchevsky@fb.com
      
      
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      f0d991a0
  3. Aug 25, 2023