  1. Nov 17, 2023
  2. Nov 16, 2023
    • Merge tag 'net-6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 7475e51b
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from BPF and netfilter.
      
        Current release - regressions:
      
         - core: fix undefined behavior in netdev name allocation
      
         - bpf: do not allocate percpu memory at init stage
      
         - netfilter: nf_tables: split async and sync catchall in two
           functions
      
         - mptcp: fix possible NULL pointer dereference on close
      
        Current release - new code bugs:
      
         - eth: ice: dpll: fix initial lock status of dpll
      
        Previous releases - regressions:
      
         - bpf: fix precision backtracking instruction iteration
      
         - af_unix: fix use-after-free in unix_stream_read_actor()
      
         - tipc: fix kernel-infoleak due to uninitialized TLV value
      
         - eth: bonding: stop the device in bond_setup_by_slave()
      
         - eth: mlx5:
            - fix double free of encap_header
            - avoid referencing skb after free-ing in drop path
      
         - eth: hns3: fix VF reset
      
         - eth: mvneta: fix calls to page_pool_get_stats
      
        Previous releases - always broken:
      
         - core: set SOCK_RCU_FREE before inserting socket into hashtable
      
         - bpf: fix control-flow graph checking in privileged mode
      
         - eth: ppp: limit MRU to 64K
      
         - eth: stmmac: avoid rx queue overrun
      
         - eth: icssg-prueth: fix error cleanup on failing initialization
      
         - eth: hns3: fix out-of-bounds access may occur when coalesce info is
           read via debugfs
      
         - eth: cortina: handle large frames
      
        Misc:
      
         - selftests: gso: support CONFIG_MAX_SKB_FRAGS up to 45"
      
      * tag 'net-6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (78 commits)
        macvlan: Don't propagate promisc change to lower dev in passthru
        net: sched: do not offload flows with a helper in act_ct
        net/mlx5e: Check return value of snprintf writing to fw_version buffer for representors
        net/mlx5e: Check return value of snprintf writing to fw_version buffer
        net/mlx5e: Reduce the size of icosq_str
        net/mlx5: Increase size of irq name buffer
        net/mlx5e: Update doorbell for port timestamping CQ before the software counter
        net/mlx5e: Track xmit submission to PTP WQ after populating metadata map
        net/mlx5e: Avoid referencing skb after free-ing in drop path of mlx5e_sq_xmit_wqe
        net/mlx5e: Don't modify the peer sent-to-vport rules for IPSec offload
        net/mlx5e: Fix pedit endianness
        net/mlx5e: fix double free of encap_header in update funcs
        net/mlx5e: fix double free of encap_header
        net/mlx5: Decouple PHC .adjtime and .adjphase implementations
        net/mlx5: DR, Allow old devices to use multi destination FTE
        net/mlx5: Free used cpus mask when an IRQ is released
        Revert "net/mlx5: DR, Supporting inline WQE when possible"
        bpf: Do not allocate percpu memory at init stage
        net: Fix undefined behavior in netdev name allocation
        dt-bindings: net: ethernet-controller: Fix formatting error
        ...
      7475e51b
    • Merge tag 'for-linus-6.7a-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 6eb1acd9
      Linus Torvalds authored
      Pull xen updates from Juergen Gross:
      
       - A fix in the Xen events driver avoiding the use of RCU after
         the call to rcu_report_dead() when taking a cpu down
      
       - A fix for running as Xen dom0 to line up ACPI's idea of power
         management capabilities with that of Xen
      
       - A cleanup eliminating several kernel-doc warnings in Xen related
         code
      
       - A cleanup series of the Xen events driver
      
      * tag 'for-linus-6.7a-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen/events: remove some info_for_irq() calls in pirq handling
        xen/events: modify internal [un]bind interfaces
        xen/events: drop xen_allocate_irqs_dynamic()
        xen/events: remove some simple helpers from events_base.c
        xen/events: reduce externally visible helper functions
        xen/events: remove unused functions
        xen/events: fix delayed eoi list handling
        xen/shbuf: eliminate 17 kernel-doc warnings
        acpi/processor: sanitize _OSC/_PDC capabilities for Xen dom0
        xen/events: avoid using info_for_irq() in xen_send_IPI_one()
      6eb1acd9
    • Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 372bed5f
      Linus Torvalds authored
      Pull virtio fixes from Michael Tsirkin:
       "Bugfixes all over the place"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        vhost-vdpa: fix use after free in vhost_vdpa_probe()
        virtio_pci: Switch away from deprecated irq_set_affinity_hint
        riscv, qemu_fw_cfg: Add support for RISC-V architecture
        vdpa_sim_blk: allocate the buffer zeroed
        virtio_pci: move structure to a header
      372bed5f
    • Merge tag 'nf-23-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · cff088d9
      Paolo Abeni authored
      
      
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Remove unused variable causing compilation warning in nft_set_rbtree,
         from Yang Li. This unused variable is a leftover from the previous
         merge window.

      2) Fix possible return of an uninitialized value in nf_conntrack_bridge,
         from Linkui Xiao. The bug has been there since nf_conntrack_bridge
         was introduced.

      3) Fix incorrect pointer math in nft_byteorder, from Dan Carpenter.
         The problem has been there since 2016.

      4) Fix bogus error in destroy set element command. The problem has been
         there since this new destroy command was added.

      5) Fix race condition in ipset between swap and destroy commands and
         add/del/test control plane. This problem has been there since ipset
         was merged.

      6) Split async and sync catchall GC into two functions to fix unsafe
         iteration over RCU. This is a fix for a fix that was included in
         the previous pull request.
      
      * tag 'nf-23-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: nf_tables: split async and sync catchall in two functions
        netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test
        netfilter: nf_tables: bogus ENOENT when destroying element which does not exist
        netfilter: nf_tables: fix pointer math issue in nft_byteorder_eval()
        netfilter: nf_conntrack_bridge: initialize err to 0
        netfilter: nft_set_rbtree: Remove unused variable nft_net
      ====================
      
      Link: https://lore.kernel.org/r/20231115184514.8965-1-pablo@netfilter.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      cff088d9
    • macvlan: Don't propagate promisc change to lower dev in passthru · 7e1caeac
      Vlad Buslov authored
      A macvlan device in passthru mode sets its lower device's promiscuous
      mode according to its MACVLAN_FLAG_NOPROMISC flag instead of synchronizing
      it to its own promiscuity setting. However, macvlan_change_rx_flags()
      doesn't check the mode before propagating such changes to the lower
      device, which can overflow the net_device->promiscuity counter, as
      illustrated by reproduction example [0] and the resulting dmesg log [1].
      Fix the issue by first verifying the mode in macvlan_change_rx_flags()
      before propagating a promiscuous mode change to the lower device.
      
      [0]:
      ip link add macvlan1 link enp8s0f0 type macvlan mode passthru
      ip link set macvlan1 promisc on
      ip l set dev macvlan1 up
      ip link set macvlan1 promisc off
      ip l set dev macvlan1 down
      ip l set dev macvlan1 up
      
      [1]:
      [ 5156.281724] macvlan1: entered promiscuous mode
      [ 5156.285467] mlx5_core 0000:08:00.0 enp8s0f0: entered promiscuous mode
      [ 5156.287639] macvlan1: left promiscuous mode
      [ 5156.288339] mlx5_core 0000:08:00.0 enp8s0f0: left promiscuous mode
      [ 5156.290907] mlx5_core 0000:08:00.0 enp8s0f0: entered promiscuous mode
      [ 5156.317197] mlx5_core 0000:08:00.0 enp8s0f0: promiscuity touches roof, set promiscuity failed. promiscuity feature of device might be broken.
      
      Fixes: efdbd2b3 ("macvlan: Propagate promiscuity setting to lower devices.")
      Reviewed-by: Gal Pressman <gal@nvidia.com>
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Link: https://lore.kernel.org/r/20231114175915.1649154-1-vladbu@nvidia.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      7e1caeac
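The mode check described in the macvlan commit above can be illustrated with a small userspace model. This is only a sketch: `toy_macvlan`, `lower_dev`, and `toy_change_rx_flags()` are hypothetical stand-ins and not the kernel's structures or functions. It shows the idea of skipping the propagation when the port is in passthru mode, so the lower device's counter is only adjusted from one place.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model; all names here are hypothetical stand-ins, not kernel API. */
struct lower_dev {
    int promiscuity;            /* reference-count style counter */
};

struct toy_macvlan {
    struct lower_dev *lower;
    bool passthru;              /* true when the port runs in passthru mode */
    bool promisc;               /* the macvlan's own promiscuous flag */
};

/* Called after the macvlan's own promiscuous flag has changed. */
static void toy_change_rx_flags(struct toy_macvlan *vlan)
{
    assert(vlan && vlan->lower);

    /*
     * In passthru mode the lower device's promiscuity is driven by a
     * separate flag, so propagating here would adjust the counter twice.
     */
    if (vlan->passthru)
        return;

    vlan->lower->promiscuity += vlan->promisc ? 1 : -1;
}
```

With `passthru` set, repeated toggling of the macvlan's own flag leaves the lower device's counter untouched, which is the property the fix relies on.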
    • net: sched: do not offload flows with a helper in act_ct · 7cd5af0e
      Xin Long authored
      There is no hardware supporting ct helper offload. However, prior to this
      patch, a flower filter with a helper in the ct action can be successfully
      set into the HW, for example (eth1 is a bnxt NIC):
      
        # tc qdisc add dev eth1 ingress_block 22 ingress
        # tc filter add block 22 proto ip flower skip_sw ip_proto tcp \
          dst_port 21 ct_state -trk action ct helper ipv4-tcp-ftp
        # tc filter show dev eth1 ingress
      
          filter block 22 protocol ip pref 49152 flower chain 0 handle 0x1
            eth_type ipv4
            ip_proto tcp
            dst_port 21
            ct_state -trk
            skip_sw
            in_hw in_hw_count 1   <----
              action order 1: ct zone 0 helper ipv4-tcp-ftp pipe
               index 2 ref 1 bind 1
              used_hw_stats delayed
      
      This might cause the flower filter not to work as expected in the HW.
      
      This patch avoids the problem by returning -EOPNOTSUPP in
      tcf_ct_offload_act_setup(), so that flows with a helper in act_ct
      are not offloaded.
      
      Fixes: a21b06e7 ("net: sched: add helper support in act_ct")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Link: https://lore.kernel.org/r/f8685ec7702c4a448a1371a8b34b43217b583b9d.1699898008.git.lucien.xin@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      7cd5af0e
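The approach in the act_ct commit above, refusing the offload early instead of letting an unsupported configuration reach the hardware, can be sketched in plain C. The struct and function below are hypothetical illustrations and do not mirror the real `tcf_ct_offload_act_setup()` signature; only the `-EOPNOTSUPP` return value is taken from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified view of a ct action. */
struct toy_ct_action {
    const char *helper;   /* NULL when no conntrack helper is attached */
};

/* Returns 0 if the action may be offloaded, or a negative errno. */
static int toy_ct_offload_setup(const struct toy_ct_action *ct)
{
    assert(ct);

    /* No hardware supports helper offload, so reject it up front. */
    if (ct->helper)
        return -EOPNOTSUPP;

    return 0;
}
```

Returning an error from the setup step means a `skip_sw` filter fails to install rather than being reported as `in_hw` while not working.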
    • Merge branch 'mlx5-fixes-2023-11-13-manual' · bdc454fc
      Jakub Kicinski authored
      
      
      Saeed Mahameed says:
      
      ====================
      This series provides bug fixes to mlx5 driver.
      ====================
      
      Link: https://lore.kernel.org/r/20231114215846.5902-1-saeed@kernel.org/
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      bdc454fc
    • net/mlx5e: Check return value of snprintf writing to fw_version buffer for representors · 1b2bd0c0
      Rahul Rameshbabu authored
      Treat the operation as an error case when the return value is equivalent to
      the size of the name buffer. In that case the null terminator was not
      written to the name buffer, so the string is malformed and should not be
      used. Provide a string with only the firmware version when forming the
      string with the board id fails. This logic for representors is identical
      to the normal flow with ethtool.

      Without this check, the build triggers -Wformat-truncation with W=1.
      
          drivers/net/ethernet/mellanox/mlx5/core/en_rep.c: In function 'mlx5e_rep_get_drvinfo':
          drivers/net/ethernet/mellanox/mlx5/core/en_rep.c:78:31: warning: '%.16s' directive output may be truncated writing up to 16 bytes into a region of size between 13 and 22 [-Wformat-truncation=]
            78 |                  "%d.%d.%04d (%.16s)",
               |                               ^~~~~
          drivers/net/ethernet/mellanox/mlx5/core/en_rep.c:77:9: note: 'snprintf' output between 12 and 37 bytes into a destination of size 32
            77 |         snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version),
               |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            78 |                  "%d.%d.%04d (%.16s)",
               |                  ~~~~~~~~~~~~~~~~~~~~~
            79 |                  fw_rev_maj(mdev), fw_rev_min(mdev),
               |                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            80 |                  fw_rev_sub(mdev), mdev->board_id);
               |                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      Fixes: cf83c8fd ("net/mlx5e: Add missing ethtool driver info for representors")
      Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6d4ab2e9
      Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
      Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Link: https://lore.kernel.org/r/20231114215846.5902-16-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      1b2bd0c0
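The pattern described in the commit above, checking the snprintf return value and falling back to a shorter string, can be sketched in userspace C as follows. This is an illustration under assumptions, not the driver code: `format_fw_version()` is a hypothetical helper, and it treats any return value greater than or equal to the buffer size as truncation, which is the standard C meaning of the return value. The format strings and the 32-byte destination size are taken from the compiler output quoted above.

```c
#include <assert.h>
#include <stdio.h>

#define FW_VERSION_LEN 32   /* destination size reported in the warning */

/*
 * Hypothetical helper. Returns 1 if the board id was included, 0 if the
 * function fell back to the version-only string.
 */
static int format_fw_version(char *buf, size_t size, int maj, int min,
                             int sub, const char *board_id)
{
    int n;

    assert(buf && size > 0 && board_id);

    n = snprintf(buf, size, "%d.%d.%04d (%.16s)", maj, min, sub, board_id);
    if (n >= 0 && (size_t)n < size)
        return 1;

    /*
     * The full string did not fit. Emit only the firmware version.
     * Note: with extreme values this shorter form could be cut off too.
     */
    snprintf(buf, size, "%d.%d.%04d", maj, min, sub);
    return 0;
}
```

Checking the return value also tells the compiler that truncation is handled, which is what silences -Wformat-truncation.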
    • net/mlx5e: Check return value of snprintf writing to fw_version buffer · 41e63c2b
      Rahul Rameshbabu authored
      Treat the operation as an error case when the return value is equivalent to
      the size of the name buffer. In that case the null terminator was not
      written to the name buffer, so the string is malformed and should not be
      used. Provide a string with only the firmware version when forming the
      string with the board id fails.

      Without this check, the build triggers -Wformat-truncation with W=1.
      
          drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c: In function 'mlx5e_ethtool_get_drvinfo':
          drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:49:31: warning: '%.16s' directive output may be truncated writing up to 16 bytes into a region of size between 13 and 22 [-Wformat-truncation=]
            49 |                  "%d.%d.%04d (%.16s)",
               |                               ^~~~~
          drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:48:9: note: 'snprintf' output between 12 and 37 bytes into a destination of size 32
            48 |         snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version),
               |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            49 |                  "%d.%d.%04d (%.16s)",
               |                  ~~~~~~~~~~~~~~~~~~~~~
            50 |                  fw_rev_maj(mdev), fw_rev_min(mdev), fw_rev_sub(mdev),
               |                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            51 |                  mdev->board_id);
               |                  ~~~~~~~~~~~~~~~
      
      Fixes: 84e11edb ("net/mlx5e: Show board id in ethtool driver information")
      Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6d4ab2e9
      Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
      Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      41e63c2b
    • net/mlx5e: Reduce the size of icosq_str · dce94142
      Saeed Mahameed authored
      icosq_str is unnecessarily long, and it causes a -Wformat-truncation build
      warning with W=1. Looking closely, it doesn't need to be 255B, hence this
      patch reduces the size to 32B, which should be more than enough to host
      the string: "ICOSQ: 0x%x, ".
      
      While here, add a missing space in the formatted string.
      
      This fixes the following build warning:
      
      $ KCFLAGS='-Wall -Werror'
      $ make O=/tmp/kbuild/linux W=1 -s -j12 drivers/net/ethernet/mellanox/mlx5/core/
      
      drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c: In function 'mlx5e_reporter_rx_timeout':
      drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c:718:56:
      error: ', CQ: 0x' directive output may be truncated writing 8 bytes into a region of size between 0 and 255 [-Werror=format-truncation=]
        718 |                  "RX timeout on channel: %d, %sRQ: 0x%x, CQ: 0x%x",
            |                                                        ^~~~~~~~
      drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c:717:9: note: 'snprintf' output between 43 and 322 bytes into a destination of size 288
        717 |         snprintf(err_str, sizeof(err_str),
            |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        718 |                  "RX timeout on channel: %d, %sRQ: 0x%x, CQ: 0x%x",
            |                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        719 |                  rq->ix, icosq_str, rq->rqn, rq->cq.mcq.cqn);
            |                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      Fixes: 521f31af ("net/mlx5e: Allow RQ outside of channel context")
      Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6d4ab2e9
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Link: https://lore.kernel.org/r/20231114215846.5902-14-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      dce94142
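The claim in the commit above that 32 bytes is more than enough for "ICOSQ: 0x%x, " can be checked with a short calculation. The sketch below assumes the formatted value is a 32-bit unsigned number (an assumption, the commit does not state the type) and uses the standard behaviour of snprintf with a NULL buffer and size 0, which only reports the length that would be written.

```c
#include <assert.h>
#include <stdio.h>

#define ICOSQ_STR_LEN 32   /* new buffer size named in the commit message */

/* Length of the longest possible string, excluding the terminating NUL. */
static int icosq_str_worst_case(void)
{
    /* Assumption: the queue number is at most 32 bits wide. */
    int n = snprintf(NULL, 0, "ICOSQ: 0x%x, ", 0xffffffffu);

    assert(n > 0);
    return n;
}
```

Under that assumption the worst case is 9 characters of prefix, 8 hex digits, and 2 characters of suffix, so 19 characters plus the terminator fit comfortably in 32 bytes.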
    • net/mlx5: Increase size of irq name buffer · 3338bebf
      Rahul Rameshbabu authored
      Without the increased buffer size, the snprintf operation writing to the
      buffer triggers -Wformat-truncation with W=1.
      
          drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c: In function 'mlx5_irq_alloc':
          drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c:296:7: error: '@pci:' directive output may be truncated writing 5 bytes into a region of size between 1 and 32 [-Werror=format-truncation=]
            296 |    "%s@pci:%s", name, pci_name(dev->pdev));
                |       ^~~~~
          drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c:295:2: note: 'snprintf' output 6 or more bytes (assuming 37) into a destination of size 32
            295 |  snprintf(irq->name, MLX5_MAX_IRQ_NAME,
                |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            296 |    "%s@pci:%s", name, pci_name(dev->pdev));
                |    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      Fixes: ada9f5d0 ("IB/mlx5: Fix eq names to display nicely in /proc/interrupts")
      Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6d4ab2e9
      Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
      Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Link: https://lore.kernel.org/r/20231114215846.5902-13-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      3338bebf
    • net/mlx5e: Update doorbell for port timestamping CQ before the software counter · 92214be5
      Rahul Rameshbabu authored
      Previously, mlx5e_ptp_poll_ts_cq would update the device doorbell with the
      incremented consumer index after the relevant software counters in the
      kernel were updated. In the mlx5e_sq_xmit_wqe context, this would lead to
      either overrunning the device CQ or exceeding the expected software buffer
      size in the device CQ if the device CQ size was greater than the software
      buffer size. Update the relevant software counter only after updating the
      device CQ consumer index in the port timestamping napi_poll context.
      
      Log:
          mlx5_core 0000:08:00.0: cq_err_event_notifier:517:(pid 0): CQ error on CQN 0x487, syndrome 0x1
          mlx5_core 0000:08:00.0 eth2: mlx5e_cq_error_event: cqn=0x000487 event=0x04
      
      Fixes: 1880bc4e ("net/mlx5e: Add TX port timestamp support")
      Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Link: https://lore.kernel.org/r/20231114215846.5902-12-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      92214be5
    • net/mlx5e: Track xmit submission to PTP WQ after populating metadata map · 7e3f3ba9
      Rahul Rameshbabu authored
      Ensure the skb is available in metadata mapping to skbs before tracking the
      metadata index for detecting undelivered CQEs. If the metadata index is put
      in the tracking list before putting the skb in the map, the metadata index
      might be used for detecting undelivered CQEs before the relevant skb is
      available in the map, which can lead to a null-ptr-deref.
      
      Log:
          general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] SMP KASAN
          KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
          CPU: 0 PID: 1243 Comm: kworker/0:2 Not tainted 6.6.0-rc4+ #108
          Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
          Workqueue: events mlx5e_rx_dim_work [mlx5_core]
          RIP: 0010:mlx5e_ptp_napi_poll+0x9a4/0x2290 [mlx5_core]
          Code: 8c 24 38 cc ff ff 4c 8d 3c c1 4c 89 f9 48 c1 e9 03 42 80 3c 31 00 0f 85 97 0f 00 00 4d 8b 3f 49 8d 7f 28 48 89 f9 48 c1 e9 03 <42> 80 3c 31 00 0f 85 8b 0f 00 00 49 8b 47 28 48 85 c0 0f 84 05 07
          RSP: 0018:ffff8884d3c09c88 EFLAGS: 00010206
          RAX: 0000000000000069 RBX: ffff8881160349d8 RCX: 0000000000000005
          RDX: ffffed10218f48cf RSI: 0000000000000004 RDI: 0000000000000028
          RBP: ffff888122707700 R08: 0000000000000001 R09: ffffed109a781383
          R10: 0000000000000003 R11: 0000000000000003 R12: ffff88810c7a7a40
          R13: ffff888122707700 R14: dffffc0000000000 R15: 0000000000000000
          FS:  0000000000000000(0000) GS:ffff8884d3c00000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: 00007f4f878dd6e0 CR3: 000000014d108002 CR4: 0000000000370eb0
          DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
          DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
          Call Trace:
          <IRQ>
          ? die_addr+0x3c/0xa0
          ? exc_general_protection+0x144/0x210
          ? asm_exc_general_protection+0x22/0x30
          ? mlx5e_ptp_napi_poll+0x9a4/0x2290 [mlx5_core]
          ? mlx5e_ptp_napi_poll+0x8f6/0x2290 [mlx5_core]
          __napi_poll.constprop.0+0xa4/0x580
          net_rx_action+0x460/0xb80
          ? _raw_spin_unlock_irqrestore+0x32/0x60
          ? __napi_poll.constprop.0+0x580/0x580
          ? tasklet_action_common.isra.0+0x2ef/0x760
          __do_softirq+0x26c/0x827
          irq_exit_rcu+0xc2/0x100
          common_interrupt+0x7f/0xa0
          </IRQ>
          <TASK>
          asm_common_interrupt+0x22/0x40
          RIP: 0010:__kmem_cache_alloc_node+0xb/0x330
          Code: 41 5d 41 5e 41 5f c3 8b 44 24 14 8b 4c 24 10 09 c8 eb d5 e8 b7 43 ca 01 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 57 <41> 56 41 89 d6 41 55 41 89 f5 41 54 49 89 fc 53 48 83 e4 f0 48 83
          RSP: 0018:ffff88812c4079c0 EFLAGS: 00000246
          RAX: 1ffffffff083c7fe RBX: ffff888100042dc0 RCX: 0000000000000218
          RDX: 00000000ffffffff RSI: 0000000000000dc0 RDI: ffff888100042dc0
          RBP: ffff88812c4079c8 R08: ffffffffa0289f96 R09: ffffed1025880ea9
          R10: ffff888138839f80 R11: 0000000000000002 R12: 0000000000000dc0
          R13: 0000000000000100 R14: 000000000000008c R15: ffff8881271fc450
          ? cmd_exec+0x796/0x2200 [mlx5_core]
          kmalloc_trace+0x26/0xc0
          cmd_exec+0x796/0x2200 [mlx5_core]
          mlx5_cmd_do+0x22/0xc0 [mlx5_core]
          mlx5_cmd_exec+0x17/0x30 [mlx5_core]
          mlx5_core_modify_cq_moderation+0x139/0x1b0 [mlx5_core]
          ? mlx5_add_cq_to_tasklet+0x280/0x280 [mlx5_core]
          ? lockdep_set_lock_cmp_fn+0x190/0x190
          ? process_one_work+0x659/0x1220
          mlx5e_rx_dim_work+0x9d/0x100 [mlx5_core]
          process_one_work+0x730/0x1220
          ? lockdep_hardirqs_on_prepare+0x400/0x400
          ? max_active_store+0xf0/0xf0
          ? assign_work+0x168/0x240
          worker_thread+0x70f/0x12d0
          ? __kthread_parkme+0xd1/0x1d0
          ? process_one_work+0x1220/0x1220
          kthread+0x2d9/0x3b0
          ? kthread_complete_and_exit+0x20/0x20
          ret_from_fork+0x2d/0x70
          ? kthread_complete_and_exit+0x20/0x20
          ret_from_fork_asm+0x11/0x20
          </TASK>
          Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay mlx5_ib ib_uverbs ib_core zram zsmalloc mlx5_core fuse
          ---[ end trace 0000000000000000 ]---
      
      Fixes: 3178308a ("net/mlx5e: Make tx_port_ts logic resilient to out-of-order CQEs")
      Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Link: https://lore.kernel.org/r/20231114215846.5902-11-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      7e3f3ba9
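The ordering rule from the commit above, populate the metadata map before tracking the index, can be illustrated with a toy model. Everything here is hypothetical (`toy_ptp_state`, `toy_track_xmit()`, the fixed-size arrays), and the model is single-threaded: real code that shares this state between contexts would also need appropriate locking or memory ordering, which this sketch does not attempt to show.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAP_SIZE 8

/* Toy stand-in for the metadata map and the list of tracked indices. */
struct toy_ptp_state {
    void *skb_map[TOY_MAP_SIZE];   /* metadata index -> packet */
    int pending[TOY_MAP_SIZE];     /* indices awaiting a completion */
    int npending;
};

static void toy_track_xmit(struct toy_ptp_state *s, int idx, void *skb)
{
    assert(s && skb && idx >= 0 && idx < TOY_MAP_SIZE);
    assert(s->npending < TOY_MAP_SIZE);

    s->skb_map[idx] = skb;             /* 1. make the packet findable */
    s->pending[s->npending++] = idx;   /* 2. only then track the index */
}

/* What a poller relies on: every tracked index resolves to a packet. */
static int toy_all_tracked_resolve(const struct toy_ptp_state *s)
{
    for (int i = 0; i < s->npending; i++)
        if (s->skb_map[s->pending[i]] == NULL)
            return 0;
    return 1;
}
```

If the two steps in `toy_track_xmit()` were swapped, a poller running between them could look up a tracked index and find no packet, which corresponds to the null-ptr-deref in the log.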
    • net/mlx5e: Avoid referencing skb after free-ing in drop path of mlx5e_sq_xmit_wqe · 64f14d16
      Rahul Rameshbabu authored
      When SQ is a port timestamping SQ for PTP, do not access tx flags of skb
      after free-ing the skb. Free the skb only after all references that depend
      on it have been handled in the dropped WQE path.
      
      Fixes: 3178308a ("net/mlx5e: Make tx_port_ts logic resilient to out-of-order CQEs")
      Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Link: https://lore.kernel.org/r/20231114215846.5902-10-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      64f14d16
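The rule in the commit above, finish every access that depends on the packet before freeing it, looks roughly like this in a userspace sketch. The types, the flag value, and the `ptp_dropped` counter are hypothetical and only stand in for the tx flags check mentioned in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define TOY_TX_HW_TSTAMP 0x1u   /* hypothetical "timestamp requested" flag */

struct toy_skb {
    unsigned int tx_flags;
};

struct toy_sq {
    bool is_ptp;       /* true for a port timestamping queue */
    int ptp_dropped;   /* hypothetical bookkeeping for dropped PTP packets */
};

/* Drop path: takes ownership of skb and releases it. */
static void toy_drop_path(struct toy_sq *sq, struct toy_skb *skb)
{
    assert(sq && skb);

    /* Read everything needed from the packet while it is still valid. */
    if (sq->is_ptp && (skb->tx_flags & TOY_TX_HW_TSTAMP))
        sq->ptp_dropped++;

    /* Free last, so nothing above can touch released memory. */
    free(skb);
}
```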
    • net/mlx5e: Don't modify the peer sent-to-vport rules for IPSec offload · bdf788cf
      Jianbo Liu authored
      As IPSec packet offload in switchdev mode is not supported with LAG,
      it's unnecessary to modify those sent-to-vport rules to the peer eswitch.
      
      Fixes: c6c2bf5d ("net/mlx5e: Support IPsec packet offload for TX in switchdev mode")
      Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Link: https://lore.kernel.org/r/20231114215846.5902-9-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      bdf788cf
    • net/mlx5e: Fix pedit endianness · 0c101a23
      Vlad Buslov authored
      The referenced commit addressed an endianness issue in the mlx5 pedit
      implementation in an ad hoc manner instead of systematically treating
      integer values according to their types. This left pedit fields broken on
      big endian machines when their size is not equal to 4 and the bytes being
      modified are not the least significant ones, since the wrong bits are
      consumed during parsing. This leads to the following example error when
      applying pedit to source and destination MAC addresses:
      
      [Wed Oct 18 12:52:42 2023] mlx5_core 0001:00:00.1 p1v3_r: attempt to offload an unsupported field (cmd 0)
      [Wed Oct 18 12:52:42 2023] mask: 00000000330c5b68: 00 00 00 00 ff ff 00 00 00 00 ff ff 00 00 00 00  ................
      [Wed Oct 18 12:52:42 2023] mask: 0000000017d22fd9: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      [Wed Oct 18 12:52:42 2023] mask: 000000008186d717: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      [Wed Oct 18 12:52:42 2023] mask: 0000000029eb6149: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      [Wed Oct 18 12:52:42 2023] mask: 000000007ed103e4: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      [Wed Oct 18 12:52:42 2023] mask: 00000000db8101a6: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      [Wed Oct 18 12:52:42 2023] mask: 00000000ec3c08a9: 00 00 00 00 00 00 00 00 00 00 00 00              ............
      
      Treat masks and values of pedit and filter match as network byte order,
      refactor pointers to them to void pointers instead of confusing u32
      pointers, and only cast to pointer-to-integer when reading a value from
      them. Treat pedit mlx5_fields->field_mask as host byte order according to
      its type u32, and change the constants in the fields array accordingly.
      
      Fixes: 82198d8b ("net/mlx5e: Fix endianness when calculating pedit mask first bit")
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Gal Pressman <gal@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Link: https://lore.kernel.org/r/20231114215846.5902-8-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      0c101a23
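The approach described in the pedit commit above, keeping the data as opaque bytes in network byte order and converting only at the point of reading, can be sketched with standard POSIX helpers. `read_be32()` and `read_be16()` are hypothetical names for this illustration and are not functions from the driver.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Read a 32-bit field stored in network byte order at a byte offset. */
static uint32_t read_be32(const void *base, size_t offset)
{
    uint32_t raw;

    assert(base);
    /* memcpy avoids alignment problems and keeps the source untyped. */
    memcpy(&raw, (const unsigned char *)base + offset, sizeof(raw));
    return ntohl(raw);
}

/* Same idea for a 16-bit field. */
static uint16_t read_be16(const void *base, size_t offset)
{
    uint16_t raw;

    assert(base);
    memcpy(&raw, (const unsigned char *)base + offset, sizeof(raw));
    return ntohs(raw);
}
```

Because the conversion happens for the exact field width being read, a 2-byte field yields the same host value on little endian and big endian machines, which is not the case when a 2-byte field is picked out of a `u32` that was converted as a whole.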
    • net/mlx5e: fix double free of encap_header in update funcs · 3a4aa3cb
      Gavin Li authored
      Follow up to the previous patch to fix the same issue for
      mlx5e_tc_tun_update_header_ipv4{6} when mlx5_packet_reformat_alloc()
      fails.
      
      When mlx5_packet_reformat_alloc() fails, the encap_header allocated in
      mlx5e_tc_tun_update_header_ipv4{6} will be released within it. However,
      e->encap_header is already set to the previously freed encap_header
      before mlx5_packet_reformat_alloc(). As a result, the later
      mlx5e_encap_put() will free e->encap_header again, causing a double free
      issue.
      
      mlx5e_encap_put()
           --> mlx5e_encap_dealloc()
               --> kfree(e->encap_header)
      
      This patch fixes it by not setting e->encap_header until
      mlx5_packet_reformat_alloc() succeeds.
      
      Fixes: a54e20b4 ("net/mlx5e: Add basic TC tunnel set action for SRIOV offloads")
      Signed-off-by: Gavin Li <gavinl@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Link: https://lore.kernel.org/r/20231114215846.5902-7-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      3a4aa3cb
    • net/mlx5e: fix double free of encap_header · 6f9b1a07
      Dust Li authored
      When mlx5_packet_reformat_alloc() fails, the encap_header allocated in
      mlx5e_tc_tun_create_header_ipv4{6} will be released within it. However,
      e->encap_header is already set to the previously freed encap_header
      before mlx5_packet_reformat_alloc(). As a result, the later
      mlx5e_encap_put() will free e->encap_header again, causing a double free
      issue.
      
      mlx5e_encap_put()
          --> mlx5e_encap_dealloc()
              --> kfree(e->encap_header)
      
      This happens when the MLX5_CMD_OP_ALLOC_PACKET_REFORMAT_CONTEXT command
      fails.

      This patch fixes it by not setting e->encap_header until
      mlx5_packet_reformat_alloc() succeeds.
      
      Fixes: d589e785 ("net/mlx5e: Allow concurrent creation of encap entries")
      Reported-by: Cruz Zhao <cruzzhao@linux.alibaba.com>
      Reported-by: Tianchen Ding <dtcccc@linux.alibaba.com>
      Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
      Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      6f9b1a07
    • Rahul Rameshbabu's avatar
      net/mlx5: Decouple PHC .adjtime and .adjphase implementations · fd64fd13
      Rahul Rameshbabu authored
      When running a phase adjustment operation, the free running clock should
      not be modified at all. The phase control keyword is intended to trigger an
      internal servo on the device that will converge to the provided delta. A
      free running counter cannot implement phase adjustment.
      
      Fixes: 8e11a68e ("net/mlx5: Add adjphase function to support hardware-only offset control")
      Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Link: https://lore.kernel.org/r/20231114215846.5902-5-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      fd64fd13
    • Erez Shitrit's avatar
      net/mlx5: DR, Allow old devices to use multi destination FTE · ad4d82c3
      Erez Shitrit authored
      The current check isn't aware of old devices that don't have the
      relevant FW capability. This patch allows multi destination FTE
      in old cards, as it was before this check.
      
      Fixes: f6f46e71 ("net/mlx5: DR, Add check for multi destination FTE")
      Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
      Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Link: https://lore.kernel.org/r/20231114215846.5902-4-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      ad4d82c3
    • Maher Sanalla's avatar
      net/mlx5: Free used cpus mask when an IRQ is released · 7d2f74d1
      Maher Sanalla authored
      Each EQ table maintains a cpumask of the already used CPUs that are mapped
      to IRQs to ensure that each IRQ gets mapped to a unique CPU.
      
      However, on IRQ release, the CPU is not cleared from this cpumask to
      allow future IRQ requests, causing the following error when an SF is
      reloaded after it has used all CPUs for its IRQs:
      
      mlx5_irq_affinity_request:135:(pid 306010): Didn't find a matching IRQ.
      err = -28
      
      Thus, when releasing an IRQ, clear its mapped CPU from the used CPUs
      mask, to prevent the case described above.
      
      While at it, move the used cpumask update to the EQ layer, as it fits
      better there and preserves the symmetry of the IRQ request/release API.
      
      Fixes: a1772de7 ("net/mlx5: Refactor completion IRQ request/release API")
      Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Shay Drory <shayd@nvidia.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Link: https://lore.kernel.org/r/20231114215846.5902-3-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      7d2f74d1
    • Itamar Gozlan's avatar
      Revert "net/mlx5: DR, Supporting inline WQE when possible" · df3aafe5
      Itamar Gozlan authored
      This reverts commit 95c337cc.
      The revert is required because the commit is suspected of causing some
      tests to fail; it will be investigated further.
      
      Fixes: 95c337cc ("net/mlx5: DR, Supporting inline WQE when possible")
      Signed-off-by: Itamar Gozlan <igozlan@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Link: https://lore.kernel.org/r/20231114215846.5902-2-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      df3aafe5
    • Jakub Kicinski's avatar
      Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · a6a6a0a9
      Jakub Kicinski authored
      
      
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf 2023-11-15
      
      We've added 7 non-merge commits during the last 6 day(s) which contain
      a total of 9 files changed, 200 insertions(+), 49 deletions(-).
      
      The main changes are:
      
      1) Do not allocate bpf specific percpu memory unconditionally, from Yonghong.
      
      2) Fix precision backtracking instruction iteration, from Andrii.
      
      3) Fix control flow graph checking, from Andrii.
      
      4) Fix xskxceiver selftest build, from Anders.
      
      * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        bpf: Do not allocate percpu memory at init stage
        selftests/bpf: add more test cases for check_cfg()
        bpf: fix control-flow graph checking in privileged mode
        selftests/bpf: add edge case backtracking logic test
        bpf: fix precision backtracking instruction iteration
        bpf: handle ldimm64 properly in check_cfg()
        selftests: bpf: xskxceiver: ksft_print_msg: fix format type error
      ====================
      
      Link: https://lore.kernel.org/r/20231115214949.48854-1-alexei.starovoitov@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      a6a6a0a9
  3. Nov 15, 2023
    • Yonghong Song's avatar
      bpf: Do not allocate percpu memory at init stage · 1fda5bb6
      Yonghong Song authored
      Kirill Shutemov reported significant percpu memory consumption increase after
      booting in 288-cpu VM ([1]) due to commit 41a5db8d ("bpf: Add support for
      non-fix-size percpu mem allocation"). The percpu memory consumption is
      increased from 111MB to 969MB. The number is from /proc/meminfo.
      
      I tried to reproduce the issue with my local VM, which supports at most
      255 cpus. With 252 cpus, without the above commit, the percpu memory
      consumption immediately after boot is 57MB, while with the above commit
      the percpu memory consumption is 231MB.
      
      This is not good since percpu memory from the bpf memory allocator is
      not widely used yet. Let us change pre-allocation in the init stage to
      on-demand allocation when the verifier detects that a bpf program needs
      percpu memory. With this change, percpu memory consumption after boot
      can be reduced significantly.
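
      The switch from pre-allocation to on-demand allocation can be pictured
      with a minimal userspace sketch. Everything here is hypothetical
      (struct percpu_allocator, percpu_allocator_ensure(), verify_program(),
      the sizes); it only shows the lazy-initialization pattern, not the
      actual bpf memory allocator or verifier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NR_CPUS 4                /* hypothetical CPU count */
#define PER_CPU_CACHE_BYTES 1024 /* hypothetical per-CPU cache size */

/* Hypothetical allocator state: one cache per CPU, created lazily. */
struct percpu_allocator {
    void *cache[NR_CPUS];
    bool initialized;
};

/* Allocate the per-CPU caches on first use; later calls are no-ops. */
int percpu_allocator_ensure(struct percpu_allocator *a)
{
    assert(a != NULL);
    if (a->initialized)
        return 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        a->cache[cpu] = calloc(1, PER_CPU_CACHE_BYTES);
        if (!a->cache[cpu]) {
            /* Roll back the caches allocated so far. */
            while (cpu-- > 0) {
                free(a->cache[cpu]);
                a->cache[cpu] = NULL;
            }
            return -1;
        }
    }
    a->initialized = true;
    return 0;
}

/* Hypothetical verifier hook: pay the memory cost only when a program needs it. */
int verify_program(struct percpu_allocator *a, bool uses_percpu_memory)
{
    if (!uses_percpu_memory)
        return 0;
    return percpu_allocator_ensure(a);
}

/* Free all caches and return to the uninitialized state. */
void percpu_allocator_destroy(struct percpu_allocator *a)
{
    assert(a != NULL);
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        free(a->cache[cpu]);
        a->cache[cpu] = NULL;
    }
    a->initialized = false;
}
```

      In this model a system that never loads a program using percpu memory
      never allocates the caches, which is the saving the commit aims for.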
      
        [1] https://lore.kernel.org/lkml/20231109154934.4saimljtqx625l3v@box.shutemov.name/
      
      Fixes: 41a5db8d ("bpf: Add support for non-fix-size percpu mem allocation")
      Reported-and-tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
      Acked-by: Hou Tao <houtao1@huawei.com>
      Link: https://lore.kernel.org/r/20231111013928.948838-1-yonghong.song@linux.dev
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      1fda5bb6
    • Gal Pressman's avatar
      net: Fix undefined behavior in netdev name allocation · 674e3180
      Gal Pressman authored
      Cited commit removed the strscpy() call and kept the snprintf() only.
      
      It is common to use 'dev->name' as the format string before a netdev is
      registered; this results in the 'res' and 'name' pointers being equal.
      According to POSIX, if copying takes place between objects that overlap
      as a result of a call to sprintf() or snprintf(), the results are
      undefined.
      
      Add back the strscpy() and use 'buf' as an intermediate buffer.
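
      The intermediate-buffer approach can be sketched in userspace C as
      follows. This is an illustration under stated assumptions, not the
      kernel function: format_name() is a hypothetical helper, IFNAMSIZ is
      defined locally, and memcpy() stands in for strscpy(), which is
      kernel-only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define IFNAMSIZ 16 /* defined locally for the sketch */

/*
 * Format `fmt` (e.g. "eth%d") with `unit` into `res`, even when `fmt` and
 * `res` point to the same storage. Returns 0 on success, -1 if the result
 * does not fit; `res` is left untouched on failure.
 */
int format_name(char *res, const char *fmt, int unit)
{
    char buf[IFNAMSIZ];
    int len;

    assert(res != NULL && fmt != NULL);

    /* Format into a separate buffer so source and destination never overlap. */
    len = snprintf(buf, sizeof(buf), fmt, unit);
    if (len < 0 || (size_t)len >= sizeof(buf))
        return -1;

    /* Copy the finished name, including the terminating NUL. */
    memcpy(res, buf, (size_t)len + 1);
    return 0;
}
```

      Calling format_name(name, name, 3) with name holding "eth%d" is then
      well defined, because snprintf() only ever writes to buf.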
      
      Fixes: 7ad17b04 ("net: trust the bitmap in __dev_alloc_name()")
      Cc: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Gal Pressman <gal@nvidia.com>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      674e3180
    • Niklas Söderlund's avatar
      dt-bindings: net: ethernet-controller: Fix formatting error · efc0c836
      Niklas Söderlund authored
      
      
      When moving the *-internal-delay-ps properties to only apply for RGMII
      interface modes, there was a typo in the text formatting.

      Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      efc0c836
    • Johnathan Mantey's avatar
      Revert ncsi: Propagate carrier gain/loss events to the NCSI controller · 9e2e7efb
      Johnathan Mantey authored
      This reverts commit 3780bb29.
      
      The cited commit introduced unwanted behavior.
      
      The intent of the commit was to detect carrier loss/gain for just the
      NIC connected to the BMC. The unwanted effect is that a carrier loss
      on auxiliary paths also causes the BMC to lose carrier. The BMC never
      regains carrier despite the secondary NIC regaining a link.
      
      This change, when merged, needs to be backported to stable kernels.
      5.4-stable, 5.10-stable, 5.15-stable, 6.1-stable, 6.5-stable
      
      Fixes: 3780bb29 ("ncsi: Propagate carrier gain/loss events to the NCSI controller")
      CC: stable@vger.kernel.org
      Signed-off-by: Johnathan Mantey <johnathanx.mantey@intel.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9e2e7efb
    • Juergen Gross's avatar
      xen/events: remove some info_for_irq() calls in pirq handling · cee96422
      Juergen Gross authored
      
      
      Instead of the IRQ number, use the struct irq_info pointer as a parameter
      in the internal pirq related functions. This allows dropping some calls
      of info_for_irq().

      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      cee96422
    • Juergen Gross's avatar
      xen/events: modify internal [un]bind interfaces · 3fcdaf3d
      Juergen Gross authored
      
      
      Modify the internal bind- and unbind-interfaces to take a struct
      irq_info parameter. When allocating a new IRQ pass the pointer from
      the allocating function further up.
      
      This will reduce the number of info_for_irq() calls and make the code
      more efficient.
      
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      3fcdaf3d
    • Juergen Gross's avatar
      xen/events: drop xen_allocate_irqs_dynamic() · 5dd9ad32
      Juergen Gross authored
      
      
      Instead of having a common function for allocating a single IRQ or a
      consecutive number of IRQs, split up the functionality into the callers
      of xen_allocate_irqs_dynamic().
      
      This allows any allocation error in xen_irq_init() to be handled
      gracefully instead of panicking the system. Let xen_irq_init() return
      the irq_info pointer or NULL in case of an allocation error.
      
      Additionally set the IRQ into irq_info already at allocation time, as
      otherwise the IRQ would be '0' (which is a valid IRQ number) until
      being set.
      
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      5dd9ad32
    • Linus Torvalds's avatar
      Merge tag 'hardening-v6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · c42d9eee
      Linus Torvalds authored
      Pull hardening fixes from Kees Cook:
      
       - stackleak: add declarations for global functions (Arnd Bergmann)
      
       - gcc-plugins: randstruct: Only warn about true flexible arrays (Kees
         Cook)
      
       - gcc-plugins: latent_entropy: Fix description typo (Konstantin Runov)
      
      * tag 'hardening-v6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        gcc-plugins: latent_entropy: Fix typo (args -> argc) in plugin description
        gcc-plugins: randstruct: Only warn about true flexible arrays
        stackleak: add declarations for global functions
      c42d9eee
    • Linus Torvalds's avatar
      Merge tag 'zstd-linus-v6.7-rc2' of https://github.com/terrelln/linux · 86d11b0e
      Linus Torvalds authored
      Pull Zstd fix from Nick Terrell:
       "Only a single line change to fix a benign UBSAN warning"
      
      * tag 'zstd-linus-v6.7-rc2' of https://github.com/terrelln/linux:
        zstd: Fix array-index-out-of-bounds UBSAN warning
      86d11b0e
    • Jakub Kicinski's avatar
      Merge branch 'mptcp-misc-fixes-for-v6-7' · a133eae8
      Jakub Kicinski authored
      
      
      Matthieu Baerts says:
      
      ====================
      mptcp: misc. fixes for v6.7
      
      Here are a few fixes related to MPTCP:
      
      - Patch 1 limits GSO max size to ~64K when MPTCP is being used due to a
        spec limit. 'gso_max_size' can exceed the max value supported by MPTCP
        since v5.19.
      
      - Patch 2 fixes a possible NULL pointer dereference on close that can
        happen since v6.7-rc1.
      
      - Patch 3 avoids sending a RM_ADDR when the corresponding address is no
        longer tracked locally. A regression for a fix backported to v5.19.
      
      - Patch 4 adds a missing lock when changing the IP TOS with setsockopt().
        A fix for v5.17.
      
      - Patch 5 fixes an expectation when running MPTCP Join selftest with the
        checksum option (-C). An issue present since v6.1.
      ====================
      
      Link: https://lore.kernel.org/r/20231114-upstream-net-20231113-mptcp-misc-fixes-6-7-rc2-v1-0-7b9cd6a7b7f4@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      a133eae8
    • Paolo Abeni's avatar
      selftests: mptcp: fix fastclose with csum failure · 7cefbe5e
      Paolo Abeni authored
      
      
      Running the mp_join selftest manually with the following command line:
      
        ./mptcp_join.sh -z -C
      
      leads to some failures:
      
        002 fastclose server test
        # ...
        rtx                                 [fail] got 1 MP_RST[s] TX expected 0
        # ...
        rstrx                               [fail] got 1 MP_RST[s] RX expected 0
      
      The problem is really in the wrong expectations for the RST checks
      implied by the csum validation. Note that the same check is repeated
      explicitly in the same test-case with the correct expectation, and it
      passes successfully.

      Address the issue by explicitly setting the correct expectation for
      the failing checks.
      
      Reported-by: Xiumei Mu <xmu@redhat.com>
      Fixes: 6bf41020 ("selftests: mptcp: update and extend fastclose test-cases")
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Matthieu Baerts <matttbe@kernel.org>
      Signed-off-by: Matthieu Baerts <matttbe@kernel.org>
      Link: https://lore.kernel.org/r/20231114-upstream-net-20231113-mptcp-misc-fixes-6-7-rc2-v1-5-7b9cd6a7b7f4@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      7cefbe5e
    • Paolo Abeni's avatar
      mptcp: fix setsockopt(IP_TOS) subflow locking · 7679d34f
      Paolo Abeni authored
      The MPTCP implementation of the IP_TOS socket option uses the lockless
      variant of the TOS manipulation helper but does not hold the required
      lock when invoking the helper.
      
      Add the required locking.
      
      Fixes: ffcacff8 ("mptcp: Support for IP_TOS for MPTCP setsockopt()")
      Cc: stable@vger.kernel.org
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/457
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts <matttbe@kernel.org>
      Link: https://lore.kernel.org/r/20231114-upstream-net-20231113-mptcp-misc-fixes-6-7-rc2-v1-4-7b9cd6a7b7f4@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      7679d34f
    • Geliang Tang's avatar
      mptcp: add validity check for sending RM_ADDR · 8df220b2
      Geliang Tang authored
      This patch adds a validity check for sending RM_ADDRs for the userspace
      PM in mptcp_pm_remove_addrs(): a RM_ADDR is only sent when the address
      is in the anno_list or conn_list.
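
      The check can be illustrated with a small userspace sketch. It is not
      the MPTCP path manager code: id_in_list() and filter_rm_addrs() are
      hypothetical helpers, and plain arrays of address IDs stand in for the
      kernel's anno_list and conn_list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if `id` appears in the first `len` entries of `list`. */
bool id_in_list(const unsigned char *list, size_t len, unsigned char id)
{
    for (size_t i = 0; i < len; i++)
        if (list[i] == id)
            return true;
    return false;
}

/*
 * Copy into `out` only those IDs from `rm` that are still tracked in the
 * announced list or the connection list. Returns the number of IDs kept,
 * i.e. the addresses for which a RM_ADDR would be sent.
 */
size_t filter_rm_addrs(const unsigned char *rm, size_t rm_len,
                       const unsigned char *anno, size_t anno_len,
                       const unsigned char *conn, size_t conn_len,
                       unsigned char *out)
{
    size_t n = 0;

    assert(rm != NULL && out != NULL);
    for (size_t i = 0; i < rm_len; i++) {
        if (id_in_list(anno, anno_len, rm[i]) ||
            id_in_list(conn, conn_len, rm[i]))
            out[n++] = rm[i];
    }
    return n;
}
```

      An ID that is in neither list is silently skipped, so no RM_ADDR is
      generated for an address that is no longer tracked locally.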
      
      Fixes: 8b1c94da ("mptcp: only send RM_ADDR in nl_cmd_remove")
      Cc: stable@vger.kernel.org
      Signed-off-by: Geliang Tang <geliang.tang@suse.com>
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts <matttbe@kernel.org>
      Link: https://lore.kernel.org/r/20231114-upstream-net-20231113-mptcp-misc-fixes-6-7-rc2-v1-3-7b9cd6a7b7f4@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      8df220b2
    • Paolo Abeni's avatar
      mptcp: fix possible NULL pointer dereference on close · d109a776
      Paolo Abeni authored
      After the blamed commit below, the MPTCP release callback can
      dereference the first subflow pointer via __mptcp_set_connected()
      and send buffer auto-tuning. Such pointer is always expected to be
      valid, except at socket destruction time, when the first subflow is
      deleted and the pointer zeroed.
      
      If the connect event is handled by the release callback while the
      msk socket is finally released, MPTCP hits the following splat:
      
        general protection fault, probably for non-canonical address 0xdffffc00000000f2: 0000 [#1] PREEMPT SMP KASAN
        KASAN: null-ptr-deref in range [0x0000000000000790-0x0000000000000797]
        CPU: 1 PID: 26719 Comm: syz-executor.2 Not tainted 6.6.0-syzkaller-10102-gff269e2cd5ad #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/09/2023
        RIP: 0010:mptcp_subflow_ctx net/mptcp/protocol.h:542 [inline]
        RIP: 0010:__mptcp_propagate_sndbuf net/mptcp/protocol.h:813 [inline]
        RIP: 0010:__mptcp_set_connected+0x57/0x3e0 net/mptcp/subflow.c:424
        RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff8a62323c
        RDX: 00000000000000f2 RSI: ffffffff8a630116 RDI: 0000000000000790
        RBP: ffff88803334b100 R08: 0000000000000001 R09: 0000000000000000
        R10: 0000000000000001 R11: 0000000000000034 R12: ffff88803334b198
        R13: ffff888054f0b018 R14: 0000000000000000 R15: ffff88803334b100
        FS:  0000000000000000(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007fbcb4f75198 CR3: 000000006afb5000 CR4: 00000000003506f0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         <TASK>
         mptcp_release_cb+0xa2c/0xc40 net/mptcp/protocol.c:3405
         release_sock+0xba/0x1f0 net/core/sock.c:3537
         mptcp_close+0x32/0xf0 net/mptcp/protocol.c:3084
         inet_release+0x132/0x270 net/ipv4/af_inet.c:433
         inet6_release+0x4f/0x70 net/ipv6/af_inet6.c:485
         __sock_release+0xae/0x260 net/socket.c:659
         sock_close+0x1c/0x20 net/socket.c:1419
         __fput+0x270/0xbb0 fs/file_table.c:394
         task_work_run+0x14d/0x240 kernel/task_work.c:180
         exit_task_work include/linux/task_work.h:38 [inline]
         do_exit+0xa92/0x2a20 kernel/exit.c:876
         do_group_exit+0xd4/0x2a0 kernel/exit.c:1026
         get_signal+0x23ba/0x2790 kernel/signal.c:2900
         arch_do_signal_or_restart+0x90/0x7f0 arch/x86/kernel/signal.c:309
         exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
         exit_to_user_mode_prepare+0x11f/0x240 kernel/entry/common.c:204
         __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
         syscall_exit_to_user_mode+0x1d/0x60 kernel/entry/common.c:296
         do_syscall_64+0x4b/0x110 arch/x86/entry/common.c:88
         entry_SYSCALL_64_after_hwframe+0x63/0x6b
        RIP: 0033:0x7fb515e7cae9
        Code: Unable to access opcode bytes at 0x7fb515e7cabf.
        RSP: 002b:00007fb516c560c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
        RAX: 000000000000003c RBX: 00007fb515f9c120 RCX: 00007fb515e7cae9
        RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000006
        RBP: 00007fb515ec847a R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
        R13: 000000000000006e R14: 00007fb515f9c120 R15: 00007ffc631eb968
         </TASK>
      
      To avoid sprinkling unneeded conditionals, address the issue by
      explicitly checking msk->first only in the critical place.
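
      The shape of such a single guarded dereference can be sketched in
      userspace C. The structures and set_connected() below are hypothetical,
      heavily reduced stand-ins; they only show the pattern of checking the
      first-subflow pointer at the one place where the release callback may
      race with socket destruction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the subflow and msk sockets. */
struct subflow_sock {
    int sndbuf;
};

struct msk_sock {
    struct subflow_sock *first; /* zeroed at socket destruction time */
    int sndbuf;
};

/* Propagate the first subflow's send buffer size to the msk, if it exists. */
void set_connected(struct msk_sock *msk)
{
    assert(msk != NULL);

    /* The single critical check: this may run while the socket is closing. */
    if (!msk->first)
        return;

    if (msk->first->sndbuf > msk->sndbuf)
        msk->sndbuf = msk->first->sndbuf;
}
```

      Callers that run while the first subflow is guaranteed to exist pay
      only one extra branch, and no other code path needs a NULL check.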
      
      Fixes: 8005184f ("mptcp: refactor sndbuf auto-tuning")
      Cc: stable@vger.kernel.org
      Reported-by: <syzbot+9dfbaedb6e6baca57a32@syzkaller.appspotmail.com>
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/454
      Reported-by: Eric Dumazet <edumazet@google.com>
      Closes: https://lore.kernel.org/netdev/CANn89iLZUA6S2a=K8GObnS62KK6Jt4B7PsAs7meMFooM8xaTgw@mail.gmail.com/
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts <matttbe@kernel.org>
      Link: https://lore.kernel.org/r/20231114-upstream-net-20231113-mptcp-misc-fixes-6-7-rc2-v1-2-7b9cd6a7b7f4@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      d109a776