  1. Mar 15, 2024
    • igc: avoid returning frame twice in XDP_REDIRECT · 8df393af
      Florian Kauer authored
      [ Upstream commit ef27f655 ]
      
      When a frame cannot be transmitted in XDP_REDIRECT
      (e.g. due to a full queue), it is necessary to free
      it by calling xdp_return_frame_rx_napi.
      
      However, this is the responsibility of the caller of
      the ndo_xdp_xmit (see for example bq_xmit_all in
      kernel/bpf/devmap.c) and thus calling it inside
      igc_xdp_xmit (which is the ndo_xdp_xmit of the igc
      driver) as well will lead to memory corruption.
      
      In fact, bq_xmit_all expects that it can return all
      frames after the last successfully transmitted one.
      Therefore, break for the first not transmitted frame,
      but do not call xdp_return_frame_rx_napi in igc_xdp_xmit.
      Other Intel drivers, such as igb, implement this
      in the same way.
      
      There are two alternatives to this that were rejected:
      1. Return num_frames as if all the frames had been
         transmitted and release them inside igc_xdp_xmit.
         While it might work technically, it is not what
         the return value is meant to represent (i.e. the
         number of SUCCESSFULLY transmitted packets).
      2. Rework kernel/bpf/devmap.c and all drivers to
         support non-consecutively dropped packets.
         Besides being complex, it likely has a negative
         performance impact without a significant gain
         since it is anyway unlikely that the next frame
         can be transmitted if the previous one was dropped.
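
The ownership contract described above can be sketched as a small userspace model. This is only an illustration under stated assumptions: `struct frame`, `struct tx_ring`, `driver_xmit` and `caller_flush` are hypothetical names, not kernel APIs, and freeing is modeled by a counter so that a double free would show up as a count of 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an XDP frame; counts how often it was freed. */
struct frame {
    int free_count;
};

/* Hypothetical TX ring with a limited number of free descriptors. */
struct tx_ring {
    size_t free_slots;
};

/* Queue one frame if there is room; returns false when the ring is full. */
bool ring_try_xmit(struct tx_ring *ring, struct frame *f)
{
    (void)f;
    if (ring->free_slots == 0)
        return false;
    ring->free_slots--;
    return true;
}

/* Driver side (the role of an ndo_xdp_xmit implementation): stop at the
 * first frame that cannot be queued and report how many were queued.
 * It does not free the remaining frames. */
size_t driver_xmit(struct tx_ring *ring, struct frame **frames, size_t n)
{
    size_t sent = 0;

    for (size_t i = 0; i < n; i++) {
        if (!ring_try_xmit(ring, frames[i]))
            break;
        sent++;
    }
    return sent;
}

/* Caller side (the role bq_xmit_all plays): free every frame after the
 * last successfully transmitted one, exactly once. */
size_t caller_flush(struct tx_ring *ring, struct frame **frames, size_t n)
{
    size_t sent = driver_xmit(ring, frames, n);

    for (size_t i = sent; i < n; i++)
        frames[i]->free_count++;
    return sent;
}
```

If `driver_xmit` also released the frames it could not queue, each of them would end up with a free count of 2 in this model, which corresponds to the double return the commit removes.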
      
      The memory corruption can be reproduced with
      the following script which leads to a kernel panic
      after a few seconds.  It basically generates more
      traffic than an i225 NIC can transmit and pushes it
      via XDP_REDIRECT from a virtual interface to the
      physical interface where frames get dropped.
      
         #!/bin/bash
         INTERFACE=enp4s0
         INTERFACE_IDX=`cat /sys/class/net/$INTERFACE/ifindex`
      
         sudo ip link add dev veth1 type veth peer name veth2
         sudo ip link set up $INTERFACE
         sudo ip link set up veth1
         sudo ip link set up veth2
      
         cat << EOF > redirect.bpf.c
         #include <linux/bpf.h>
         #include <bpf/bpf_helpers.h>

         SEC("prog")
         int redirect(struct xdp_md *ctx)
         {
             return bpf_redirect($INTERFACE_IDX, 0);
         }
      
         char _license[] SEC("license") = "GPL";
         EOF
         clang -O2 -g -Wall -target bpf -c redirect.bpf.c -o redirect.bpf.o
         sudo ip link set veth2 xdp obj redirect.bpf.o
      
         cat << EOF > pass.bpf.c
         #include <linux/bpf.h>
         #include <bpf/bpf_helpers.h>

         SEC("prog")
         int pass(struct xdp_md *ctx)
         {
             return XDP_PASS;
         }
      
         char _license[] SEC("license") = "GPL";
         EOF
         clang -O2 -g -Wall -target bpf -c pass.bpf.c -o pass.bpf.o
         sudo ip link set $INTERFACE xdp obj pass.bpf.o
      
         cat << EOF > trafgen.cfg
         #define ETH_P_IP 0x0800

         {
           /* Ethernet Header */
           0xe8, 0x6a, 0x64, 0x41, 0xbf, 0x46,
           0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
           const16(ETH_P_IP),
      
           /* IPv4 Header */
           0b01000101, 0,   # IPv4 version, IHL, TOS
           const16(1028),   # IPv4 total length (UDP length + 20 bytes (IP header))
           const16(2),      # IPv4 ident
           0b01000000, 0,   # IPv4 flags, fragmentation off
           64,              # IPv4 TTL
           17,              # Protocol UDP
           csumip(14, 33),  # IPv4 checksum
      
           /* UDP Header */
           10,  0, 1, 1,    # IP Src - adapt as needed
           10,  0, 1, 2,    # IP Dest - adapt as needed
           const16(6666),   # UDP Src Port
           const16(6666),   # UDP Dest Port
           const16(1008),   # UDP length (UDP header 8 bytes + payload length)
           csumudp(14, 34), # UDP checksum
      
           /* Payload */
           fill('W', 1000),
         }
         EOF
      
         sudo trafgen -i trafgen.cfg -b3000MB -o veth1 --cpp
      
      Fixes: 4ff32036 ("igc: Add support for XDP_REDIRECT action")
      Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
      Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Naama Meir <naamax.meir@linux.intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: ice: Fix potential NULL pointer dereference in ice_bridge_setlink() · 1a770927
      Rand Deeb authored
      [ Upstream commit 06e456a0 ]
      
      The function ice_bridge_setlink() may encounter a NULL pointer dereference
      if nlmsg_find_attr() returns NULL and br_spec is dereferenced subsequently
      in nla_for_each_nested(). To address this issue, add a check to ensure that
      br_spec is not NULL before proceeding with the nested attribute iteration.
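
A minimal userspace sketch of the guard described above. The names `struct attr`, `find_attr`, `ATTR_BRIDGE_SPEC` and `set_bridge_mode` are hypothetical stand-ins, and returning `-EINVAL` for the missing attribute is an assumption of this sketch rather than something stated in the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flat attribute; real netlink attributes are nested TLVs. */
struct attr {
    int type;
    int value;
};

#define ATTR_BRIDGE_SPEC 1 /* hypothetical attribute type */

/* Stand-in for a lookup such as nlmsg_find_attr(): NULL when absent. */
const struct attr *find_attr(const struct attr *attrs, size_t n, int type)
{
    for (size_t i = 0; i < n; i++)
        if (attrs[i].type == type)
            return &attrs[i];
    return NULL;
}

/* Check the lookup result before using it; error code is an assumption. */
int set_bridge_mode(const struct attr *attrs, size_t n, int *mode)
{
    const struct attr *br_spec = find_attr(attrs, n, ATTR_BRIDGE_SPEC);

    if (!br_spec)
        return -EINVAL;

    *mode = br_spec->value;
    return 0;
}
```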
      
      Fixes: b1edc14a ("ice: Implement ice_bridge_getlink and ice_bridge_setlink")
      Signed-off-by: Rand Deeb <rand.sec96@gmail.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ice: virtchnl: stop pretending to support RSS over AQ or registers · 671a2860
      Jacob Keller authored
      [ Upstream commit 2652b99e ]
      
      The E800 series hardware uses the same iAVF driver as older devices,
      including the virtchnl negotiation scheme.
      
      This negotiation scheme includes a mechanism to determine what type of RSS
      should be supported, including RSS over PF virtchnl messages, RSS over
      firmware AdminQ messages, and RSS via direct register access.
      
      The PF driver will always prefer VIRTCHNL_VF_OFFLOAD_RSS_PF if it's
      supported by the VF driver. However, if an older VF driver is loaded, it
      may request only VIRTCHNL_VF_OFFLOAD_RSS_REG or VIRTCHNL_VF_OFFLOAD_RSS_AQ.
      
      The ice driver happily agrees to support these methods. Unfortunately, the
      underlying hardware does not support these mechanisms. The E800 series VFs
      don't have the appropriate registers for RSS_REG. The mailbox queue used by
      VFs for VF to PF communication blocks messages which do not have the
      VF-to-PF opcode.
      
      Stop lying to the VF that it could support RSS over AdminQ or registers, as
      these interfaces do not work when the hardware is operating on an E800
      series device.
      
      In practice this is unlikely to be hit by any normal user. The iAVF driver
      has supported RSS over PF virtchnl commands since 2016, and always defaults
      to using RSS_PF if possible.
      
      In principle, nothing actually stops the existing VF from attempting to
      access the registers or send an AQ command. However, a properly coded VF
      will check the capability flags and will report a more useful error if it
      detects a case where the PF driver does not support the RSS offloads that
      the VF does.
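
The negotiation outcome can be sketched as a capability mask. The bit values below are illustrative only; the real VIRTCHNL_VF_OFFLOAD_* constants are defined in the kernel's virtchnl header, and `pf_grant_rss_caps` is a hypothetical helper, not a function of the ice driver.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative capability bits (not the real VIRTCHNL_VF_OFFLOAD_* values). */
#define CAP_RSS_PF  (1u << 0) /* RSS configured through PF virtchnl messages */
#define CAP_RSS_AQ  (1u << 1) /* RSS configured through AdminQ messages */
#define CAP_RSS_REG (1u << 2) /* RSS configured through direct register access */

/* Grant only the RSS method the hardware can actually service, so a VF
 * that asked solely for AQ or register access gets no RSS capability
 * back and can report a clear error. */
uint32_t pf_grant_rss_caps(uint32_t vf_requested)
{
    return vf_requested & CAP_RSS_PF;
}
```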
      
      Fixes: 1071a835 ("ice: Implement virtchnl commands for AVF support")
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Reviewed-by: Alan Brady <alan.brady@intel.com>
      Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: sparx5: Fix use after free inside sparx5_del_mact_entry · e83bebb7
      Horatiu Vultur authored
      [ Upstream commit 89d72d41 ]
      
      Based on static analysis of the code, it looks like when an entry
      from the MAC table was removed, the entry was still used after being
      freed. More precisely, the vid of the mac_entry was used after calling
      devm_kfree on the mac_entry.
      The fix consists in first using the vid of the mac_entry to delete the
      entry from the HW and after that to free it.
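
The corrected ordering can be sketched in userspace C. `struct mac_entry`, `hw_delete` and `del_mac_entry` are hypothetical names, and plain `free()` stands in for devm_kfree.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical MAC table entry; only the field relevant here is kept. */
struct mac_entry {
    uint16_t vid;
};

/* Records which vid the fake hardware was asked to delete. */
static uint16_t last_hw_deleted_vid;

void hw_delete(uint16_t vid)
{
    last_hw_deleted_vid = vid;
}

/* Use the entry's vid first, then free the entry. Swapping the two
 * statements would read e->vid from freed memory. */
void del_mac_entry(struct mac_entry *e)
{
    hw_delete(e->vid);
    free(e);
}
```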
      
      Fixes: b37a1bae ("net: sparx5: add mactable support")
      Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20240301080608.3053468-1-horatiu.vultur@microchip.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • geneve: make sure to pull inner header in geneve_rx() · 0ece581d
      Eric Dumazet authored
      [ Upstream commit 1ca1ba46 ]
      
      syzbot triggered a bug in geneve_rx() [1]
      
      Issue is similar to the one I fixed in commit 8d975c15
      ("ip6_tunnel: make sure to pull inner header in __ip6_tnl_rcv()")
      
      We have to save skb->network_header in a temporary variable
      in order to be able to recompute the network_header pointer
      after a pskb_inet_may_pull() call.
      
      pskb_inet_may_pull() makes sure the needed headers are in skb->head.
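
The "save an offset, recompute the pointer" pattern can be sketched with a toy buffer. This is a userspace model under stated assumptions: `struct buf`, `buf_may_pull` and `read_hdr_byte_after_pull` are hypothetical, and `buf_may_pull` always moves the data to a new allocation to mimic a pull that reallocates the head.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy packet buffer; head may move whenever data is pulled. */
struct buf {
    unsigned char *head;
    size_t len;
};

/* Hypothetical pull: succeeds if `need` bytes exist, and relocates the
 * data so that any pointer derived from the old head becomes stale. */
int buf_may_pull(struct buf *b, size_t need)
{
    unsigned char *moved;

    if (need > b->len)
        return 0;
    moved = malloc(b->len);
    if (!moved)
        return 0;
    memcpy(moved, b->head, b->len);
    free(b->head);
    b->head = moved;
    return 1;
}

/* Save the header position as an offset before the pull, then rebuild
 * the pointer from the (possibly new) head afterwards. */
int read_hdr_byte_after_pull(struct buf *b, const unsigned char *hdr,
                             size_t need, unsigned char *out)
{
    size_t off = (size_t)(hdr - b->head);

    if (!buf_may_pull(b, need))
        return -1;

    hdr = b->head + off;
    *out = hdr[0];
    return 0;
}
```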
      
      [1]
      BUG: KMSAN: uninit-value in IP_ECN_decapsulate include/net/inet_ecn.h:302 [inline]
       BUG: KMSAN: uninit-value in geneve_rx drivers/net/geneve.c:279 [inline]
       BUG: KMSAN: uninit-value in geneve_udp_encap_recv+0x36f9/0x3c10 drivers/net/geneve.c:391
        IP_ECN_decapsulate include/net/inet_ecn.h:302 [inline]
        geneve_rx drivers/net/geneve.c:279 [inline]
        geneve_udp_encap_recv+0x36f9/0x3c10 drivers/net/geneve.c:391
        udp_queue_rcv_one_skb+0x1d39/0x1f20 net/ipv4/udp.c:2108
        udp_queue_rcv_skb+0x6ae/0x6e0 net/ipv4/udp.c:2186
        udp_unicast_rcv_skb+0x184/0x4b0 net/ipv4/udp.c:2346
        __udp4_lib_rcv+0x1c6b/0x3010 net/ipv4/udp.c:2422
        udp_rcv+0x7d/0xa0 net/ipv4/udp.c:2604
        ip_protocol_deliver_rcu+0x264/0x1300 net/ipv4/ip_input.c:205
        ip_local_deliver_finish+0x2b8/0x440 net/ipv4/ip_input.c:233
        NF_HOOK include/linux/netfilter.h:314 [inline]
        ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254
        dst_input include/net/dst.h:461 [inline]
        ip_rcv_finish net/ipv4/ip_input.c:449 [inline]
        NF_HOOK include/linux/netfilter.h:314 [inline]
        ip_rcv+0x46f/0x760 net/ipv4/ip_input.c:569
        __netif_receive_skb_one_core net/core/dev.c:5534 [inline]
        __netif_receive_skb+0x1a6/0x5a0 net/core/dev.c:5648
        process_backlog+0x480/0x8b0 net/core/dev.c:5976
        __napi_poll+0xe3/0x980 net/core/dev.c:6576
        napi_poll net/core/dev.c:6645 [inline]
        net_rx_action+0x8b8/0x1870 net/core/dev.c:6778
        __do_softirq+0x1b7/0x7c5 kernel/softirq.c:553
        do_softirq+0x9a/0xf0 kernel/softirq.c:454
        __local_bh_enable_ip+0x9b/0xa0 kernel/softirq.c:381
        local_bh_enable include/linux/bottom_half.h:33 [inline]
        rcu_read_unlock_bh include/linux/rcupdate.h:820 [inline]
        __dev_queue_xmit+0x2768/0x51c0 net/core/dev.c:4378
        dev_queue_xmit include/linux/netdevice.h:3171 [inline]
        packet_xmit+0x9c/0x6b0 net/packet/af_packet.c:276
        packet_snd net/packet/af_packet.c:3081 [inline]
        packet_sendmsg+0x8aef/0x9f10 net/packet/af_packet.c:3113
        sock_sendmsg_nosec net/socket.c:730 [inline]
        __sock_sendmsg net/socket.c:745 [inline]
        __sys_sendto+0x735/0xa10 net/socket.c:2191
        __do_sys_sendto net/socket.c:2203 [inline]
        __se_sys_sendto net/socket.c:2199 [inline]
        __x64_sys_sendto+0x125/0x1c0 net/socket.c:2199
        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
        do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x63/0x6b
      
      Uninit was created at:
        slab_post_alloc_hook mm/slub.c:3819 [inline]
        slab_alloc_node mm/slub.c:3860 [inline]
        kmem_cache_alloc_node+0x5cb/0xbc0 mm/slub.c:3903
        kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:560
        __alloc_skb+0x352/0x790 net/core/skbuff.c:651
        alloc_skb include/linux/skbuff.h:1296 [inline]
        alloc_skb_with_frags+0xc8/0xbd0 net/core/skbuff.c:6394
        sock_alloc_send_pskb+0xa80/0xbf0 net/core/sock.c:2783
        packet_alloc_skb net/packet/af_packet.c:2930 [inline]
        packet_snd net/packet/af_packet.c:3024 [inline]
        packet_sendmsg+0x70c2/0x9f10 net/packet/af_packet.c:3113
        sock_sendmsg_nosec net/socket.c:730 [inline]
        __sock_sendmsg net/socket.c:745 [inline]
        __sys_sendto+0x735/0xa10 net/socket.c:2191
        __do_sys_sendto net/socket.c:2203 [inline]
        __se_sys_sendto net/socket.c:2199 [inline]
        __x64_sys_sendto+0x125/0x1c0 net/socket.c:2199
        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
        do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x63/0x6b
      
      Fixes: 2d07dc79 ("geneve: add initial netdev driver for GENEVE tunnels")
      Reported-and-tested-by: <syzbot+6a1423ff3f97159aae64@syzkaller.appspotmail.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • tracing/net_sched: Fix tracepoints that save qdisc_dev() as a string · 24d5a896
      Steven Rostedt (Google) authored
      [ Upstream commit 51270d57 ]
      
      I'm updating __assign_str() and will be removing the second parameter. To
      make sure that it does not break anything, I make sure that it matches the
      __string() field, as that is where the string is actually going to be
      saved in. To make sure there's nothing that breaks, I added a WARN_ON() to
      make sure that what was used in __string() is the same that is used in
      __assign_str().
      
      In doing this change, an error was triggered as __assign_str() now expects
      the string passed in to be a char * value. I instead had the following
      warning:
      
      include/trace/events/qdisc.h: In function ‘trace_event_raw_event_qdisc_reset’:
      include/trace/events/qdisc.h:91:35: error: passing argument 1 of 'strcmp' from incompatible pointer type [-Werror=incompatible-pointer-types]
         91 |                 __assign_str(dev, qdisc_dev(q));
      
      That's because the qdisc_enqueue() and qdisc_reset() pass in qdisc_dev(q)
      to __assign_str() and to __string(). But that function returns a pointer
      to struct net_device and not a string.
      
      It appears that these events are just saving the pointer as a string and
      then reading it as a string as well.
      
      Use qdisc_dev(q)->name to save the device instead.
      
      Fixes: a34dac0b ("net_sched: add tracepoints for qdisc_reset() and qdisc_destroy()")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net/mlx5e: Switch to using _bh variant of spinlock API in port timestamping NAPI poll context · d98d364d
      Rahul Rameshbabu authored
      [ Upstream commit 90502d43 ]
      
      The NAPI poll context is a softirq context. Do not use normal spinlock API
      in this context to prevent concurrency issues.
      
      Fixes: 3178308a ("net/mlx5e: Make tx_port_ts logic resilient to out-of-order CQEs")
      Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      CC: Vadim Fedorenko <vadfed@meta.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net/mlx5e: Use a memory barrier to enforce PTP WQ xmit submission tracking occurs after populating the metadata_map · d1f71615
      Rahul Rameshbabu authored
      [ Upstream commit b7cf0758 ]
      
      Simply reordering the functions mlx5e_ptp_metadata_map_put and
      mlx5e_ptpsq_track_metadata in the mlx5e_txwqe_complete context is not good
      enough since both the compiler and CPU are free to reorder these two
      functions. If reordering does occur, the issue that was supposedly fixed by
      7e3f3ba9 ("net/mlx5e: Track xmit submission to PTP WQ after populating
      metadata map") will be seen. This will lead to NULL pointer dereferences in
      mlx5e_ptpsq_mark_ts_cqes_undelivered in the NAPI polling context due to the
      tracking list being populated before the metadata map.
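
The ordering requirement can be illustrated with C11 atomics in userspace. This is an analogy under stated assumptions, not the driver code: the kernel uses its own barrier primitives, and `metadata_map`, `tracked_id`, `submit` and `poll_tracked` are simplified hypothetical stand-ins. A release store publishes the tracking entry only after the map slot is written, and the poller's acquire load pairs with it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAP_SIZE 4

/* Simplified metadata map: slot id -> payload. */
static const char *metadata_map[MAP_SIZE];

/* Simplified tracking list holding at most one id; -1 means empty. */
static atomic_int tracked_id = -1;

/* Submission path: fill the map slot first, then publish the id.
 * The release ordering keeps the map write from being reordered
 * after the publication. */
void submit(int id, const char *payload)
{
    metadata_map[id] = payload;
    atomic_store_explicit(&tracked_id, id, memory_order_release);
}

/* Poll path: an id observed with acquire ordering guarantees that the
 * matching map slot is already populated. */
const char *poll_tracked(void)
{
    int id = atomic_load_explicit(&tracked_id, memory_order_acquire);

    return id < 0 ? NULL : metadata_map[id];
}
```

With relaxed ordering or plain variables, a poller on another CPU could see the published id while the map slot still reads as NULL, which is the failure mode described above.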
      
      Fixes: 7e3f3ba9 ("net/mlx5e: Track xmit submission to PTP WQ after populating metadata map")
      Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      CC: Vadim Fedorenko <vadfed@meta.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net/mlx5e: Fix MACsec state loss upon state update in offload path · b526c317
      Emeel Hakim authored
      [ Upstream commit a71f2147 ]
      
      The packet number attribute of the SA is incremented by the device rather
      than the software stack when enabling hardware offload. Because the packet
      number attribute is managed by the hardware, the software has no insight
      into the value of the packet number attribute actually written by the
      device.
      
      Previously when MACsec offload was enabled, the hardware object for
      handling the offload was destroyed when the SA was disabled. Re-enabling
      the SA would lead to a new hardware object being instantiated. This new
      hardware object would not have any recollection of the correct packet
      number for the SA. Instead, destroy the flow steering rule when
      deactivating the SA and recreate it upon reactivation, preserving the
      original hardware object.
      
      Fixes: 8ff0ac5b ("net/mlx5: Add MACsec offload Tx command support")
      Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
      Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
      Reviewed-by: Gal Pressman <gal@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net/mlx5e: Change the warning when ignore_flow_level is not supported · 6d6bb522
      Jianbo Liu authored
      [ Upstream commit dd238b70 ]
      
      Downgrade the print from mlx5_core_warn() to mlx5_core_dbg(), as it
      is just a statement of fact that firmware doesn't support ignore flow
      level.
      
      And change the wording to "firmware flow level support is missing", to
      make it more accurate.
      
      Fixes: ae2ee3be ("net/mlx5: CT: Remove warning of ignore_flow_level support for VFs")
      Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
      Suggested-by: Elliott, Robert (Servers) <elliott@hpe.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net/mlx5: Check capability for fw_reset · c11138f0
      Moshe Shemesh authored
      [ Upstream commit 5e6107b4 ]
      
      Functions that cannot access the MFRL (Management Firmware Reset Level)
      register have no use for fw_reset structures or events. Remove the
      fw_reset structure allocation and the registration for fw reset event
      notifications for these functions.
      
      Having the devlink param enable_remote_dev_reset on functions that don't
      have this capability is misleading as these functions are not allowed to
      influence the reset flow. Hence, this patch removes this parameter for
      such functions.
      
      In addition, return not supported on devlink reload action fw_activate
      for these functions.
      
      Fixes: 38b9f903 ("net/mlx5: Handle sync reset request event")
      Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
      Reviewed-by: Aya Levin <ayal@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net/mlx5: E-switch, Change flow rule destination checking · c8d7228d
      Jianbo Liu authored
      [ Upstream commit 85ea2c5c ]
      
      The checking in the cited commit is not accurate. In the common case,
      VF destination is internal, and uplink destination is external.
      However, uplink destination with packet reformat is considered as
      internal because firmware uses LB+hairpin to support it. Update the
      checking so header rewrite rules with both internal and external
      destinations are not allowed.
      
      Fixes: e0e22d59 ("net/mlx5: E-switch, Add checking for flow rule destinations")
      Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
      Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Revert "net/mlx5e: Check the number of elements before walk TC rhashtable" · ba888f1f
      Saeed Mahameed authored
      [ Upstream commit b7bbd698 ]
      
      This reverts commit 4e25b661.
      
      This commit was mistakenly applied by pulling the wrong tag; remove it.
      
      Fixes: 4e25b661 ("net/mlx5e: Check the number of elements before walk TC rhashtable")
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Revert "net/mlx5: Block entering switchdev mode with ns inconsistency" · 3fba8eab
      Gavin Li authored
      [ Upstream commit 8deeefb2 ]
      
      This reverts commit 662404b2.
      The revert is required due to the suspicion that it is not good for
      anything and causes a crash.
      
      Fixes: 662404b2 ("net/mlx5e: Block entering switchdev mode with ns inconsistency")
      Signed-off-by: Gavin Li <gavinl@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ice: reorder disabling IRQ and NAPI in ice_qp_dis · 4c0b028e
      Maciej Fijalkowski authored
      [ Upstream commit 99099c6b ]
      
      ice_qp_dis() currently does things in a very mixed way. Tx is stopped
      before disabling IRQ on related queue vector, then it takes care of
      disabling Rx and finally NAPI is disabled.
      
      Let us start with disabling IRQs in the first place followed by turning
      off NAPI. Then it is safe to handle queues.
      
      One subtle change on top of that is that even though ice_qp_ena() looks
      more sane, clear ICE_CFG_BUSY as the last thing there.
      
      Fixes: 2d4238f5 ("ice: Add support for AF_XDP")
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
      Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • i40e: disable NAPI right after disabling irqs when handling xsk_pool · 484c8e3b
      Maciej Fijalkowski authored
      [ Upstream commit d562b11c ]
      
      Disable NAPI before shutting down queues that this particular NAPI
      contains so that the order of actions in i40e_queue_pair_disable()
      mirrors what we do in i40e_queue_pair_enable().
      
      Fixes: 123cecd4 ("i40e: added queue pair disable/enable functions")
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
      Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ixgbe: {dis, en}able irqs in ixgbe_txrx_ring_{dis, en}able · 2e60e953
      Maciej Fijalkowski authored
      [ Upstream commit cbf996f5 ]
      
      Currently, the routines that are supposed to toggle the state of a ring
      pair do not take care of the interrupt associated with the queue vector
      that these rings belong to. This causes funky issues such as a dead
      interface due to irq misconfiguration, as per Pavel's report from the
      Closes: tag.

      Add a function responsible for disabling a single IRQ in the EIMC
      register and call it as the very first thing when disabling a ring pair
      during xsk_pool setup. For enable, let's reuse ixgbe_irq_enable_queues().
      Besides this, disable/enable NAPI as the first/last thing when closing or
      opening the ring pair that xsk_pool is being configured on.
      
      Reported-by: Pavel Vazharov <pavel@x3me.net>
      Closes: https://lore.kernel.org/netdev/CAJEV1ijxNyPTwASJER1bcZzS9nMoZJqfR86nu_3jFFVXzZQ4NA@mail.gmail.com/
      Fixes: 024aa580 ("ixgbe: added Rx/Tx ring disable/enable functions")
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: lan78xx: fix runtime PM count underflow on link stop · 550fe716
      Oleksij Rempel authored
      [ Upstream commit 1eecc7ab ]
      
      The current driver has some asymmetry in its runtime PM calls. In
      lan78xx_open() it calls usb_autopm_get() and then unconditionally
      usb_autopm_put(), while in lan78xx_stop() it calls only usb_autopm_put().
      So far, this worked only because the driver does not activate autosuspend
      by default, so the problem was visible only as the warning
      "Runtime PM usage count underflow!".
      
      Since, with current driver, we can't use runtime PM with active link,
      execute lan78xx_open()->usb_autopm_put() only in error case. Otherwise,
      keep ref counting high as long as interface is open.
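
The balanced reference counting can be modeled in a few lines of userspace C. `struct dev`, `pm_get`, `pm_put`, `dev_open` and `dev_stop` are hypothetical stand-ins for the driver's open/stop paths and the usb_autopm calls.

```c
#include <assert.h>

/* Hypothetical device: a PM usage counter plus a knob to force open errors. */
struct dev {
    int pm_usage;
    int open_fails;
};

void pm_get(struct dev *d) { d->pm_usage++; }
void pm_put(struct dev *d) { d->pm_usage--; }

/* Take a PM reference on open and drop it only on the error path, so
 * the reference is held for as long as the interface stays open. */
int dev_open(struct dev *d)
{
    pm_get(d);
    if (d->open_fails) {
        pm_put(d);
        return -1;
    }
    return 0;
}

/* Stop releases the reference taken by a successful open. */
void dev_stop(struct dev *d)
{
    pm_put(d);
}
```

With an unconditional put at the end of `dev_open`, the put in `dev_stop` would take the counter to -1, which is the underflow the warning reports.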
      
      Fixes: 55d7de9d ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
      Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • xfrm: Pass UDP encapsulation in TX packet offload · f6edcad5
      Leon Romanovsky authored
      [ Upstream commit 983a73da ]
      
      In addition to the cited commit in the Fixes line, allow UDP
      encapsulation in the TX path too.
      
      Fixes: 89edf402 ("xfrm: Support UDP encapsulation in packet offload mode")
      CC: Steffen Klassert <steffen.klassert@secunet.com>
      Reported-by: Mike Yu <yumike@google.com>
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • mm/vmscan: fix a bug calling wakeup_kswapd() with a wrong zone index · d6159bd4
      Byungchul Park authored
      [ Upstream commit 2774f256 ]
      
      With NUMA balancing enabled, on a NUMA system where a node has no local
      memory and therefore no managed zones, the following oops has been
      observed.  It happens because wakeup_kswapd() is called with a wrong zone
      index, -1.  Fix it by checking the index before calling
      wakeup_kswapd().
      
      > BUG: unable to handle page fault for address: 00000000000033f3
      > #PF: supervisor read access in kernel mode
      > #PF: error_code(0x0000) - not-present page
      > PGD 0 P4D 0
      > Oops: 0000 [#1] PREEMPT SMP NOPTI
      > CPU: 2 PID: 895 Comm: masim Not tainted 6.6.0-dirty #255
      > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
      >    rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      > RIP: 0010:wakeup_kswapd (./linux/mm/vmscan.c:7812)
      > Code: (omitted)
      > RSP: 0000:ffffc90004257d58 EFLAGS: 00010286
      > RAX: ffffffffffffffff RBX: ffff88883fff0480 RCX: 0000000000000003
      > RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88883fff0480
      > RBP: ffffffffffffffff R08: ff0003ffffffffff R09: ffffffffffffffff
      > R10: ffff888106c95540 R11: 0000000055555554 R12: 0000000000000003
      > R13: 0000000000000000 R14: 0000000000000000 R15: ffff88883fff0940
      > FS:  00007fc4b8124740(0000) GS:ffff888827c00000(0000) knlGS:0000000000000000
      > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      > CR2: 00000000000033f3 CR3: 000000026cc08004 CR4: 0000000000770ee0
      > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      > PKRU: 55555554
      > Call Trace:
      >  <TASK>
      > ? __die
      > ? page_fault_oops
      > ? __pte_offset_map_lock
      > ? exc_page_fault
      > ? asm_exc_page_fault
      > ? wakeup_kswapd
      > migrate_misplaced_page
      > __handle_mm_fault
      > handle_mm_fault
      > do_user_addr_fault
      > exc_page_fault
      > asm_exc_page_fault
      > RIP: 0033:0x55b897ba0808
      > Code: (omitted)
      > RSP: 002b:00007ffeefa821a0 EFLAGS: 00010287
      > RAX: 000055b89983acd0 RBX: 00007ffeefa823f8 RCX: 000055b89983acd0
      > RDX: 00007fc2f8122010 RSI: 0000000000020000 RDI: 000055b89983acd0
      > RBP: 00007ffeefa821a0 R08: 0000000000000037 R09: 0000000000000075
      > R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
      > R13: 00007ffeefa82410 R14: 000055b897ba5dd8 R15: 00007fc4b8340000
      >  </TASK>
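
The index check can be sketched as follows. This is a simplified userspace model: `struct node`, `highest_managed_zone` and `maybe_wake_reclaim` are hypothetical, and a counter stands in for the call to wakeup_kswapd().

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ZONES 4

/* Hypothetical NUMA node: which zones are managed, plus a wakeup counter. */
struct node {
    bool zone_managed[MAX_ZONES];
    int wakeups;
};

/* Scan from the highest zone down; returns -1 when no zone is managed. */
int highest_managed_zone(const struct node *n)
{
    int z;

    for (z = MAX_ZONES - 1; z >= 0; z--)
        if (n->zone_managed[z])
            break;
    return z;
}

/* Only wake the reclaim thread when a valid zone index was found;
 * a memoryless node yields -1 and must be skipped. */
bool maybe_wake_reclaim(struct node *n)
{
    int z = highest_managed_zone(n);

    if (z < 0)
        return false;
    n->wakeups++;
    return true;
}
```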
      
      Link: https://lkml.kernel.org/r/20240216111502.79759-1-byungchul@sk.com
      Signed-off-by: Byungchul Park <byungchul@sk.com>
      Reported-by: Hyeongtak Ji <hyeongtak.ji@sk.com>
      Fixes: c574bbe9 ("NUMA balancing: optimize page placement for memory tiering system")
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ceph: switch to corrected encoding of max_xattr_size in mdsmap · 641eb2d9
      Xiubo Li authored
      [ Upstream commit 51d31149 ]
      
      The addition of bal_rank_mask with encoding version 17 was merged
      into ceph.git in Oct 2022 and made it into the v18.2.0 release as
      expected. A few months later, the much-delayed addition of
      max_xattr_size was merged, also with encoding version 17 and placed
      before bal_rank_mask in the encoding, but it did not make the
      v18.2.0 release.
      
      The way this ended up being resolved on the MDS side is that
      bal_rank_mask will continue to be encoded in version 17 while
      max_xattr_size is now encoded in version 18.  This does mean that
      older kernels will misdecode version 17, but this is also true for
      v18.2.0 and v18.2.1 clients in userspace.
      
      The best we can do is backport this adjustment -- see ceph.git
      commit 78abfeaff27fee343fb664db633de5b221699a73 for details.
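
      The version split described above can be sketched as follows. This
      is a simplified userspace illustration, not the kernel decoder: the
      helper names (get_le64, skip_string, decode_tail) are hypothetical,
      and modelling bal_rank_mask as a length-prefixed string is an
      assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a little-endian u64 and advance the cursor. */
static int get_le64(const uint8_t **p, const uint8_t *end, uint64_t *out)
{
	uint64_t v = 0;

	if ((size_t)(end - *p) < 8)
		return -1;
	for (int i = 7; i >= 0; i--)
		v = (v << 8) | (*p)[i];
	*p += 8;
	*out = v;
	return 0;
}

/* Hypothetical helper: skip a string encoded as u32 length + bytes (assumed layout). */
static int skip_string(const uint8_t **p, const uint8_t *end)
{
	uint32_t len = 0;

	if ((size_t)(end - *p) < 4)
		return -1;
	for (int i = 3; i >= 0; i--)
		len = (len << 8) | (*p)[i];
	*p += 4;
	if ((size_t)(end - *p) < len)
		return -1;
	*p += len;
	return 0;
}

/*
 * Decode the tail of a (simplified) map payload:
 *   version >= 17: bal_rank_mask is present and skipped
 *   version >= 18: max_xattr_size follows
 * Below version 18 the size is reported as 0 (field absent).
 */
static int decode_tail(uint8_t version, const uint8_t *p, const uint8_t *end,
		       uint64_t *max_xattr_size)
{
	assert(p <= end);
	*max_xattr_size = 0;
	if (version >= 17 && skip_string(&p, end) < 0)
		return -1;
	if (version >= 18 && get_le64(&p, end, max_xattr_size) < 0)
		return -1;
	return 0;
}
```

      With this ordering, a version 17 payload is never read as if it
      carried max_xattr_size, which is the misdecoding the change avoids.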
      
      [ idryomov: changelog ]
      
      Cc: stable@vger.kernel.org
      Link: https://tracker.ceph.com/issues/64440
      Fixes: d93231a6 ("ceph: prevent a client from exceeding the MDS maximum xattr size")
      Signed-off-by: Xiubo Li <xiubli@redhat.com>
      Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
      Reviewed-by: Venky Shankar <vshankar@redhat.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • dmaengine: fsl-edma: correct max_segment_size setting · 3b897ea5
      Frank Li authored
      [ Upstream commit a79f949a ]
      
      Correct the max_segment_size setting from 0x3fff to the actual
      value of 0x7fff.

      Introduce a new macro, 'EDMA_TCD_ITER_MASK', for code clarity and
      use FIELD_GET on it to obtain the accurate maximum value.
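
      A minimal userspace sketch of that derivation follows. The
      SKETCH_GENMASK and SKETCH_FIELD_GET macros are stand-ins written for
      this example, not the kernel helpers, and the mask covering bits
      14:0 is an assumption chosen to match the 0x7fff value stated above.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a contiguous bit mask covering bits h..l (32-bit only). */
#define SKETCH_GENMASK(h, l) \
	((uint32_t)((~0u << (l)) & (~0u >> (31 - (h)))))

/* Stand-in field extractor: mask, then divide by the mask's lowest set bit. */
#define SKETCH_FIELD_GET(mask, reg) \
	(((reg) & (mask)) / ((mask) & (~(mask) + 1u)))

/* Assumed layout: the iteration count field occupies bits 14:0. */
#define EDMA_TCD_ITER_MASK SKETCH_GENMASK(14, 0)

/* Largest value the field can hold, derived from the mask itself. */
static uint32_t edma_max_segment_size(void)
{
	assert(EDMA_TCD_ITER_MASK != 0);
	return SKETCH_FIELD_GET(EDMA_TCD_ITER_MASK, EDMA_TCD_ITER_MASK);
}
```

      Deriving the limit from the mask keeps the two from drifting apart,
      which a hand-written constant such as 0x3fff cannot guarantee.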
      
      Cc: stable@vger.kernel.org
      Fixes: e0674853 ("dmaengine: fsl-edma: support edma memcpy")
      Signed-off-by: Frank Li <Frank.Li@nxp.com>
      Link: https://lore.kernel.org/r/20240207194733.2112870-1-Frank.Li@nxp.com
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • dmaengine: fsl-edma: utilize common dt-binding header file · 525c1397
      Frank Li authored
      [ Upstream commit d0e217b7 ]
      
      Refactor the code to use the common dt-binding header file, fsl-edma.h.
      Rename ARGS* to FSL_EDMA*. No functional changes.
      
      Signed-off-by: Frank Li <Frank.Li@nxp.com>
      Link: https://lore.kernel.org/r/20231114154824.3617255-4-Frank.Li@nxp.com
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
      Stable-dep-of: a79f949a ("dmaengine: fsl-edma: correct max_segment_size setting")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • dt-bindings: dma: fsl-edma: Add fsl-edma.h to prevent hardcoding in dts · fb2f43ed
      Frank Li authored
      [ Upstream commit 1e9b0525 ]
      
      Introduce a common dt-bindings header file, fsl-edma.h, shared between
      the driver and dts files. This addition aims to eliminate hardcoded values
      in dts files, promoting maintainability and consistency.
      
      DTS header files do not support the BIT() macro yet, so the flag
      values are written directly as 2^n numbers.
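
      As a rough illustration of that constraint, a shared header can
      spell each flag as a plain power-of-two literal, and the driver side
      can still treat the values as single-bit flags. The EXAMPLE_* names
      and the helper below are hypothetical and are not taken from
      fsl-edma.h.

```c
#include <assert.h>

/* Hypothetical shared-header flags: plain 2^n literals instead of BIT(n). */
#define EXAMPLE_EDMA_FLAG_A 0x1
#define EXAMPLE_EDMA_FLAG_B 0x2
#define EXAMPLE_EDMA_FLAG_C 0x4
#define EXAMPLE_EDMA_FLAG_D 0x8

/* Driver-side stand-in for BIT(); only C code needs it, not the dts files. */
#define BIT(n) (1u << (n))

/* Test whether a single-bit flag is set in a cell value read from the dts. */
static int example_flag_is_set(unsigned int cell, unsigned int flag)
{
	assert(flag != 0 && (flag & (flag - 1)) == 0); /* exactly one bit */
	return (cell & flag) != 0;
}
```

      A dts file would then combine the named literals with a bitwise OR
      rather than repeating raw numbers.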
      
      Signed-off-by: Frank Li <Frank.Li@nxp.com>
      Reviewed-by: Rob Herring <robh@kernel.org>
      Link: https://lore.kernel.org/r/20231114154824.3617255-3-Frank.Li@nxp.com
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
      Stable-dep-of: a79f949a ("dmaengine: fsl-edma: correct max_segment_size setting")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  2. Mar 06, 2024