  1. Apr 11, 2024
    • net: ena: Fix incorrect descriptor free behavior · bf02d9fe
      David Arinzon authored
      ENA has two types of TX queues:
      - queues which only process TX packets arriving from the network stack
      - queues which only process TX packets forwarded to it by XDP_REDIRECT
        or XDP_TX instructions
      
      ena_free_tx_bufs() cycles through all descriptors in a TX queue
      and unmaps + frees every descriptor that hasn't been acknowledged yet
      by the device (uncompleted TX transactions).
      The function assumes that the processed TX queue always belongs to
      the first category listed above, so it ends up calling
      napi_consume_skb() even for descriptors belonging to an XDP-specific
      queue.
      
      This patch solves a bug in which, in case of a VF reset, the
      descriptors aren't freed correctly, leading to crashes.
      
      Fixes: 548c4940 ("net: ena: Implement XDP_TX action")
      Signed-off-by: Shay Agroskin <shayagr@amazon.com>
      Signed-off-by: David Arinzon <darinzon@amazon.com>
      Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ena: Wrong missing IO completions check order · f7e41718
      David Arinzon authored
      Missing IO completions check is called every second (HZ jiffies).
      This commit fixes several issues with this check:
      
      1. Duplicate queues check:
         At most 4 queues are scanned on each check due to the monitor budget.
         Once the budget is reached, the check exits under the assumption that
         the next check will continue with the remainder of the queues.
         In practice, the next check first rescans the last queue that was
         already scanned, which is unnecessary and may make the full queue
         scan last a couple of seconds longer.
         The fix is to start every check with the next queue to scan.
         For example, on 8 IO queues:
         Bug: [0,1,2,3], [3,4,5,6], [6,7]
         Fix: [0,1,2,3], [4,5,6,7]
      
      2. Unbalanced queues check:
         In case the number of active IO queues is not a multiple of budget,
         there will be checks which don't utilize the full budget
         because the full scan exits when reaching the last queue id.
         The fix is to run every TX completion check with exact queue budget
         regardless of the queue id.
         For example, on 7 IO queues:
         Bug: [0,1,2,3], [4,5,6], [0,1,2,3]
         Fix: [0,1,2,3], [4,5,6,0], [1,2,3,4]
         The budget may be lowered in case the number of IO queues is less
         than the budget (4) to make sure there are no duplicate queues on
         the same check.
         For example, on 3 IO queues:
         Bug: [0,1,2,0], [1,2,0,1]
         Fix: [0,1,2], [0,1,2]
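
The fixed scan order in the examples above can be sketched in plain C. This is a minimal illustration rather than the driver code: `struct scan_state`, `next_check()` and `MONITOR_BUDGET` are hypothetical names chosen for the example, and only the budget of 4 comes from the commit text.

```c
#include <stddef.h>

#define MONITOR_BUDGET 4 /* queues scanned per check, per the commit text */

/* Hypothetical state: the queue id the next check starts from. */
struct scan_state {
	unsigned int next_qid;
};

/*
 * Fill 'out' with the queue ids one check would scan and return how many
 * were written. The budget is capped at the number of IO queues so a
 * single check never visits the same queue twice, and the scan wraps
 * around instead of stopping at the last queue id.
 */
static size_t next_check(struct scan_state *st, unsigned int num_io_queues,
			 unsigned int out[MONITOR_BUDGET])
{
	unsigned int budget = MONITOR_BUDGET;
	size_t n = 0;

	if (num_io_queues == 0)
		return 0;
	if (num_io_queues < budget)
		budget = num_io_queues;
	if (st->next_qid >= num_io_queues)
		st->next_qid = 0;

	while (n < budget) {
		out[n++] = st->next_qid;
		st->next_qid = (st->next_qid + 1) % num_io_queues;
	}
	return n;
}
```

Calling `next_check()` repeatedly with 7 queues walks through the same sequence as the "Fix" line for 7 IO queues, because the starting queue id is carried over between checks instead of being recomputed.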
      
      Fixes: 1738cd3e ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
      Signed-off-by: Amit Bernstein <amitbern@amazon.com>
      Signed-off-by: David Arinzon <darinzon@amazon.com>
      Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ena: Fix potential sign extension issue · 713a8519
      David Arinzon authored
      In a multiplication, small unsigned types are promoted to a larger
      signed type, and the result may overflow.
      If the result of such a multiplication has its MSB set, it is
      sign extended with '1's when converted to a wider type.
      This changes the multiplication result.
      
      Code example of the phenomenon:
      -------------------------------
      u16 x, y;
      size_t z1, z2;
      
      x = y = 0xffff;
      printk("x=%x y=%x\n",x,y);
      
      z1 = x*y;
      z2 = (size_t)x*y;
      
      printk("z1=%lx z2=%lx\n", z1, z2);
      
      Output:
      -------
      x=ffff y=ffff
      z1=fffffffffffe0001 z2=fffe0001
      
      The expected result of ffff*ffff is fffe0001, and without the
      explicit casting to avoid the unwanted sign extension we got
      fffffffffffe0001.
      
      This commit adds an explicit casting to avoid the sign extension
      issue.
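
The effect can also be reproduced in user space. The sketch below is an illustration only, assuming a target where `int` is 32 bits wide; `mul_sign_extended()` and `mul_widened()` are hypothetical helper names. The first helper spells out the promotion and sign extension step by step with fixed-width types, so the example does not depend on signed overflow in the multiplication itself.

```c
#include <stdint.h>

/*
 * Emulates what 'z1 = x * y' ends up doing when int is 32 bits: the
 * 16-bit operands are multiplied as 32-bit values, the 32-bit result is
 * interpreted as signed, and it is then sign extended to 64 bits.
 * The conversion to int32_t is implementation-defined for values above
 * INT32_MAX; common compilers keep the bit pattern.
 */
static uint64_t mul_sign_extended(uint16_t x, uint16_t y)
{
	uint32_t product32 = (uint32_t)x * (uint32_t)y; /* unsigned, wraps */
	int32_t as_signed = (int32_t)product32;         /* MSB set -> negative */

	return (uint64_t)(int64_t)as_signed;            /* sign extension */
}

/* The approach taken by the fix: widen one operand before multiplying. */
static uint64_t mul_widened(uint16_t x, uint16_t y)
{
	return (uint64_t)x * y;
}
```

For operands whose product fits in 31 bits the two helpers agree; they only diverge once bit 31 of the product is set, which is why the problem is easy to miss in testing.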
      
      Fixes: 689b2bda ("net: ena: add functions for handling Low Latency Queues in ena_com")
      Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
      Signed-off-by: David Arinzon <darinzon@amazon.com>
      Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge tag 'for-net-2024-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth · fe3eb406
      Paolo Abeni authored
      
      
      Luiz Augusto von Dentz says:
      
      ====================
      bluetooth pull request for net:
      
        - L2CAP: Don't double set the HCI_CONN_MGMT_CONNECTED bit
        - Fix memory leak in hci_req_sync_complete
        - hci_sync: Fix using the same interval and window for Coded PHY
        - Fix not validating setsockopt user input
      
      * tag 'for-net-2024-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
        Bluetooth: l2cap: Don't double set the HCI_CONN_MGMT_CONNECTED bit
        Bluetooth: hci_sock: Fix not validating setsockopt user input
        Bluetooth: ISO: Fix not validating setsockopt user input
        Bluetooth: L2CAP: Fix not validating setsockopt user input
        Bluetooth: RFCOMM: Fix not validating setsockopt user input
        Bluetooth: SCO: Fix not validating setsockopt user input
        Bluetooth: Fix memory leak in hci_req_sync_complete()
        Bluetooth: hci_sync: Fix using the same interval and window for Coded PHY
        Bluetooth: ISO: Don't reject BT_ISO_QOS if parameters are unset
      ====================
      
      Link: https://lore.kernel.org/r/20240410191610.4156653-1-luiz.dentz@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • af_unix: Fix garbage collector racing against connect() · 47d8ac01
      Michal Luczaj authored
      The garbage collector does not take into account the risk of an embryo
      getting enqueued during garbage collection. If such an embryo has a peer
      that carries SCM_RIGHTS, two consecutive passes of scan_children() may
      see a different set of children. This leads to an incorrectly elevated
      inflight count, and then to a dangling pointer within the
      gc_inflight_list.
      
      sockets are AF_UNIX/SOCK_STREAM
      S is an unconnected socket
      L is a listening in-flight socket bound to addr, not in fdtable
      V's fd will be passed via sendmsg(), gets inflight count bumped
      
      connect(S, addr)	sendmsg(S, [V]); close(V)	__unix_gc()
      ----------------	-------------------------	-----------
      
      NS = unix_create1()
      skb1 = sock_wmalloc(NS)
      L = unix_find_other(addr)
      unix_state_lock(L)
      unix_peer(S) = NS
      			// V count=1 inflight=0
      
       			NS = unix_peer(S)
       			skb2 = sock_alloc()
      			skb_queue_tail(NS, skb2[V])
      
      			// V became in-flight
      			// V count=2 inflight=1
      
      			close(V)
      
      			// V count=1 inflight=1
      			// GC candidate condition met
      
      						for u in gc_inflight_list:
      						  if (total_refs == inflight_refs)
      						    add u to gc_candidates
      
      						// gc_candidates={L, V}
      
      						for u in gc_candidates:
      						  scan_children(u, dec_inflight)
      
      						// embryo (skb1) was not
      						// reachable from L yet, so V's
      						// inflight remains unchanged
      __skb_queue_tail(L, skb1)
      unix_state_unlock(L)
      						for u in gc_candidates:
      						  if (u.inflight)
      						    scan_children(u, inc_inflight_move_tail)
      
      						// V count=1 inflight=2 (!)
      
      If there is a GC-candidate listening socket, lock/unlock its state. This
      makes GC wait until the end of any ongoing connect() to that socket. After
      flipping the lock, a possibly SCM-laden embryo is already enqueued. And if
      there is another embryo coming, it can not possibly carry SCM_RIGHTS. At
      this point, unix_inflight() can not happen because unix_gc_lock is already
      taken. Inflight graph remains unaffected.
      
      Fixes: 1fd05ba5 ("[AF_UNIX]: Rewrite garbage collector, fixes race.")
      Signed-off-by: Michal Luczaj <mhal@rbox.co>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20240409201047.1032217-1-mhal@rbox.co
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: dsa: mt7530: trap link-local frames regardless of ST Port State · 17c56011
      Arınç ÜNAL authored
      In Clause 5 of IEEE Std 802-2014, two sublayers of the data link layer
      (DLL) of the Open Systems Interconnection basic reference model (OSI/RM)
      are described; the medium access control (MAC) and logical link control
      (LLC) sublayers. The MAC sublayer is the one facing the physical layer.
      
      In 8.2 of IEEE Std 802.1Q-2022, the Bridge architecture is described. A
      Bridge component comprises a MAC Relay Entity for interconnecting the Ports
      of the Bridge, at least two Ports, and higher layer entities with at least
      a Spanning Tree Protocol Entity included.
      
      Each Bridge Port also functions as an end station and shall provide the MAC
      Service to an LLC Entity. Each instance of the MAC Service is provided to a
      distinct LLC Entity that supports protocol identification, multiplexing,
      and demultiplexing, for protocol data unit (PDU) transmission and reception
      by one or more higher layer entities.
      
      It is described in 8.13.9 of IEEE Std 802.1Q-2022 that in a Bridge, the LLC
      Entity associated with each Bridge Port is modeled as being directly
      connected to the attached Local Area Network (LAN).
      
      On the switch with CPU port architecture, CPU port functions as Management
      Port, and the Management Port functionality is provided by software which
      functions as an end station. Software is connected to an IEEE 802 LAN that
      is wholly contained within the system that incorporates the Bridge.
      Software provides access to the LLC Entity associated with each Bridge Port
      by the value of the source port field on the special tag on the frame
      received by software.
      
      We call frames that carry control information to determine the active
      topology and current extent of each Virtual Local Area Network (VLAN),
      i.e., spanning tree or Shortest Path Bridging (SPB) and Multiple VLAN
      Registration Protocol Data Units (MVRPDUs), and frames from other link
      constrained protocols, such as Extensible Authentication Protocol over LAN
      (EAPOL) and Link Layer Discovery Protocol (LLDP), link-local frames. They
      are not forwarded by a Bridge. Permanently configured entries in the
      filtering database (FDB) ensure that such frames are discarded by the
      Forwarding Process. In 8.6.3 of IEEE Std 802.1Q-2022, this is described in
      detail:
      
      Each of the reserved MAC addresses specified in Table 8-1
      (01-80-C2-00-00-[00,01,02,03,04,05,06,07,08,09,0A,0B,0C,0D,0E,0F]) shall be
      permanently configured in the FDB in C-VLAN components and ERs.
      
      Each of the reserved MAC addresses specified in Table 8-2
      (01-80-C2-00-00-[01,02,03,04,05,06,07,08,09,0A,0E]) shall be permanently
      configured in the FDB in S-VLAN components.
      
      Each of the reserved MAC addresses specified in Table 8-3
      (01-80-C2-00-00-[01,02,04,0E]) shall be permanently configured in the FDB
      in TPMR components.
      
      The FDB entries for reserved MAC addresses shall specify filtering for all
      Bridge Ports and all VIDs. Management shall not provide the capability to
      modify or remove entries for reserved MAC addresses.
      
      The addresses in Table 8-1, Table 8-2, and Table 8-3 determine the scope of
      propagation of PDUs within a Bridged Network, as follows:
      
        The Nearest Bridge group address (01-80-C2-00-00-0E) is an address that
        no conformant Two-Port MAC Relay (TPMR) component, Service VLAN (S-VLAN)
        component, Customer VLAN (C-VLAN) component, or MAC Bridge can forward.
        PDUs transmitted using this destination address, or any other addresses
        that appear in Table 8-1, Table 8-2, and Table 8-3
        (01-80-C2-00-00-[00,01,02,03,04,05,06,07,08,09,0A,0B,0C,0D,0E,0F]), can
        therefore travel no further than those stations that can be reached via a
        single individual LAN from the originating station.
      
        The Nearest non-TPMR Bridge group address (01-80-C2-00-00-03), is an
        address that no conformant S-VLAN component, C-VLAN component, or MAC
        Bridge can forward; however, this address is relayed by a TPMR component.
        PDUs using this destination address, or any of the other addresses that
        appear in both Table 8-1 and Table 8-2 but not in Table 8-3
        (01-80-C2-00-00-[00,03,05,06,07,08,09,0A,0B,0C,0D,0F]), will be relayed
        by any TPMRs but will propagate no further than the nearest S-VLAN
        component, C-VLAN component, or MAC Bridge.
      
        The Nearest Customer Bridge group address (01-80-C2-00-00-00) is an
        address that no conformant C-VLAN component, MAC Bridge can forward;
        however, it is relayed by TPMR components and S-VLAN components. PDUs
        using this destination address, or any of the other addresses that appear
        in Table 8-1 but not in either Table 8-2 or Table 8-3
        (01-80-C2-00-00-[00,0B,0C,0D,0F]), will be relayed by TPMR components and
        S-VLAN components but will propagate no further than the nearest C-VLAN
        component or MAC Bridge.
      
      Because the LLC Entity associated with each Bridge Port is provided via CPU
      port, we must not filter these frames but forward them to CPU port.
      
      In a Bridge, the transmission Port is mainly decided by ingress and egress
      rules, FDB, and spanning tree Port State functions of the Forwarding
      Process. For link-local frames, only CPU port should be designated as
      destination port in the FDB, and the other functions of the Forwarding
      Process must not interfere with the decision of the transmission Port. We
      call this process trapping frames to CPU port.
      
      Therefore, on the switch with CPU port architecture, link-local frames must
      be trapped to CPU port, and certain link-local frames received by a Port of
      a Bridge comprising a TPMR component or an S-VLAN component must be
      excluded from it.
      
      A Bridge of the switch with CPU port architecture cannot comprise a
      Two-Port MAC Relay (TPMR) component as a TPMR component supports only a
      subset of the functionality of a MAC Bridge. A Bridge comprising two Ports
      (Management Port doesn't count) of this architecture will either function
      as a standard MAC Bridge or a standard VLAN Bridge.
      
      Therefore, a Bridge of this architecture can only comprise S-VLAN
      components, C-VLAN components, or MAC Bridge components. Since there's no
      TPMR component, we don't need to relay PDUs using the destination addresses
      specified in the Nearest non-TPMR section, and the portion of the
      Nearest Customer Bridge section where they must be relayed by TPMR
      components.
      
      One option to trap link-local frames to CPU port is to add static FDB
      entries with CPU port designated as destination port. However, because
      Independent VLAN Learning (IVL) is being used on every VID, each entry only
      applies to a single VLAN Identifier (VID). For a Bridge comprising a MAC
      Bridge component or a C-VLAN component, there would have to be 16 times
      4096 entries. This switch intellectual property can only hold a maximum of
      2048 entries. Using this option, there also isn't a mechanism to prevent
      link-local frames from being discarded when the spanning tree Port State of
      the reception Port is discarding.
      
      The remaining option is to utilise the BPC, RGAC1, RGAC2, RGAC3, and RGAC4
      registers. Whilst this applies to every VID, it doesn't contain all of the
      reserved MAC addresses without affecting the remaining Standard Group MAC
      Addresses. The REV_UN frame tag utilised using the RGAC4 register covers
      the remaining 01-80-C2-00-00-[04,05,06,07,08,09,0A,0B,0C,0D,0F] destination
      addresses. It also includes the 01-80-C2-00-00-22 to 01-80-C2-00-00-FF
      destination addresses which may be relayed by MAC Bridges or VLAN Bridges.
      The latter option provides better but not complete conformance.
      
      This switch intellectual property also does not provide a mechanism to trap
      link-local frames with specific destination addresses to CPU port by
      Bridge, to conform to the filtering rules for the distinct Bridge
      components.
      
      Therefore, regardless of the type of the Bridge component, link-local
      frames with these destination addresses will be trapped to CPU port:
      
      01-80-C2-00-00-[00,01,02,03,0E]
      
      In a Bridge comprising a MAC Bridge component or a C-VLAN component:
      
        Link-local frames with these destination addresses won't be trapped to
        CPU port which won't conform to IEEE Std 802.1Q-2022:
      
        01-80-C2-00-00-[04,05,06,07,08,09,0A,0B,0C,0D,0F]
      
      In a Bridge comprising an S-VLAN component:
      
        Link-local frames with these destination addresses will be trapped to CPU
        port which won't conform to IEEE Std 802.1Q-2022:
      
        01-80-C2-00-00-00
      
        Link-local frames with these destination addresses won't be trapped to
        CPU port which won't conform to IEEE Std 802.1Q-2022:
      
        01-80-C2-00-00-[04,05,06,07,08,09,0A]
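
The address sets discussed above can be expressed as two small predicates. This is a sketch for illustration only, not the mt7530 driver code: `is_reserved_group_addr()` and `is_trapped_to_cpu()` are hypothetical names, and the trapped set is taken directly from the list 01-80-C2-00-00-[00,01,02,03,0E] given in this commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* True for 01-80-C2-00-00-00 .. 01-80-C2-00-00-0F (the Table 8-1 range). */
static bool is_reserved_group_addr(const uint8_t mac[6])
{
	return mac[0] == 0x01 && mac[1] == 0x80 && mac[2] == 0xc2 &&
	       mac[3] == 0x00 && mac[4] == 0x00 && mac[5] <= 0x0f;
}

/* True for the subset the commit says is trapped: [00,01,02,03,0E]. */
static bool is_trapped_to_cpu(const uint8_t mac[6])
{
	if (!is_reserved_group_addr(mac))
		return false;
	return mac[5] <= 0x03 || mac[5] == 0x0e;
}
```

Any address for which the first predicate is true and the second is false falls into the "won't be trapped" groups listed above for MAC Bridge and C-VLAN components.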
      
      Currently on this switch intellectual property, if the spanning tree Port
      State of the reception Port is discarding, link-local frames will be
      discarded.
      
      To trap link-local frames regardless of the spanning tree Port State, make
      the switch regard them as Bridge Protocol Data Units (BPDUs). This switch
      intellectual property only lets the frames regarded as BPDUs bypass the
      spanning tree Port State function of the Forwarding Process.
      
      With this change, the only remaining interference is the ingress rules.
      When the reception Port has no PVID assigned on software, VLAN-untagged
      frames won't be allowed in. There doesn't seem to be a mechanism on the
      switch intellectual property to have link-local frames bypass this function
      of the Forwarding Process.
      
      Fixes: b8f126a8 ("net-next: dsa: add dsa support for Mediatek MT7530 switch")
      Reviewed-by: Daniel Golle <daniel@makrotopia.org>
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Link: https://lore.kernel.org/r/20240409-b4-for-net-mt7530-fix-link-local-when-stp-discarding-v2-1-07b1150164ac@arinc9.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Revert "s390/ism: fix receive message buffer allocation" · d51dc8dd
      Gerd Bayer authored
      This reverts commit 58effa34.
      Review was not finished on this patch. So it's not ready for
      upstreaming.
      
      Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
      Link: https://lore.kernel.org/r/20240409113753.21813684-1-gbayer@linux.ibm.com
      Fixes: 58effa34 ("s390/ism: fix receive message buffer allocation")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: sparx5: fix wrong config being used when reconfiguring PCS · 33623113
      Daniel Machon authored
      The wrong port config is being used if the PCS is reconfigured. Fix this
      by correctly using the new config instead of the old one.
      
      Fixes: 946e7fd5 ("net: sparx5: add port module support")
      Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Link: https://lore.kernel.org/r/20240409-link-mode-reconfiguration-fix-v2-1-db6a507f3627@microchip.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net/mlx5: fix possible stack overflows · fe87922c
      Arnd Bergmann authored
      A couple of debug functions use a 512 byte temporary buffer and call another
      function that has another buffer of the same size, which in turn exceeds the
      usual warning limit for excessive stack usage:
      
      drivers/net/ethernet/mellanox/mlx5/core/steering/dr_dbg.c:1073:1: error: stack frame size (1448) exceeds limit (1024) in 'dr_dump_start' [-Werror,-Wframe-larger-than]
      dr_dump_start(struct seq_file *file, loff_t *pos)
      drivers/net/ethernet/mellanox/mlx5/core/steering/dr_dbg.c:1009:1: error: stack frame size (1120) exceeds limit (1024) in 'dr_dump_domain' [-Werror,-Wframe-larger-than]
      dr_dump_domain(struct seq_file *file, struct mlx5dr_domain *dmn)
      drivers/net/ethernet/mellanox/mlx5/core/steering/dr_dbg.c:705:1: error: stack frame size (1104) exceeds limit (1024) in 'dr_dump_matcher_rx_tx' [-Werror,-Wframe-larger-than]
      dr_dump_matcher_rx_tx(struct seq_file *file, bool is_rx,
      
      Rework these so that each of the various code paths only ever has one of
      these buffers in it, and exactly the functions that declare one have
      the 'noinline_for_stack' annotation that prevents them from all being
      inlined into the same caller.
      
      Fixes: 917d1e79 ("net/mlx5: DR, Change SWS usage to debug fs seq_file interface")
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Link: https://lore.kernel.org/all/20240219100506.648089-1-arnd@kernel.org/
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Tariq Toukan <tariqt@nvidia.com>
      Link: https://lore.kernel.org/r/20240408074142.3007036-1-arnd@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'mlx5-misc-fixes' · 186abfcd
      Jakub Kicinski authored
      Tariq Toukan says:
      
      ====================
      mlx5 misc fixes
      
      This patchset provides bug fixes to mlx5 driver.
      
      This is V2 of the series previously submitted as PR by Saeed:
      https://lore.kernel.org/netdev/20240326144646.2078893-1-saeed@kernel.org/T/
      
      Series generated against:
      commit 237f3cf1 ("xsk: validate user input for XDP_{UMEM|COMPLETION}_FILL_RING")
      ====================
      
      Link: https://lore.kernel.org/r/20240409190820.227554-1-tariqt@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/mlx5: Disallow SRIOV switchdev mode when in multi-PF netdev · 7772dc74
      Tariq Toukan authored
      Adaptations need to be made for the auxiliary device management in the
      core driver level. Block this combination for now.
      
      Fixes: 678eb448 ("net/mlx5: SD, Implement basic query and instantiation")
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
      Reviewed-by: Gal Pressman <gal@nvidia.com>
      Link: https://lore.kernel.org/r/20240409190820.227554-12-tariqt@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/mlx5e: RSS, Block XOR hash with over 128 channels · 49e6c938
      Carolina Jubran authored
      When supporting more than 128 channels, the RQT size is
      calculated by multiplying the number of channels by 2
      and rounding up to the nearest power of 2.
      
      The index of the RQT is derived from the RSS hash
      calculations. If XOR8 is used as the RSS hash function,
      there are only 256 possible hash results, and therefore,
      only 256 indexes can be reached in the RQT.
      
      Block setting the RSS hash function to XOR when the number
      of channels exceeds 128.
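
The arithmetic behind the 128-channel limit can be checked with a short sketch. The helper names below (`roundup_pow2()`, `rqt_size()`, `xor8_reaches_whole_rqt()`) are hypothetical and not taken from the driver. The size rule is the one quoted above for more than 128 channels; applying it to smaller channel counts is an assumption made only to show where the boundary falls.

```c
#include <stdbool.h>
#include <stdint.h>

#define XOR8_HASH_RESULTS 256u /* an 8-bit hash has 256 possible values */

/* Round up to the nearest power of two; expects 1 <= v <= 2^31. */
static uint32_t roundup_pow2(uint32_t v)
{
	uint32_t p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/* RQT size rule quoted in the commit text: channels * 2, rounded up. */
static uint32_t rqt_size(uint32_t num_channels)
{
	return roundup_pow2(num_channels * 2);
}

/* XOR8 can index at most 256 RQT entries, so larger tables are not fully reachable. */
static bool xor8_reaches_whole_rqt(uint32_t num_channels)
{
	return rqt_size(num_channels) <= XOR8_HASH_RESULTS;
}
```

With 129 channels the table already has 512 entries, so half of it could never be selected by an 8-bit hash result.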
      
      Fixes: 74a8dada ("net/mlx5e: Preparations for supporting larger number of channels")
      Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Link: https://lore.kernel.org/r/20240409190820.227554-11-tariqt@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/mlx5e: Do not produce metadata freelist entries in Tx port ts WQE xmit · 86b0ca5b
      Rahul Rameshbabu authored
      Free Tx port timestamping metadata entries in the NAPI poll context and
      consume metadata entries in the WQE xmit path. Do not free a Tx port
      timestamping metadata entry in the WQE xmit path even in the error path to
      avoid a race between two metadata entry producers.
      
      Fixes: 3178308a ("net/mlx5e: Make tx_port_ts logic resilient to out-of-order CQEs")
      Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Link: https://lore.kernel.org/r/20240409190820.227554-10-tariqt@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/mlx5e: HTB, Fix inconsistencies with QoS SQs number · 2f436f18
      Carolina Jubran authored
      When creating a new HTB class while the interface is down,
      the variable that tracks the number of QoS SQs (htb_max_qos_sqs)
      may not be consistent with the number of HTB classes.
      
      Previously, we compared these two values to ensure that
      the node_qid is lower than the number of QoS SQs, and we
      allocated stats for that SQ when they are equal.
      
      Change the check to compare the node_qid with the current
      number of leaf nodes and fix the checking conditions to
      ensure allocation of stats_list and stats for each node.
      
      Fixes: 214baf22 ("net/mlx5e: Support HTB offload")
      Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Link: https://lore.kernel.org/r/20240409190820.227554-9-tariqt@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/mlx5e: Fix mlx5e_priv_init() cleanup flow · ecb82945
      Carolina Jubran authored
      When mlx5e_priv_init() fails, the cleanup flow calls mlx5e_selq_cleanup(),
      which calls mlx5e_selq_apply(), which in turn asserts that
      `priv->state_lock` is held using lockdep_is_held().
      
      Acquire the state_lock in mlx5e_selq_cleanup().
      
      Kernel log:
      =============================
      WARNING: suspicious RCU usage
      6.8.0-rc3_net_next_841a9b5 #1 Not tainted
      -----------------------------
      drivers/net/ethernet/mellanox/mlx5/core/en/selq.c:124 suspicious rcu_dereference_protected() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      2 locks held by systemd-modules/293:
       #0: ffffffffa05067b0 (devices_rwsem){++++}-{3:3}, at: ib_register_client+0x109/0x1b0 [ib_core]
       #1: ffff8881096c65c0 (&device->client_data_rwsem){++++}-{3:3}, at: add_client_context+0x104/0x1c0 [ib_core]
      
      stack backtrace:
      CPU: 4 PID: 293 Comm: systemd-modules Not tainted 6.8.0-rc3_net_next_841a9b5 #1
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      Call Trace:
       <TASK>
       dump_stack_lvl+0x8a/0xa0
       lockdep_rcu_suspicious+0x154/0x1a0
       mlx5e_selq_apply+0x94/0xa0 [mlx5_core]
       mlx5e_selq_cleanup+0x3a/0x60 [mlx5_core]
       mlx5e_priv_init+0x2be/0x2f0 [mlx5_core]
       mlx5_rdma_setup_rn+0x7c/0x1a0 [mlx5_core]
       rdma_init_netdev+0x4e/0x80 [ib_core]
       ? mlx5_rdma_netdev_free+0x70/0x70 [mlx5_core]
       ipoib_intf_init+0x64/0x550 [ib_ipoib]
       ipoib_intf_alloc+0x4e/0xc0 [ib_ipoib]
       ipoib_add_one+0xb0/0x360 [ib_ipoib]
       add_client_context+0x112/0x1c0 [ib_core]
       ib_register_client+0x166/0x1b0 [ib_core]
       ? 0xffffffffa0573000
       ipoib_init_module+0xeb/0x1a0 [ib_ipoib]
       do_one_initcall+0x61/0x250
       do_init_module+0x8a/0x270
       init_module_from_file+0x8b/0xd0
       idempotent_init_module+0x17d/0x230
       __x64_sys_finit_module+0x61/0xb0
       do_syscall_64+0x71/0x140
       entry_SYSCALL_64_after_hwframe+0x46/0x4e
       </TASK>
      
      Fixes: 8bf30be7 ("net/mlx5e: Introduce select queue parameters")
      Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Link: https://lore.kernel.org/r/20240409190820.227554-8-tariqt@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/mlx5e: RSS, Block changing channels number when RXFH is configured · ee357240
      Carolina Jubran authored
      Changing the channels number after configuring the receive flow hash
      indirection table may affect the RSS table size. The previous
      configuration may no longer be compatible with the new receive flow
      hash indirection table.
      
      Block changing the channels number when RXFH is configured and the
      change would require resizing the RSS table.
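
The blocking condition is a conjunction of two facts, which a minimal sketch makes explicit. `channels_change_allowed()` is a hypothetical helper, not a driver function, and the caller is assumed to supply the RSS table size for both the current and the requested channel count.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: how the table sizes are derived from the channel
 * counts is deliberately left to the caller.
 */
static bool channels_change_allowed(bool rxfh_configured,
				    uint32_t cur_table_size,
				    uint32_t new_table_size)
{
	/* A configured indirection table may not be compatible with a resized one. */
	if (rxfh_configured && cur_table_size != new_table_size)
		return false;
	return true;
}
```

Channel changes that keep the table size stay allowed even when RXFH is configured, and any change stays allowed when it is not.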
      
      Fixes: 74a8dada ("net/mlx5e: Preparations for supporting larger number of channels")
      Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Link: https://lore.kernel.org/r/20240409190820.227554-7-tariqt@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/mlx5: Correctly compare pkt reformat ids · 9eca93f4
      Cosmin Ratiu authored
      struct mlx5_pkt_reformat contains a naked union of a u32 id and a
      dr_action pointer which is used when the action is SW-managed (when
      pkt_reformat.owner is set to MLX5_FLOW_RESOURCE_OWNER_SW). Using id
      directly in that case is incorrect, as it maps to the least significant
      32 bits of the 64-bit pointer in mlx5_fs_dr_action and not to the pkt
      reformat id allocated in firmware.
      
      For the purpose of comparing whether two rules are identical,
      interpreting the least significant 32 bits of the mlx5_fs_dr_action
      pointer as an id mostly works... until it breaks horribly and produces
      the outcome described in [1].
      
      This patch fixes mlx5_flow_dests_cmp to correctly compare ids using
      mlx5_fs_dr_action_get_pkt_reformat_id for the SW-managed rules.
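
The aliasing problem described above can be sketched in plain C. This is a minimal user-space model, not the mlx5 code: `struct pkt_reformat`, `struct sw_action`, `reformat_id()` and `same_reformat()` are hypothetical stand-ins, and the comparison rule (same owner, then same resolved id) is an assumption about what a correct comparison looks like.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the owner kinds named in the commit message. */
enum owner { OWNER_FW, OWNER_SW };

/* Hypothetical SW-managed action that carries the real reformat id. */
struct sw_action {
	uint32_t reformat_id;
};

struct pkt_reformat {
	enum owner owner;
	union {
		uint32_t id;               /* meaningful only when owner == OWNER_FW */
		struct sw_action *action;  /* meaningful only when owner == OWNER_SW */
	};
};

/* Resolve the id through the owner instead of reading the union blindly. */
static uint32_t reformat_id(const struct pkt_reformat *r)
{
	if (r->owner == OWNER_SW) {
		assert(r->action != NULL);
		return r->action->reformat_id;
	}
	return r->id;
}

/* Two reformats match only if they have the same owner and resolved id. */
static bool same_reformat(const struct pkt_reformat *a,
			  const struct pkt_reformat *b)
{
	return a->owner == b->owner && reformat_id(a) == reformat_id(b);
}
```

Reading `id` directly for a SW-managed entry would instead compare part of the pointer value, so two distinct action objects carrying the same reformat id could compare as different.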
      
      Link: https://lore.kernel.org/netdev/ea5264d6-6b55-4449-a602-214c6f509c1e@163.com/T/#u [1]
      
      Fixes: 6a48faee ("net/mlx5: Add direct rule fs_cmd implementation")
      Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
      Reviewed-by: Mark Bloch <mbloch@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Link: https://lore.kernel.org/r/20240409190820.227554-6-tariqt@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      9eca93f4
    • Cosmin Ratiu's avatar
      net/mlx5: Properly link new fs rules into the tree · 7c6782ad
      Cosmin Ratiu authored
      Previously, add_rule_fg would only add newly created rules from the
      handle into the tree when they had a refcount of 1. On the other hand,
      create_flow_handle tries hard to find and reference already existing
      identical rules instead of creating new ones.
      
      These two behaviors can result in a situation where create_flow_handle
      1) creates a new rule and references it, then
      2) in a subsequent step during the same handle creation references it
         again,
      resulting in a rule with a refcount of 2 that is not linked into the
      tree, will have a NULL parent and root and will result in a crash when
      the flow group is deleted because del_sw_hw_rule, invoked on rule
      deletion, assumes node->parent is != NULL.
      
      This happened in the wild, due to another bug related to incorrect
      handling of duplicate pkt_reformat ids, which led to the code in
      create_flow_handle incorrectly referencing a just-added rule in the same
      flow handle, resulting in the problem described above. Full details are
      at [1].
      
      This patch changes add_rule_fg to add new rules without parents into
      the tree, properly initializing them and avoiding the crash. This makes
      it more consistent with how rules are added to an FTE in
      create_flow_handle.
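
The difference between the two linking conditions can be illustrated with a small model. This is a sketch under stated assumptions, not the fs_core code: `struct node`, `link_if_single_ref()` and `link_if_unparented()` are hypothetical names, and the only behavior modeled is the decision of when a rule gets attached to its parent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tree node: just a reference count and a parent link. */
struct node {
	int refcount;
	struct node *parent;
};

/* Models the old condition: link only a rule that has exactly one reference. */
static bool link_if_single_ref(struct node *rule, struct node *fte)
{
	assert(rule != NULL && fte != NULL);
	if (rule->refcount != 1)
		return false;
	rule->parent = fte;
	return true;
}

/* Models the new condition: link any rule that is not in the tree yet. */
static bool link_if_unparented(struct node *rule, struct node *fte)
{
	assert(rule != NULL && fte != NULL);
	if (rule->parent != NULL)
		return false;
	rule->parent = fte;
	return true;
}
```

With a freshly created rule that was referenced twice during the same handle creation (refcount 2, no parent), the first check leaves the rule orphaned, while the second attaches it exactly once.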
      
      Fixes: 74491de9 ("net/mlx5: Add multi dest support")
      Link: https://lore.kernel.org/netdev/ea5264d6-6b55-4449-a602-214c6f509c1e@163.com/T/#u [1]
      Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Mark Bloch <mbloch@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Link: https://lore.kernel.org/r/20240409190820.227554-5-tariqt@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      7c6782ad
    • Michael Liang's avatar
      net/mlx5: offset comp irq index in name by one · 9f7e8fbb
      Michael Liang authored
      The mlx5 comp irq naming scheme changed slightly between
      commit 3663ad34 ("net/mlx5: Shift control IRQ to the last index")
      and commit 3354822c ("net/mlx5: Use dynamic msix vectors allocation").
      The index in the comp irq name used to start from 0 but now it starts
      from 1. There is nothing critical here, but it's harmless to change
      back to the old behavior, i.e. starting from 0.
      
      Fixes: 3354822c ("net/mlx5: Use dynamic msix vectors allocation")
      Reviewed-by: Mohamed Khalfella <mkhalfella@purestorage.com>
      Reviewed-by: Yuanyuan Zhong <yzhong@purestorage.com>
      Signed-off-by: Michael Liang <mliang@purestorage.com>
      Reviewed-by: Shay Drory <shayd@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Link: https://lore.kernel.org/r/20240409190820.227554-4-tariqt@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      9f7e8fbb
    • Shay Drory's avatar
      net/mlx5: Register devlink first under devlink lock · c6e77aa9
      Shay Drory authored
      If the device hits a non-fatal FW error during probe, the driver
      reports the error to the user via devlink. This triggers a WARN_ON,
      since mlx5 calls devlink_register() last.
      To avoid the WARN_ON [1], change mlx5 to invoke devl_register()
      first, under the devlink lock.
      
      [1]
      WARNING: CPU: 5 PID: 227 at net/devlink/health.c:483 devlink_recover_notify.constprop.0+0xb8/0xc0
      CPU: 5 PID: 227 Comm: kworker/u16:3 Not tainted 6.4.0-rc5_for_upstream_min_debug_2023_06_12_12_38 #1
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      Workqueue: mlx5_health0000:08:00.0 mlx5_fw_reporter_err_work [mlx5_core]
      RIP: 0010:devlink_recover_notify.constprop.0+0xb8/0xc0
      Call Trace:
       <TASK>
       ? __warn+0x79/0x120
       ? devlink_recover_notify.constprop.0+0xb8/0xc0
       ? report_bug+0x17c/0x190
       ? handle_bug+0x3c/0x60
       ? exc_invalid_op+0x14/0x70
       ? asm_exc_invalid_op+0x16/0x20
       ? devlink_recover_notify.constprop.0+0xb8/0xc0
       devlink_health_report+0x4a/0x1c0
       mlx5_fw_reporter_err_work+0xa4/0xd0 [mlx5_core]
       process_one_work+0x1bb/0x3c0
       ? process_one_work+0x3c0/0x3c0
       worker_thread+0x4d/0x3c0
       ? process_one_work+0x3c0/0x3c0
       kthread+0xc6/0xf0
       ? kthread_complete_and_exit+0x20/0x20
       ret_from_fork+0x1f/0x30
       </TASK>
      
      Fixes: cf530217 ("devlink: Notify users when objects are accessible")
      Signed-off-by: Shay Drory <shayd@nvidia.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Link: https://lore.kernel.org/r/20240409190820.227554-3-tariqt@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      c6e77aa9
    • Shay Drory's avatar
      net/mlx5: E-switch, store eswitch pointer before registering devlink_param · 0553e753
      Shay Drory authored
      
      
      The next patch moves devlink registration to be first. From then on,
      whenever mlx5 registers a param, the user is notified.
      To notify the user, devlink uses the get() callback of the param.
      Hence, resources used by the get() callback must be set before the
      devlink param is registered.
      
      Therefore, store eswitch pointer inside mdev before registering the
      param.
      
      Signed-off-by: Shay Drory <shayd@nvidia.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Link: https://lore.kernel.org/r/20240409190820.227554-2-tariqt@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      0553e753
    • Eric Dumazet's avatar
      netfilter: complete validation of user input · 65acf6e0
      Eric Dumazet authored
      In my recent commit, I missed that do_replace() handlers
      use copy_from_sockptr() (which I fixed), followed
      by unsafe copy_from_sockptr_offset() calls.
      
      In all functions, we can perform the @optlen validation
      before even calling xt_alloc_table_info() with the following
      check:
      
      if ((u64)optlen < (u64)tmp.size + sizeof(tmp))
              return -EINVAL;
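
One likely reason for the widening casts in the check above is that a 32-bit sum of the entry size and the header size can wrap around. The following user-space sketch contrasts the two forms; `struct replace_hdr`, `check_len_wrapping()` and `check_len_widened()` are hypothetical names, and the header layout is an assumption made only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size header that precedes a variable-size payload. */
struct replace_hdr {
	char name[32];
	uint32_t size;  /* bytes of payload that follow the header */
};

/* 32-bit arithmetic: size + header length may wrap and defeat the check. */
static int check_len_wrapping(uint32_t optlen, const struct replace_hdr *hdr)
{
	uint32_t need = hdr->size + (uint32_t)sizeof(*hdr);

	return optlen < need ? -EINVAL : 0;
}

/* Widened arithmetic, mirroring the form quoted in the commit message. */
static int check_len_widened(uint32_t optlen, const struct replace_hdr *hdr)
{
	if ((uint64_t)optlen < (uint64_t)hdr->size + sizeof(*hdr))
		return -EINVAL;
	return 0;
}
```

With a `size` close to `UINT32_MAX`, the wrapping variant accepts an option buffer that holds only the header, while the widened variant rejects it.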
      
      Fixes: 0c83842d ("netfilter: validate user input for expected length")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Link: https://lore.kernel.org/r/20240409120741.3538135-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      65acf6e0
    • Heiner Kallweit's avatar
      r8169: add missing conditional compiling for call to r8169_remove_leds · 97e176fc
      Heiner Kallweit authored
      Add missing dependency on CONFIG_R8169_LEDS. As-is a link error occurs
      if config option CONFIG_R8169_LEDS isn't enabled.
      
      Fixes: 19fa4f2a ("r8169: fix LED-related deadlock on module removal")
      Reported-by: Venkat Rao Bagalkote <venkat88@linux.vnet.ibm.com>
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Tested-By: Venkat Rao Bagalkote <venkat88@linux.vnet.ibm.com>
      Link: https://lore.kernel.org/r/d080038c-eb6b-45ac-9237-b8c1cdd7870f@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      97e176fc
    • Arınç ÜNAL's avatar
      net: dsa: mt7530: fix enabling EEE on MT7531 switch on all boards · 06dfcd40
      Arınç ÜNAL authored
      The commit 40b5d2f1 ("net: dsa: mt7530: Add support for EEE features")
      brought EEE support but did not enable EEE on MT7531 switch MACs. EEE is
      enabled on MT7531 switch MACs by pulling the LAN2LED0 pin low on the board
      (bootstrapping), unsetting the EEE_DIS bit on the trap register, or setting
      the internal EEE switch bit on the CORE_PLL_GROUP4 register. Thanks to
      SkyLake Huang (黃啟澤) from MediaTek for providing information on the
      internal EEE switch bit.
      
      There are existing boards that were not designed to pull the pin low.
      Because of that, the EEE status currently depends on the board design.
      
      The EEE_DIS bit on the trap pertains to the LAN2LED0 pin which is usually
      used to control an LED. Once the bit is unset, the pin will be low. That
      will make the active low LED turn on. The pin is controlled by the switch
      PHY. It seems that the PHY controls the pin in the way that it inverts the
      pin state. That means depending on the wiring of the LED connected to
      LAN2LED0 on the board, the LED may be on without an active link.
      
      To not cause this unwanted behaviour whilst enabling EEE on all boards, set
      the internal EEE switch bit on the CORE_PLL_GROUP4 register.
      
      My testing on MT7531 shows a certain amount of traffic loss when EEE is
      enabled. That said, I haven't come across a board that enables EEE. So
      enable EEE on the switch MACs but disable EEE advertisement on the switch
      PHYs. This way, we don't change the behaviour of the majority of the boards
      that have this switch. The mediatek-ge PHY driver already disables EEE
      advertisement on the switch PHYs but my testing shows that it is somehow
      enabled afterwards. Disabling EEE advertisement before the PHY driver
      initialises keeps it off.
      
      With this change, EEE can now be enabled using ethtool.
      
      Fixes: 40b5d2f1 ("net: dsa: mt7530: Add support for EEE features")
      Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Tested-by: Daniel Golle <daniel@makrotopia.org>
      Reviewed-by: Daniel Golle <daniel@makrotopia.org>
      Link: https://lore.kernel.org/r/20240408-for-net-mt7530-fix-eee-for-mt7531-mt7988-v3-1-84fdef1f008b@arinc9.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      06dfcd40
    • Archie Pusaka's avatar
      Bluetooth: l2cap: Don't double set the HCI_CONN_MGMT_CONNECTED bit · 600b0bbe
      Archie Pusaka authored
      The bit is set and tested inside mgmt_device_connected(), therefore we
      must not set it just outside the function.
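
Why setting the bit outside the function is harmful can be shown with a small model of a test-and-set notifier. This is a sketch assuming the behavior stated above (the function both tests and sets the bit); `struct conn`, `CONN_MGMT_CONNECTED`, `report_connected()` and `preset_then_report()` are hypothetical names, not the Bluetooth stack's API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define CONN_MGMT_CONNECTED 0x1u  /* hypothetical "already reported" bit */

struct conn {
	atomic_uint flags;
	unsigned int events_sent;
};

static void conn_init(struct conn *c)
{
	atomic_init(&c->flags, 0u);
	c->events_sent = 0;
}

/* Notifier that tests and sets its own bit, sending the event only once. */
static bool report_connected(struct conn *c)
{
	unsigned int old = atomic_fetch_or(&c->flags, CONN_MGMT_CONNECTED);

	if (old & CONN_MGMT_CONNECTED)
		return false;  /* already reported, nothing is sent */
	c->events_sent++;
	return true;
}

/* Models the bug: the caller sets the bit first, so the event is lost. */
static bool preset_then_report(struct conn *c)
{
	atomic_fetch_or(&c->flags, CONN_MGMT_CONNECTED);
	return report_connected(c);
}
```

In this model, calling `report_connected()` on a fresh connection sends exactly one event, whereas `preset_then_report()` sends none.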
      
      Fixes: eeda1bf9 ("Bluetooth: hci_event: Fix not indicating new connection for BIG Sync")
      Signed-off-by: Archie Pusaka <apusaka@chromium.org>
      Reviewed-by: Manish Mandlik <mmandlik@chromium.org>
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      600b0bbe
    • Luiz Augusto von Dentz's avatar
      Bluetooth: hci_sock: Fix not validating setsockopt user input · b2186061
      Luiz Augusto von Dentz authored
      Check user input length before copying data.
      
      Fixes: 09572fca ("Bluetooth: hci_sock: Add support for BT_{SND,RCV}BUF")
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      b2186061
    • Luiz Augusto von Dentz's avatar
      Bluetooth: ISO: Fix not validating setsockopt user input · 9e8742cd
      Luiz Augusto von Dentz authored
      Check user input length before copying data.
      
      Fixes: ccf74f23 ("Bluetooth: Add BTPROTO_ISO socket type")
      Fixes: 0731c5ab ("Bluetooth: ISO: Add support for BT_PKT_STATUS")
      Fixes: f764a6c2 ("Bluetooth: ISO: Add broadcast support")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      9e8742cd
    • Luiz Augusto von Dentz's avatar
      Bluetooth: L2CAP: Fix not validating setsockopt user input · 4f395124
      Luiz Augusto von Dentz authored
      Check user input length before copying data.
      
      Fixes: 33575df7 ("Bluetooth: move l2cap_sock_setsockopt() to l2cap_sock.c")
      Fixes: 3ee7b7cd ("Bluetooth: Add BT_MODE socket option")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      4f395124
    • Luiz Augusto von Dentz's avatar
      Bluetooth: RFCOMM: Fix not validating setsockopt user input · a97de7bf
      Luiz Augusto von Dentz authored
      syzbot reported rfcomm_sock_setsockopt_old() is copying data without
      checking user input length.
      
      BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset
      include/linux/sockptr.h:49 [inline]
      BUG: KASAN: slab-out-of-bounds in copy_from_sockptr
      include/linux/sockptr.h:55 [inline]
      BUG: KASAN: slab-out-of-bounds in rfcomm_sock_setsockopt_old
      net/bluetooth/rfcomm/sock.c:632 [inline]
      BUG: KASAN: slab-out-of-bounds in rfcomm_sock_setsockopt+0x893/0xa70
      net/bluetooth/rfcomm/sock.c:673
      Read of size 4 at addr ffff8880209a8bc3 by task syz-executor632/5064
      
      Fixes: 9f2c8a03 ("Bluetooth: Replace RFCOMM link mode with security level")
      Fixes: bb23c0ab ("Bluetooth: Add support for deferring RFCOMM connection setup")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      a97de7bf
    • Luiz Augusto von Dentz's avatar
      Bluetooth: SCO: Fix not validating setsockopt user input · 51eda36d
      Luiz Augusto von Dentz authored
      syzbot reported sco_sock_setsockopt() is copying data without
      checking user input length.
      
      BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset
      include/linux/sockptr.h:49 [inline]
      BUG: KASAN: slab-out-of-bounds in copy_from_sockptr
      include/linux/sockptr.h:55 [inline]
      BUG: KASAN: slab-out-of-bounds in sco_sock_setsockopt+0xc0b/0xf90
      net/bluetooth/sco.c:893
      Read of size 4 at addr ffff88805f7b15a3 by task syz-executor.5/12578
      
      Fixes: ad10b1a4 ("Bluetooth: Add Bluetooth socket voice option")
      Fixes: b96e9c67 ("Bluetooth: Add BT_DEFER_SETUP option to sco socket")
      Fixes: 00398e1d ("Bluetooth: Add support for BT_PKT_STATUS CMSG data for SCO connections")
      Fixes: f6873401 ("Bluetooth: Allow setting of codec for HFP offload use case")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      51eda36d
    • Dmitry Antipov's avatar
      Bluetooth: Fix memory leak in hci_req_sync_complete() · 45d355a9
      Dmitry Antipov authored
      
      
      In 'hci_req_sync_complete()', always free the previous sync
      request state before assigning reference to a new one.
      
      Reported-by: <syzbot+39ec16ff6cc18b1d066d@syzkaller.appspotmail.com>
      Closes: https://syzkaller.appspot.com/bug?extid=39ec16ff6cc18b1d066d
      Cc: stable@vger.kernel.org
      Fixes: f60cb305 ("Bluetooth: Convert hci_req_sync family of function to new request API")
      Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      45d355a9
    • Luiz Augusto von Dentz's avatar
      Bluetooth: hci_sync: Fix using the same interval and window for Coded PHY · 53cb4197
      Luiz Augusto von Dentz authored
      Coded PHY recommended intervals are 3 times bigger than those of the
      1M PHY, so align with that by multiplying the values given to 1M PHY by 3,
      since the code already used recommended values for that.
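
The scaling rule described above amounts to a one-line derivation. The sketch below assumes a hypothetical `struct scan_params` and helper `coded_phy_params()`; the numeric values used with it are illustrative only and are not taken from the Bluetooth specification or from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair of scan/connection timing values for one PHY. */
struct scan_params {
	uint16_t interval;
	uint16_t window;
};

/* Derive Coded PHY values as 3x the values already used for 1M PHY. */
static struct scan_params coded_phy_params(struct scan_params one_m)
{
	struct scan_params coded;

	/* Guard against overflowing the 16-bit fields. */
	assert(one_m.interval <= UINT16_MAX / 3);
	assert(one_m.window <= UINT16_MAX / 3);

	coded.interval = (uint16_t)(one_m.interval * 3);
	coded.window = (uint16_t)(one_m.window * 3);
	return coded;
}
```

Deriving the Coded PHY values from the 1M PHY ones keeps a single source of truth, so a later change to the 1M values carries over automatically.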
      
      Fixes: 288c9022 ("Bluetooth: Enable all supported LE PHY by default")
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      53cb4197
    • Luiz Augusto von Dentz's avatar
      Bluetooth: ISO: Don't reject BT_ISO_QOS if parameters are unset · b37cab58
      Luiz Augusto von Dentz authored
      Consider certain values (0x00) as unset and load proper default if
      an application has not set them properly.
      
      Fixes: 0fe8c8d0 ("Bluetooth: Split bt_iso_qos into dedicated structures")
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      b37cab58
  2. Apr 10, 2024
    • Heiner Kallweit's avatar
      r8169: fix LED-related deadlock on module removal · 19fa4f2a
      Heiner Kallweit authored
      Binding devm_led_classdev_register() to the netdev is problematic
      because on module removal we get a RTNL-related deadlock. Fix this
      by avoiding the device-managed LED functions.
      
      Note: We can safely call led_classdev_unregister() for a LED even
      if registering it failed, because led_classdev_unregister() detects
      this and is a no-op in this case.
      
      Fixes: 18764b88 ("r8169: add support for LED's on RTL8168/RTL8101")
      Cc: stable@vger.kernel.org
      Reported-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      19fa4f2a
    • Brett Creeley's avatar
      pds_core: Fix pdsc_check_pci_health function to use work thread · 81665adf
      Brett Creeley authored
      When the driver notices fw_status == 0xff it tries to perform a PCI
      reset on itself via pci_reset_function() in the context of the driver's
      health thread. However, pdsc_reset_prepare calls
      pdsc_stop_health_thread(), which attempts to stop/flush the health
      thread. This results in a deadlock because the stop/flush will never
      complete since the driver called pci_reset_function() from the health
      thread context. Fix by changing the pdsc_check_pci_health() function
      to queue a newly introduced pdsc_pci_reset_thread() on the pdsc's
      work queue.
      
      Unloading the driver in the fw_down/dead state uncovered another issue,
      which can be seen in the following trace:
      
      WARNING: CPU: 51 PID: 6914 at kernel/workqueue.c:1450 __queue_work+0x358/0x440
      [...]
      RIP: 0010:__queue_work+0x358/0x440
      [...]
      Call Trace:
       <TASK>
       ? __warn+0x85/0x140
       ? __queue_work+0x358/0x440
       ? report_bug+0xfc/0x1e0
       ? handle_bug+0x3f/0x70
       ? exc_invalid_op+0x17/0x70
       ? asm_exc_invalid_op+0x1a/0x20
       ? __queue_work+0x358/0x440
       queue_work_on+0x28/0x30
       pdsc_devcmd_locked+0x96/0xe0 [pds_core]
       pdsc_devcmd_reset+0x71/0xb0 [pds_core]
       pdsc_teardown+0x51/0xe0 [pds_core]
       pdsc_remove+0x106/0x200 [pds_core]
       pci_device_remove+0x37/0xc0
       device_release_driver_internal+0xae/0x140
       driver_detach+0x48/0x90
       bus_remove_driver+0x6d/0xf0
       pci_unregister_driver+0x2e/0xa0
       pdsc_cleanup_module+0x10/0x780 [pds_core]
       __x64_sys_delete_module+0x142/0x2b0
       ? syscall_trace_enter.isra.18+0x126/0x1a0
       do_syscall_64+0x3b/0x90
       entry_SYSCALL_64_after_hwframe+0x72/0xdc
      RIP: 0033:0x7fbd9d03a14b
      [...]
      
      Fix this by preventing the devcmd reset if the FW is not running.
      
      Fixes: d9407ff1 ("pds_core: Prevent health thread from running during reset/remove")
      Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
      Signed-off-by: Brett Creeley <brett.creeley@amd.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      81665adf
    • Jiri Benc's avatar
      ipv6: fix race condition between ipv6_get_ifaddr and ipv6_del_addr · 7633c4da
      Jiri Benc authored
      Although ipv6_get_ifaddr walks inet6_addr_lst under the RCU lock,
      hlist_for_each_entry_rcu can still return an item that has been removed
      from the list. Thanks to RCU, the memory of such an item is not freed,
      but nothing guarantees that the content of the memory is sane.
      
      In particular, the reference count can be zero. This can happen if
      ipv6_del_addr is called in parallel. ipv6_del_addr removes the entry
      from inet6_addr_lst (hlist_del_init_rcu(&ifp->addr_lst)) and drops all
      references (__in6_ifa_put(ifp) + in6_ifa_put(ifp)). With bad enough
      timing, this can happen:
      
      1. In ipv6_get_ifaddr, hlist_for_each_entry_rcu returns an entry.
      
      2. Then, the whole ipv6_del_addr is executed for the given entry. The
         reference count drops to zero and kfree_rcu is scheduled.
      
      3. ipv6_get_ifaddr continues and tries to increment the reference count
         (in6_ifa_hold).
      
      4. The rcu is unlocked and the entry is freed.
      
      5. The freed entry is returned.
      
      Prevent incrementing the reference count in such a case. The name
      in6_ifa_hold_safe is chosen to mimic the existing fib6_info_hold_safe.
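
The "hold only if still live" idea can be sketched in user space with C11 atomics. This is a minimal model, not the kernel helper: `struct obj`, `obj_hold_safe()` and `lookup()` are hypothetical names, and the compare-and-swap loop is one common way to implement an increment-unless-zero operation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reference-counted object. */
struct obj {
	atomic_int refcnt;
};

/* Take a reference only if the count has not already dropped to zero. */
static bool obj_hold_safe(struct obj *o)
{
	int old = atomic_load(&o->refcnt);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
	}
	return false;  /* object is being torn down, do not revive it */
}

/* A lookup that skips entries whose last reference is already gone. */
static struct obj *lookup(struct obj *candidate)
{
	return obj_hold_safe(candidate) ? candidate : NULL;
}
```

An unconditional increment would turn a count of zero back into one and hand out an object that is about to be freed, which is the sequence listed in steps 1 to 5 above.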
      
      [   41.506330] refcount_t: addition on 0; use-after-free.
      [   41.506760] WARNING: CPU: 0 PID: 595 at lib/refcount.c:25 refcount_warn_saturate+0xa5/0x130
      [   41.507413] Modules linked in: veth bridge stp llc
      [   41.507821] CPU: 0 PID: 595 Comm: python3 Not tainted 6.9.0-rc2.main-00208-g49563be82afa #14
      [   41.508479] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
      [   41.509163] RIP: 0010:refcount_warn_saturate+0xa5/0x130
      [   41.509586] Code: ad ff 90 0f 0b 90 90 c3 cc cc cc cc 80 3d c0 30 ad 01 00 75 a0 c6 05 b7 30 ad 01 01 90 48 c7 c7 38 cc 7a 8c e8 cc 18 ad ff 90 <0f> 0b 90 90 c3 cc cc cc cc 80 3d 98 30 ad 01 00 0f 85 75 ff ff ff
      [   41.510956] RSP: 0018:ffffbda3c026baf0 EFLAGS: 00010282
      [   41.511368] RAX: 0000000000000000 RBX: ffff9e9c46914800 RCX: 0000000000000000
      [   41.511910] RDX: ffff9e9c7ec29c00 RSI: ffff9e9c7ec1c900 RDI: ffff9e9c7ec1c900
      [   41.512445] RBP: ffff9e9c43660c9c R08: 0000000000009ffb R09: 00000000ffffdfff
      [   41.512998] R10: 00000000ffffdfff R11: ffffffff8ca58a40 R12: ffff9e9c4339a000
      [   41.513534] R13: 0000000000000001 R14: ffff9e9c438a0000 R15: ffffbda3c026bb48
      [   41.514086] FS:  00007fbc4cda1740(0000) GS:ffff9e9c7ec00000(0000) knlGS:0000000000000000
      [   41.514726] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   41.515176] CR2: 000056233b337d88 CR3: 000000000376e006 CR4: 0000000000370ef0
      [   41.515713] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   41.516252] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   41.516799] Call Trace:
      [   41.517037]  <TASK>
      [   41.517249]  ? __warn+0x7b/0x120
      [   41.517535]  ? refcount_warn_saturate+0xa5/0x130
      [   41.517923]  ? report_bug+0x164/0x190
      [   41.518240]  ? handle_bug+0x3d/0x70
      [   41.518541]  ? exc_invalid_op+0x17/0x70
      [   41.520972]  ? asm_exc_invalid_op+0x1a/0x20
      [   41.521325]  ? refcount_warn_saturate+0xa5/0x130
      [   41.521708]  ipv6_get_ifaddr+0xda/0xe0
      [   41.522035]  inet6_rtm_getaddr+0x342/0x3f0
      [   41.522376]  ? __pfx_inet6_rtm_getaddr+0x10/0x10
      [   41.522758]  rtnetlink_rcv_msg+0x334/0x3d0
      [   41.523102]  ? netlink_unicast+0x30f/0x390
      [   41.523445]  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
      [   41.523832]  netlink_rcv_skb+0x53/0x100
      [   41.524157]  netlink_unicast+0x23b/0x390
      [   41.524484]  netlink_sendmsg+0x1f2/0x440
      [   41.524826]  __sys_sendto+0x1d8/0x1f0
      [   41.525145]  __x64_sys_sendto+0x1f/0x30
      [   41.525467]  do_syscall_64+0xa5/0x1b0
      [   41.525794]  entry_SYSCALL_64_after_hwframe+0x72/0x7a
      [   41.526213] RIP: 0033:0x7fbc4cfcea9a
      [   41.526528] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
      [   41.527942] RSP: 002b:00007ffcf54012a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      [   41.528593] RAX: ffffffffffffffda RBX: 00007ffcf5401368 RCX: 00007fbc4cfcea9a
      [   41.529173] RDX: 000000000000002c RSI: 00007fbc4b9d9bd0 RDI: 0000000000000005
      [   41.529786] RBP: 00007fbc4bafb040 R08: 00007ffcf54013e0 R09: 000000000000000c
      [   41.530375] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      [   41.530977] R13: ffffffffc4653600 R14: 0000000000000001 R15: 00007fbc4ca85d1b
      [   41.531573]  </TASK>
      
      Fixes: 5c578aed ("IPv6: convert addrconf hash list to RCU")
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Link: https://lore.kernel.org/r/8ab821e36073a4a406c50ec83c9e8dc586c539e4.1712585809.git.jbenc@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      7633c4da
    • Jakub Kicinski's avatar
      Merge branch 'net-start-to-replace-copy_from_sockptr' · 7b6575c6
      Jakub Kicinski authored
      
      
      Eric Dumazet says:
      
      ====================
      net: start to replace copy_from_sockptr()
      
      We got several syzbot reports about unsafe copy_from_sockptr()
      calls. After fixing some of them, it appears that we could
      use a new helper to factorize all the checks in one place.
      
      This series targets net tree, we can later start converting
      many call sites in net-next.
      ====================
      
      Link: https://lore.kernel.org/r/20240408082845.3957374-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      7b6575c6
    • Eric Dumazet's avatar
      nfc: llcp: fix nfc_llcp_setsockopt() unsafe copies · 7a87441c
      Eric Dumazet authored
      
      
      syzbot reported unsafe calls to copy_from_sockptr() [1]
      
      Use copy_safe_from_sockptr() instead.
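
The pattern behind a length-checked copy helper can be modeled in user space. The sketch below is an assumption-laden stand-in, not the kernel helper: `copy_safe_from_buf()` is a hypothetical name, and plain pointers with `memcpy()` replace the kernel's socket pointer type and copy routines.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy exactly ksize bytes into dst, but only if the caller-supplied
 * buffer is declared to hold at least that many bytes.
 */
static int copy_safe_from_buf(void *dst, size_t ksize,
			      const void *optval, size_t optlen)
{
	assert(dst != NULL && optval != NULL);
	if (optlen < ksize)
		return -EINVAL;  /* would read past the end of optval */
	memcpy(dst, optval, ksize);
	return 0;
}
```

Putting the length check inside the helper means each call site cannot forget it, which matches the motivation of factoring the checks into one place.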
      
      [1]
      
      BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
       BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline]
       BUG: KASAN: slab-out-of-bounds in nfc_llcp_setsockopt+0x6c2/0x850 net/nfc/llcp_sock.c:255
      Read of size 4 at addr ffff88801caa1ec3 by task syz-executor459/5078
      
      CPU: 0 PID: 5078 Comm: syz-executor459 Not tainted 6.8.0-syzkaller-08951-gfe46a7dd189e #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
      Call Trace:
       <TASK>
        __dump_stack lib/dump_stack.c:88 [inline]
        dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
        print_address_description mm/kasan/report.c:377 [inline]
        print_report+0x169/0x550 mm/kasan/report.c:488
        kasan_report+0x143/0x180 mm/kasan/report.c:601
        copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
        copy_from_sockptr include/linux/sockptr.h:55 [inline]
        nfc_llcp_setsockopt+0x6c2/0x850 net/nfc/llcp_sock.c:255
        do_sock_setsockopt+0x3b1/0x720 net/socket.c:2311
        __sys_setsockopt+0x1ae/0x250 net/socket.c:2334
        __do_sys_setsockopt net/socket.c:2343 [inline]
        __se_sys_setsockopt net/socket.c:2340 [inline]
        __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
       do_syscall_64+0xfd/0x240
       entry_SYSCALL_64_after_hwframe+0x6d/0x75
      RIP: 0033:0x7f7fac07fd89
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 91 18 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fff660eb788 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
      RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f7fac07fd89
      RDX: 0000000000000000 RSI: 0000000000000118 RDI: 0000000000000004
      RBP: 0000000000000000 R08: 0000000000000002 R09: 0000000000000000
      R10: 0000000020000a80 R11: 0000000000000246 R12: 0000000000000000
      R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
      Link: https://lore.kernel.org/r/20240408082845.3957374-4-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      7a87441c
    • Eric Dumazet's avatar
      mISDN: fix MISDN_TIME_STAMP handling · 138b7878
      Eric Dumazet authored
      syzbot reports one unsafe call to copy_from_sockptr() [1]
      
      Use copy_safe_from_sockptr() instead.
      
      [1]
      
       BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
       BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline]
       BUG: KASAN: slab-out-of-bounds in data_sock_setsockopt+0x46c/0x4cc drivers/isdn/mISDN/socket.c:417
      Read of size 4 at addr ffff0000c6d54083 by task syz-executor406/6167
      
      CPU: 1 PID: 6167 Comm: syz-executor406 Not tainted 6.8.0-rc7-syzkaller-g707081b61156 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
      Call trace:
        dump_backtrace+0x1b8/0x1e4 arch/arm64/kernel/stacktrace.c:291
        show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:298
        __dump_stack lib/dump_stack.c:88 [inline]
        dump_stack_lvl+0xd0/0x124 lib/dump_stack.c:106
        print_address_description mm/kasan/report.c:377 [inline]
        print_report+0x178/0x518 mm/kasan/report.c:488
        kasan_report+0xd8/0x138 mm/kasan/report.c:601
        __asan_report_load_n_noabort+0x1c/0x28 mm/kasan/report_generic.c:391
        copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
        copy_from_sockptr include/linux/sockptr.h:55 [inline]
        data_sock_setsockopt+0x46c/0x4cc drivers/isdn/mISDN/socket.c:417
        do_sock_setsockopt+0x2a0/0x4e0 net/socket.c:2311
        __sys_setsockopt+0x128/0x1a8 net/socket.c:2334
        __do_sys_setsockopt net/socket.c:2343 [inline]
        __se_sys_setsockopt net/socket.c:2340 [inline]
        __arm64_sys_setsockopt+0xb8/0xd4 net/socket.c:2340
        __invoke_syscall arch/arm64/kernel/syscall.c:34 [inline]
        invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:48
        el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:133
        do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:152
        el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712
        el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
        el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
      
      Fixes: 1b2b03f8 ("Add mISDN core files")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: Karsten Keil <isdn@linux-pingi.de>
      Link: https://lore.kernel.org/r/20240408082845.3957374-3-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      138b7878
    • Eric Dumazet's avatar
      net: add copy_safe_from_sockptr() helper · 6309863b
      Eric Dumazet authored
      
      
      The copy_from_sockptr() helper is unsafe unless the caller has
      first checked the user-provided optlen against the size being copied.

      Too many callers get this wrong, so let's add a helper to
      fix them and avoid future copy/paste bugs.
      
      Instead of :
      
         if (optlen < sizeof(opt)) {
             err = -EINVAL;
             break;
         }
         if (copy_from_sockptr(&opt, optval, sizeof(opt))) {
             err = -EFAULT;
             break;
         }
      
      Use :
      
         err = copy_safe_from_sockptr(&opt, sizeof(opt),
                                      optval, optlen);
         if (err)
             break;
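
The helper's contract can be illustrated with a small userspace model. This is only a sketch, not the kernel implementation: `struct model_sockptr`, `model_copy_from_sockptr()` and `model_copy_safe_from_sockptr()` are hypothetical stand-ins for `sockptr_t`, `copy_from_sockptr()` and `copy_safe_from_sockptr()`. It assumes the helper returns -EINVAL when optlen is smaller than the kernel-side object and otherwise forwards the result of the plain copy.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-in for the kernel's sockptr_t. */
struct model_sockptr {
	const void *buf;   /* start of the caller's option buffer */
	size_t      avail; /* bytes actually backing buf */
};

/*
 * Models copy_from_sockptr(): copy exactly `size` bytes.
 * Assumption: the model tracks the buffer size so that it can report
 * -EFAULT. A plain memcpy from a kernel-side buffer has no such check,
 * which is why the optlen test below has to happen first.
 */
static int model_copy_from_sockptr(void *dst, struct model_sockptr src,
				   size_t size)
{
	if (src.buf == NULL || size > src.avail)
		return -EFAULT;
	memcpy(dst, src.buf, size);
	return 0;
}

/*
 * Models copy_safe_from_sockptr(): reject a too-short optlen before
 * copying, so the copy never reads past what the caller supplied.
 */
static int model_copy_safe_from_sockptr(void *dst, size_t ksize,
					struct model_sockptr optval,
					unsigned int optlen)
{
	if (optlen < ksize)
		return -EINVAL;
	return model_copy_from_sockptr(dst, optval, ksize);
}
```

With this shape, a caller passing an optlen shorter than sizeof(opt) gets -EINVAL without any bytes being read, which is the check the open-coded callers tended to forget.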
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240408082845.3957374-2-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      6309863b