  1. Apr 18, 2024
  2. Apr 17, 2024
    • s390/ism: Properly fix receive message buffer allocation · 83781384
      Gerd Bayer authored
      Since [1], dma_alloc_coherent() no longer accepts requests for GFP_COMP,
      even on archs that may be able to fulfill them. Functionality that
      relied on the receive buffer being a compound page broke at that point:
      The SMC-D protocol, which uses the ism device driver, passes receive
      buffers to the splice processor in a struct splice_pipe_desc with a
      single-entry list of struct pages. As the buffer is no longer a compound
      page, the splice processor now rejects requests to handle more than a
      page worth of data.
      
      Replace dma_alloc_coherent() and allocate a buffer with folio_alloc(), then
      create a DMA mapping for it with dma_map_page(). Since only receive buffers
      on ISM devices use DMA, qualify the mapping as FROM_DEVICE.
      Since ISM devices are available only on arch s390, and on that arch all
      DMA is coherent, there is no need to introduce and export some kind of
      dma_sync_to_cpu() method to be called by the SMC-D protocol layer.
      
      Analogously, replace dma_free_coherent() with a two-step sequence:
      dma_unmap_page(), then folio_put() to free the receive buffer.
      
      [1] https://lore.kernel.org/all/20221113163535.884299-1-hch@lst.de/
      
      Fixes: c08004ee ("s390/ism: don't pass bogus GFP_ flags to dma_alloc_coherent")
      Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mt7530-fixes' · cb178ccb
      David S. Miller authored
      
      Arınç ÜNAL says:
      
      ====================
      Fix port mirroring on MT7530 DSA subdriver
      
      This patch series fixes the frames received on the local port (monitor
      port) not being mirrored, and port mirroring for the MT7988 SoC switch.
      ====================
      
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
    • net: dsa: mt7530: fix port mirroring for MT7988 SoC switch · 2c606d13
      Arınç ÜNAL authored
      The "MT7988A Wi-Fi 7 Generation Router Platform: Datasheet (Open Version)
      v0.1" document shows bits 16 to 18 as the MIRROR_PORT field of the CPU
      forward control register. Currently, the MT7530 DSA subdriver configures
      bits 0 to 2 of the CPU forward control register, which breaks the port
      mirroring feature for the MT7988 SoC switch.
      
      Fix this by using the MT7531_MIRROR_PORT_GET() and MT7531_MIRROR_PORT_SET()
      macros, which use the correct bits.
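
To illustrate why the bit position matters, here is a minimal userspace sketch. The macro and function names are hypothetical stand-ins, not the driver's definitions; only the two bit ranges (0 to 2 versus 16 to 18) come from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field layouts; only the bit ranges are taken from the text. */
#define OLD_MIRROR_PORT_SHIFT  0   /* bits 0..2 */
#define NEW_MIRROR_PORT_SHIFT  16  /* bits 16..18 */
#define MIRROR_PORT_FIELD_MASK 0x7u

/* Return reg with the 3-bit field at `shift` replaced by `port`. */
static uint32_t mirror_port_set(uint32_t reg, unsigned int shift, uint32_t port)
{
	assert(port <= MIRROR_PORT_FIELD_MASK);
	reg &= ~(MIRROR_PORT_FIELD_MASK << shift);
	return reg | (port << shift);
}

/* Extract the 3-bit field at `shift`. */
static uint32_t mirror_port_get(uint32_t reg, unsigned int shift)
{
	return (reg >> shift) & MIRROR_PORT_FIELD_MASK;
}
```

Writing port 5 with the old shift leaves bits 16 to 18 at zero, so hardware that reads the field from bits 16 to 18 would never see the configured monitor port.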
      
      Fixes: 110c18bf ("net: dsa: mt7530: introduce driver for MT7988 built-in switch")
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Acked-by: Daniel Golle <daniel@makrotopia.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mt7530: fix mirroring frames received on local port · d59cf049
      Arınç ÜNAL authored
      This switch IP core provides a bit in the ARL global control register that
      controls whether frames received on the local port (monitor port) are
      mirrored. This bit is unset after reset.
      
      This ability must be enabled to fully support the port mirroring feature on
      this switch IP core.
      
      Therefore, this patch fixes traffic not being reflected on a port that is
      configured as follows:
      
        tc qdisc add dev swp0 clsact
      
        tc filter add dev swp0 ingress matchall skip_sw \
        action mirred egress mirror dev swp0
      
      As a side note, this configuration provides the hairpinning feature for a
      single port.
      
      Fixes: 37feab60 ("net: dsa: mt7530: add support for port mirroring")
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tun: limit printing rate when illegal packet received by tun dev · f8bbc07a
      Lei Chen authored
      vhost_worker calls tun callbacks to receive packets. If too many
      illegal packets arrive, tun_do_read() keeps dumping packet contents.
      When a console is enabled, dumping the packets costs much more CPU time,
      and a soft lockup is detected.
      
      The net_ratelimit() mechanism can be used to limit the dumping rate.
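
As a rough illustration of the idea, the following userspace sketch models an interval/burst limiter. It is not the kernel implementation: the struct, the function name, and the tick-based clock are all hypothetical, chosen only to show how a burst budget per time window suppresses repeated dumps.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for an interval/burst rate limiter. */
struct ratelimit {
	long interval;   /* window length, in caller-defined ticks */
	int burst;       /* messages allowed per window */
	long begin;      /* start of the current window */
	int printed;     /* messages emitted in the current window */
};

/* Return true if the caller may print at time `now`. */
static bool ratelimit_ok(struct ratelimit *rl, long now)
{
	assert(rl->interval > 0 && rl->burst > 0);
	if (now - rl->begin >= rl->interval) {
		/* A new window starts: reset the budget. */
		rl->begin = now;
		rl->printed = 0;
	}
	if (rl->printed >= rl->burst)
		return false;
	rl->printed++;
	return true;
}
```

A caller would guard the expensive dump with `if (ratelimit_ok(&rl, now))`, so a flood of bad packets produces at most `burst` dumps per window.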
      
      PID: 33036    TASK: ffff949da6f20000  CPU: 23   COMMAND: "vhost-32980"
       #0 [fffffe00003fce50] crash_nmi_callback at ffffffff89249253
       #1 [fffffe00003fce58] nmi_handle at ffffffff89225fa3
       #2 [fffffe00003fceb0] default_do_nmi at ffffffff8922642e
       #3 [fffffe00003fced0] do_nmi at ffffffff8922660d
       #4 [fffffe00003fcef0] end_repeat_nmi at ffffffff89c01663
          [exception RIP: io_serial_in+20]
          RIP: ffffffff89792594  RSP: ffffa655314979e8  RFLAGS: 00000002
          RAX: ffffffff89792500  RBX: ffffffff8af428a0  RCX: 0000000000000000
          RDX: 00000000000003fd  RSI: 0000000000000005  RDI: ffffffff8af428a0
          RBP: 0000000000002710   R8: 0000000000000004   R9: 000000000000000f
          R10: 0000000000000000  R11: ffffffff8acbf64f  R12: 0000000000000020
          R13: ffffffff8acbf698  R14: 0000000000000058  R15: 0000000000000000
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
       #5 [ffffa655314979e8] io_serial_in at ffffffff89792594
       #6 [ffffa655314979e8] wait_for_xmitr at ffffffff89793470
       #7 [ffffa65531497a08] serial8250_console_putchar at ffffffff897934f6
       #8 [ffffa65531497a20] uart_console_write at ffffffff8978b605
       #9 [ffffa65531497a48] serial8250_console_write at ffffffff89796558
       #10 [ffffa65531497ac8] console_unlock at ffffffff89316124
       #11 [ffffa65531497b10] vprintk_emit at ffffffff89317c07
       #12 [ffffa65531497b68] printk at ffffffff89318306
       #13 [ffffa65531497bc8] print_hex_dump at ffffffff89650765
       #14 [ffffa65531497ca8] tun_do_read at ffffffffc0b06c27 [tun]
       #15 [ffffa65531497d38] tun_recvmsg at ffffffffc0b06e34 [tun]
       #16 [ffffa65531497d68] handle_rx at ffffffffc0c5d682 [vhost_net]
       #17 [ffffa65531497ed0] vhost_worker at ffffffffc0c644dc [vhost]
       #18 [ffffa65531497f10] kthread at ffffffff892d2e72
       #19 [ffffa65531497f50] ret_from_fork at ffffffff89c0022f
      
      Fixes: ef3db4a5 ("tun: avoid BUG, dump packet on GSO errors")
      Signed-off-by: Lei Chen <lei.chen@smartx.com>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Link: https://lore.kernel.org/r/20240415020247.2207781-1-lei.chen@smartx.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ice: Fix checking for unsupported keys on non-tunnel device · 2cca35f5
      Marcin Szycik authored
      Add missing FLOW_DISSECTOR_KEY_ENC_* checks to TC flower filter parsing.
      Without these checks, it would be possible to add filters with tunnel
      options on non-tunnel devices. enc_* options are only valid for tunnel
      devices.
      
      Example:
        devlink dev eswitch set $PF1_PCI mode switchdev
        echo 1 > /sys/class/net/$PF1/device/sriov_numvfs
        tc qdisc add dev $VF1_PR ingress
        ethtool -K $PF1 hw-tc-offload on
        tc filter add dev $VF1_PR ingress flower enc_ttl 12 skip_sw action drop
      
      Fixes: 9e300987 ("ice: VXLAN and Geneve TC support")
      Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: tc: allow zero flags in parsing tc flower · 73278715
      Michal Swiatkowski authored
      The check for flags is done to avoid passing empty lookups to the functions
      that add switch rules. Since metadata is always added to lookups, there is
      no need to check against the flag.
      
      This also fixes a problem with rules such as:
      $ tc filter add dev gtp_dev ingress protocol ip prio 0 flower \
      	enc_dst_port 2123 action drop
      The switch block cannot parse the destination port in the GTP case,
      because it should always be set to the GTP-specific value. The same
      applies to the ethertype. As a result, there are no matching criteria
      other than the GTP tunnel. In this case flags is 0, and the rule cannot be
      added only because of the defensive check against flags.
      
      Fixes: 9a225f81 ("ice: Support GTP-U and GTP-C offload in switchdev")
      Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
      Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: tc: check src_vsi in case of traffic from VF · 42805160
      Michal Swiatkowski authored
      In case of traffic going from the VF (so ingress for the port representor),
      the source VSI should be considered during packet classification. This is
      needed so that hardware does not match packets from one port against
      filters added on another port.
      
      This applies only to "from VF" traffic, because the other traffic direction
      doesn't have a source VSI.
      
      Set the correct ::src_vsi in rule_info to pass it to the hardware filter.
      
      For example, this rule should drop only IPv4 packets from eth10, not from
      the other VF PRs, so the source VSI has to be checked in this case:
      $ tc filter add dev eth10 ingress protocol ip flower skip_sw action drop
      
      Fixes: 0d08a441 ("ice: ndo_setup_tc implementation for PF")
      Reviewed-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
      Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
      Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
  3. Apr 16, 2024
    • Merge branch 'net-stmmac-fix-mac-capabilities-procedure' · e226eade
      Paolo Abeni authored
      Serge Semin says:
      
      ====================
      net: stmmac: Fix MAC-capabilities procedure
      
      The series came about as a result of the discussions around Yanteng's
      recent series adding support for the Loongson LS7A1000, LS2K1000, LS7A2000
      and LS2K2000 MACs:
      Link: https://lore.kernel.org/netdev/fu3f6uoakylnb6eijllakeu5i4okcyqq7sfafhp5efaocbsrwe@w74xe7gb6x7p
      
      In particular, Yanteng's patchset needed to implement the Loongson
      MAC-specific constraints applied to the link speed and link duplex mode.
      As a result of the discussion with Russell, the following preliminary patch
      was born:
      Link: https://lore.kernel.org/netdev/df31e8bcf74b3b4ddb7ddf5a1c371390f16a2ad5.1712917541.git.siyanteng@loongson.cn
      
      The patch above was a temporary solution used by Yanteng for further
      development and to move on with the ongoing review. This patchset is a
      refactored version of that single patch, with the formatting required for
      fixes patches.
      
      In particular, the series starts by fixing the half-duplex-less
      constraint currently applied to all IP-cores. In fact it is specific to
      DW QoS Eth only (DW GMAC v4.x/v5.x).
      
      The next patch fixes the MAC-capabilities setup during the active
      Tx/Rx queues re-initialization procedure. In particular, the procedure
      missed the max-speed limit, thus possibly activating speeds prohibited on
      the respective platforms.
      
      The third patch fixes the incorrect MAC-capabilities initialization for DW
      MAC100, DW XGMAC and DW XLGMAC devices by moving the correct
      initialization to the IP-core specific setup() methods.
      
      That's it for now. Thanks for review and testing in advance.
      
      Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
      Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
      Cc: Simon Horman <horms@kernel.org>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Chen-Yu Tsai <wens@csie.org>
      Cc: Jernej Skrabec <jernej.skrabec@gmail.com>
      Cc: Samuel Holland <samuel@sholland.org>
      Cc: netdev@vger.kernel.org
      Cc: linux-stm32@st-md-mailman.stormreply.com
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-sunxi@lists.linux.dev
      Cc: linux-kernel@vger.kernel.org
      ====================
      
      Link: https://lore.kernel.org/r/20240412180340.7965-1-fancer.lancer@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: stmmac: Fix IP-cores specific MAC capabilities · 9cb54af2
      Serge Semin authored
      Here is the list of the MAC capabilities specific to the particular DW MAC
      IP-cores currently supported by the driver:
      
      DW MAC100: MAC_ASYM_PAUSE | MAC_SYM_PAUSE |
                 MAC_10 | MAC_100
      
      DW GMAC: MAC_ASYM_PAUSE | MAC_SYM_PAUSE |
               MAC_10 | MAC_100 | MAC_1000
      
      Allwinner sun8i MAC: MAC_ASYM_PAUSE | MAC_SYM_PAUSE |
                           MAC_10 | MAC_100 | MAC_1000
      
      DW QoS Eth: MAC_ASYM_PAUSE | MAC_SYM_PAUSE |
                  MAC_10 | MAC_100 | MAC_1000 | MAC_2500FD
      if there is more than one active Tx/Rx queue:
                  MAC_ASYM_PAUSE | MAC_SYM_PAUSE |
                  MAC_10FD | MAC_100FD | MAC_1000FD | MAC_2500FD
      
      DW XGMAC: MAC_ASYM_PAUSE | MAC_SYM_PAUSE |
                MAC_1000FD | MAC_2500FD | MAC_5000FD | MAC_10000FD
      
      DW XLGMAC: MAC_ASYM_PAUSE | MAC_SYM_PAUSE |
                 MAC_1000FD | MAC_2500FD | MAC_5000FD | MAC_10000FD |
                 MAC_25000FD | MAC_40000FD | MAC_50000FD | MAC_100000FD
      
      As you can see, there are only two common capabilities:
      MAC_ASYM_PAUSE | MAC_SYM_PAUSE.
      Meanwhile, the current implementation defines 10/100/1000 link speeds
      for all IP-cores, which is definitely incorrect for DW MAC100, DW XGMAC
      and DW XLGMAC devices.
      
      Since flow control is implemented as a callback for each MAC IP-core
      (see dwmac100_flow_ctrl(), dwmac1000_flow_ctrl(), sun8i_dwmac_flow_ctrl(),
      etc.) and since the MAC-specific setup() method is supposed to be called
      for each available DW MAC-based device, the capabilities initialization
      can be freely moved to these setup() functions, thus correctly setting up
      the MAC-capabilities for each IP-core (including the Allwinner Sun8i). A
      new stmmac_link::caps field was specifically introduced for that, so as to
      have all link-specific info preserved in a single structure.
      
      Note that the suggested change fixes three earlier commits at once. The
      commit 5b0d7d7d ("net: stmmac: Add the missing speeds that XGMAC
      supports") permitted the 10-100 link speeds and 1G half-duplex mode for the
      DW XGMAC IP-core even though it doesn't support them. The commit df7699c7
      ("net: stmmac: Do not cut down 1G modes") incorrectly added the MAC_1000
      capability to the DW MAC100 IP-core. Similarly to the DW XGMAC, the commit
      8a880936 ("net: stmmac: Add XLGMII support") incorrectly permitted the
      10-100 link speeds and 1G half-duplex mode for the DW XLGMAC IP-core.
      
      Fixes: 5b0d7d7d ("net: stmmac: Add the missing speeds that XGMAC supports")
      Fixes: df7699c7 ("net: stmmac: Do not cut down 1G modes")
      Fixes: 8a880936 ("net: stmmac: Add XLGMII support")
      Suggested-by: Russell King (Oracle) <linux@armlinux.org.uk>
      Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
      Reviewed-by: Romain Gantois <romain.gantois@bootlin.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: stmmac: Fix max-speed being ignored on queue re-init · 59c3d6ca
      Serge Semin authored
      The maximum link speed can be artificially limited on a
      platform-specific basis. This is done either by setting the
      plat_stmmacenet_data::max_speed field or by specifying the "max-speed"
      DT-property. In such cases, any MAC-capabilities re-initialization must
      take the limit into account. In particular, the link speed capabilities
      may change when the number of active Tx/Rx queues is re-initialized, but
      the currently implemented procedure doesn't take the speed limit into
      account.
      
      Fix that by calling phylink_limit_mac_speed() in the
      stmmac_reinit_queues() method if a speed limit was requested, in the
      same way as it's done in the stmmac_phy_setup() function.
      
      Fixes: 95201f36 ("net: stmmac: update MAC capabilities when tx queues are updated")
      Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
      Reviewed-by: Romain Gantois <romain.gantois@bootlin.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: stmmac: Apply half-duplex-less constraint for DW QoS Eth only · 0ebd96f5
      Serge Semin authored
      There are three DW MAC IP-cores which can have multiple Tx/Rx queues
      enabled:
      DW GMAC v3.7+ with AV feature,
      DW QoS Eth v4.x/v5.x,
      DW XGMAC/XLGMAC
      Based on the respective HW databooks, only the DW QoS Eth IP-core doesn't
      support the half-duplex link mode when more than one queue is enabled:
      
      "In multiple queue/channel configurations, for half-duplex operation,
      enable only the Q0/CH0 on Tx and Rx. For single queue/channel in
      full-duplex operation, any queue/channel can be enabled."
      
      The rest of the IP-cores don't have such a constraint. Thus, in order to
      have the constraint applied to the DW QoS Eth MACs only, move its
      implementation to the respective MAC-capabilities getter and make sure the
      getter is called in the queues re-init procedure.
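
The constraint can be pictured with a small sketch. The capability bit values and the function name below are hypothetical (they are not the phylink definitions or the driver's getter); the sketch only shows half-duplex modes being masked out when more than one queue is active, which would apply to the DW QoS Eth getter and not to the other cores.

```c
#include <assert.h>

/* Hypothetical capability bits for this sketch; not the phylink values. */
#define CAP_10HD   (1u << 0)
#define CAP_10FD   (1u << 1)
#define CAP_100HD  (1u << 2)
#define CAP_100FD  (1u << 3)
#define CAP_1000HD (1u << 4)
#define CAP_1000FD (1u << 5)
#define CAP_HD_ALL (CAP_10HD | CAP_100HD | CAP_1000HD)

/* Per-core getter: drop half-duplex modes when more than one queue is
 * active, as described for DW QoS Eth. Other cores would not do this. */
static unsigned int qos_eth_caps(unsigned int caps, unsigned int queues)
{
	assert(queues >= 1);
	if (queues > 1)
		caps &= ~CAP_HD_ALL;
	return caps;
}
```

Because the masking lives in the per-core getter, calling the getter again after the queue count changes is enough to keep the advertised modes consistent.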
      
      Fixes: b6cfffa7 ("stmmac: fix DMA channel hang in half-duplex mode")
      Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
      Reviewed-by: Romain Gantois <romain.gantois@bootlin.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge branch 'selftests-net-tcp_ao-a-bunch-of-fixes-for-tcp-ao-selftests' · 24f4c99e
      Paolo Abeni authored
      Dmitry Safonov says:
      
      ====================
      selftests/net/tcp_ao: A bunch of fixes for TCP-AO selftests
      
      This started as an attempt to address the flakiness issues in rst_ipv*,
      which affect the netdev dashboard.
      
      Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
      ====================
      
      Link: https://lore.kernel.org/r/20240413-tcp-ao-selftests-fixes-v1-0-f9c41c96949d@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • selftests/tcp_ao: Printing fixes to confirm with format-security · b476c936
      Dmitry Safonov authored
      On my new laptop with packages from nixos-unstable, gcc 12.3.0 produces
      > lib/setup.c: In function ‘__test_msg’:
      > lib/setup.c:20:9: error: format not a string literal and no format arguments [-Werror=format-security]
      >    20 |         ksft_print_msg(buf);
      >       |         ^~~~~~~~~~~~~~
      > lib/setup.c: In function ‘__test_ok’:
      > lib/setup.c:26:9: error: format not a string literal and no format arguments [-Werror=format-security]
      >    26 |         ksft_test_result_pass(buf);
      >       |         ^~~~~~~~~~~~~~~~~~~~~
      > lib/setup.c: In function ‘__test_fail’:
      > lib/setup.c:32:9: error: format not a string literal and no format arguments [-Werror=format-security]
      >    32 |         ksft_test_result_fail(buf);
      >       |         ^~~~~~~~~~~~~~~~~~~~~
      > lib/setup.c: In function ‘__test_xfail’:
      > lib/setup.c:38:9: error: format not a string literal and no format arguments [-Werror=format-security]
      >    38 |         ksft_test_result_xfail(buf);
      >       |         ^~~~~~~~~~~~~~~~~~~~~~
      > lib/setup.c: In function ‘__test_error’:
      > lib/setup.c:44:9: error: format not a string literal and no format arguments [-Werror=format-security]
      >    44 |         ksft_test_result_error(buf);
      >       |         ^~~~~~~~~~~~~~~~~~~~~~
      > lib/setup.c: In function ‘__test_skip’:
      > lib/setup.c:50:9: error: format not a string literal and no format arguments [-Werror=format-security]
      >    50 |         ksft_test_result_skip(buf);
      >       |         ^~~~~~~~~~~~~~~~~~~~~
      > cc1: some warnings being treated as errors
      
      As the buffer was already pre-printed into, print it as a string
      rather than a format-string.
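
A minimal sketch of the pattern follows. The helper name and buffer size are hypothetical and this is not the selftest code; it only shows a pre-formatted buffer being emitted through a literal "%s" instead of being used as the format string itself.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical logger: formats into buf once, then emits buf verbatim. */
static int log_msg(char *out, size_t outlen, const char *fmt, ...)
{
	char buf[128];
	va_list ap;

	va_start(ap, fmt);
	vsnprintf(buf, sizeof(buf), fmt, ap);
	va_end(ap);

	/* buf may now contain '%' characters, so pass it as data via "%s"
	 * instead of using it as the format string. */
	return snprintf(out, outlen, "%s", buf);
}
```

With this shape, a message that happens to contain "%n" or "%s" after the first formatting pass is copied as plain text rather than being interpreted a second time.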
      
      Fixes: cfbab37b ("selftests/net: Add TCP-AO library")
      Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
      Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • selftests/tcp_ao: Fix fscanf() call for format-security · beb78cd1
      Dmitry Safonov authored
      On my new laptop with packages from nixos-unstable, gcc 12.3.0 produces:
      > lib/proc.c: In function ‘netstat_read_type’:
      > lib/proc.c:89:9: error: format not a string literal and no format arguments [-Werror=format-security]
      >    89 |         if (fscanf(fnetstat, type->header_name) == EOF)
      >       |         ^~
      > cc1: some warnings being treated as errors
      
      Here the selftests lib parses the header name, expecting a non-space word
      ending with a colon.
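
One way to parse such a token with a literal format string is sketched below. This is an assumption about a possible approach, not the actual patch: the function name, the 64-byte buffer, and the use of sscanf() on a line are all illustrative.

```c
#include <assert.h>
#include <stdio.h>

/* Read one header token of the form "Name:" from line into name.
 * Returns 1 on success, 0 otherwise. The format string is a literal, so
 * -Wformat-security has nothing to complain about. */
static int parse_header_name(const char *line, char name[64])
{
	char colon;

	/* The scanset accepts up to 63 characters that are neither ':' nor
	 * whitespace; the following %c must then read the ':' itself. */
	if (sscanf(line, " %63[^: \t\n]%c", name, &colon) != 2)
		return 0;
	return colon == ':';
}
```

The caller can then compare the parsed name against the expected header with an ordinary string comparison.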
      
      Fixes: cfbab37b ("selftests/net: Add TCP-AO library")
      Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
      Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • selftests/tcp_ao: Zero-init tcp_ao_info_opt · b089b3be
      Dmitry Safonov authored
      The structure is on the stack and has to be zero-initialized as
      the kernel checks for:
      >	if (in.reserved != 0 || in.reserved2 != 0)
      >		return -EINVAL;
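
The effect can be reproduced in a small userspace sketch. The struct below is a hypothetical stand-in modelled on the quoted check, not the real tcp_ao_info_opt layout, and the function names are illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical option struct with reserved fields; not the real layout. */
struct demo_opt {
	uint32_t flags;
	uint16_t reserved;
	uint16_t reserved2;
};

/* Mimics the kernel-side validation: reserved fields must be zero. */
static int demo_validate(const struct demo_opt *in)
{
	if (in->reserved != 0 || in->reserved2 != 0)
		return -EINVAL;
	return 0;
}

/* Build a request the safe way: zero everything, then set what is needed. */
static struct demo_opt demo_make(uint32_t flags)
{
	struct demo_opt opt = { 0 };

	opt.flags = flags;
	return opt;
}
```

An uninitialized stack variable may carry leftover bytes in the reserved fields, which is why the request is zeroed before any field is filled in.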
      
      Fixes: b2666053 ("selftests/net: Add test for TCP-AO add setsockopt() command")
      Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • selftests/tcp_ao: Make RST tests less flaky · 4225dfa4
      Dmitry Safonov authored
      Currently, "active reset" cases are flaky, because select() is called
      for 3 sockets, while only 2 are expected to receive RST.
      The idea of the third socket was to get into request_sock_queue,
      but the test mistakenly attempted to connect() after the listener
      socket was shut down.
      
      Repair this test; it is important to check the different kernel
      code paths for signing RST TCP-AO segments.
      
      Fixes: c6df7b23 ("selftests/net: Add TCP-AO RST test")
      Reported-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
  4. Apr 15, 2024
    • octeontx2-pf: fix FLOW_DIS_IS_FRAGMENT implementation · 75ce9506
      Asbjørn Sloth Tønnesen authored
      While reviewing the flower control flags handling in
      this driver, I noticed that the key wasn't being used,
      only the mask.
      
      I.e. `tc flower ... ip_flags nofrag` was hardware
      offloaded as `... ip_flags frag`.
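
The difference between consulting only the mask and consulting key and mask together can be sketched as follows. The flag value, enum, and function name are hypothetical and do not reflect the driver's code; the sketch only captures the rule that the mask selects the bit and the key supplies its required value.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_IS_FRAGMENT 0x1u  /* hypothetical flag bit for this sketch */

enum frag_match { FRAG_ANY, FRAG_ONLY, NOFRAG_ONLY };

/* Decide what to program from a key/mask pair: the mask says whether the
 * bit is matched at all, the key says which value it must have. */
static enum frag_match frag_match_from(uint32_t key, uint32_t mask)
{
	if (!(mask & DEMO_IS_FRAGMENT))
		return FRAG_ANY;
	return (key & DEMO_IS_FRAGMENT) ? FRAG_ONLY : NOFRAG_ONLY;
}
```

A `nofrag` match sets the bit in the mask and clears it in the key, so code that looks only at the mask cannot tell it apart from `frag`.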
      
      Only compile tested, no access to HW.
      
      Fixes: c672e372 ("octeontx2-pf: Add support to filter packet based on IP fragment")
      Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet: bring NLM_DONE out to a separate recv() again · 460b0d33
      Jakub Kicinski authored
      
      
      The commit under Fixes optimized the number of recv() calls
      needed during RTM_GETROUTE dumps, but we got multiple
      reports of applications hanging on recv() calls.
      Applications expect that a route dump will be terminated
      with a recv() reading an individual NLM_DONE message.
      
      Coalescing NLM_DONE is perfectly legal in netlink,
      but even though reporters fixed the code in the respective
      projects, chances are it will take time for those
      applications to get updated. So revert to old behavior
      (for now)?
      
      Old kernel (5.19):
      
       $ ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
                  --dump getroute --json '{"rtm-family": 2}'
       Recv: read 692 bytes, 11 messages
         nl_len = 68 (52) nl_flags = 0x22 nl_type = 24
       ...
         nl_len = 60 (44) nl_flags = 0x22 nl_type = 24
       Recv: read 20 bytes, 1 messages
         nl_len = 20 (4) nl_flags = 0x2 nl_type = 3
      
      Before (6.9-rc2):
      
       $ ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
                  --dump getroute --json '{"rtm-family": 2}'
       Recv: read 712 bytes, 12 messages
         nl_len = 68 (52) nl_flags = 0x22 nl_type = 24
       ...
         nl_len = 60 (44) nl_flags = 0x22 nl_type = 24
         nl_len = 20 (4) nl_flags = 0x2 nl_type = 3
      
      After:
      
       $ ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
                  --dump getroute --json '{"rtm-family": 2}'
       Recv: read 692 bytes, 11 messages
         nl_len = 68 (52) nl_flags = 0x22 nl_type = 24
       ...
         nl_len = 60 (44) nl_flags = 0x22 nl_type = 24
       Recv: read 20 bytes, 1 messages
         nl_len = 20 (4) nl_flags = 0x2 nl_type = 3
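
For applications, a receive loop that tolerates both layouts can be sketched as below. The struct is a hypothetical local mirror of the netlink message header so the sketch builds anywhere; real code would use the definitions and NLMSG_* helpers from <linux/netlink.h>. The done-type value 3 matches the nl_type shown in the dumps above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimal stand-in for the netlink message header (hypothetical name). */
struct demo_nlmsghdr {
	uint32_t len;
	uint16_t type;
	uint16_t flags;
	uint32_t seq;
	uint32_t pid;
};

#define DEMO_NLMSG_DONE 3u
#define DEMO_ALIGN(n) (((n) + 3u) & ~3u)

/* Walk one recv() buffer. Returns 1 if a DONE message was seen anywhere in
 * it, 0 if the caller should recv() again, -1 on a malformed buffer. */
static int demo_scan(const unsigned char *buf, size_t len, unsigned int *count)
{
	size_t off = 0;

	*count = 0;
	while (off < len) {
		struct demo_nlmsghdr h;

		if (len - off < sizeof(h))
			return -1;
		memcpy(&h, buf + off, sizeof(h));
		if (h.len < sizeof(h) || h.len > len - off)
			return -1;
		(*count)++;
		if (h.type == DEMO_NLMSG_DONE)
			return 1;
		off += DEMO_ALIGN(h.len);
	}
	return 0;
}
```

Checking every message for the done type, rather than assuming it arrives alone in its own recv(), works with both the coalesced and the separate layout.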
      
      Reported-by: Stefano Brivio <sbrivio@redhat.com>
      Link: https://lore.kernel.org/all/20240315124808.033ff58d@elisabeth
      Reported-by: Ilya Maximets <i.maximets@ovn.org>
      Link: https://lore.kernel.org/all/02b50aae-f0e9-47a4-8365-a977a85975d3@ovn.org
      Fixes: 4ce5dc93 ("inet: switch inet_dump_fib() to RCU protection")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Tested-by: Ilya Maximets <i.maximets@ovn.org>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: change maximum number of UDP segments to 128 · 1382e3b6
      Yuri Benditovich authored
      The commit fc8b2a61
      ("net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation")
      added a check of the potential number of UDP segments against
      UDP_MAX_SEGMENTS in linux/virtio_net.h.
      After this change, the certification test of USO guest-to-guest
      transmit on the Windows driver for the virtio-net device fails,
      for example with a packet size of ~64K and an mss of 536 bytes.
      In general, USO should not be more restrictive than TSO.
      Indeed, with an unreasonably small mss, a large number of segments
      can cause queue overflow and packet loss at the destination.
      A limit of 128 segments is good for any practical purpose:
      with a minimal meaningful mss of 536, the maximal UDP packet will
      be divided into ~120 segments.
      The number of segments for UDP packets is also validated against
      UDP_MAX_SEGMENTS in udp.c (v4, v6); this does not affect the
      guest-to-guest path but does affect packets sent to the host, for
      example.
      It is important to mention that UDP_MAX_SEGMENTS is a kernel-only
      define and is not available to user mode socket applications.
      To request an MSS smaller than the MTU, applications
      just use setsockopt() with SOL_UDP and UDP_SEGMENT, and there are
      no limitations at the socket API level.
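
The arithmetic behind the "~120 segments" figure can be checked with a short sketch. The names are hypothetical, and the 65507-byte payload used in the usage note is an assumption standing in for the "~64K" packet mentioned above (it is the largest UDP payload over IPv4).

```c
#include <assert.h>

#define DEMO_MAX_SEGMENTS 128u  /* the limit proposed in the text above */

/* Number of segments needed to carry `len` payload bytes at `mss` bytes each. */
static unsigned int demo_nr_segments(unsigned int len, unsigned int mss)
{
	assert(mss > 0);
	return (len + mss - 1) / mss;
}
```

For example, `demo_nr_segments(65507, 536)` evaluates to 123, which is in line with the rough figure quoted above and fits under a limit of 128.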
      
      Fixes: fc8b2a61 ("net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation")
      Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. Apr 13, 2024
    • Merge branch 'mlx5-fixes' · 72041e53
      Jakub Kicinski authored
      Tariq Toukan says:
      
      ====================
      mlx5 fixes
      
      This patchset provides bug fixes to mlx5 core and Eth drivers.
      ====================
      
      Link: https://lore.kernel.org/r/20240411115444.374475-1-tariqt@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/mlx5e: Prevent deadlock while disabling aRFS · fef96576
      Carolina Jubran authored
      When disabling aRFS under the `priv->state_lock`, any scheduled
      aRFS works are canceled using the `cancel_work_sync` function,
      which waits for the work to end if it has already started.
      However, while waiting for the work handler, the handler will
      try to acquire the `state_lock` which is already acquired.
      
      The worker acquires the lock to delete the rules if the state
      is down, which is not the worker's responsibility since
      disabling aRFS deletes the rules.
      
      Add an aRFS state variable, which indicates whether aRFS is
      enabled, and prevent adding rules when aRFS is disabled.
      
      Kernel log:
      
      ======================================================
      WARNING: possible circular locking dependency detected
      6.7.0-rc4_net_next_mlx5_5483eb2 #1 Tainted: G          I
      ------------------------------------------------------
      ethtool/386089 is trying to acquire lock:
      ffff88810f21ce68 ((work_completion)(&rule->arfs_work)){+.+.}-{0:0}, at: __flush_work+0x74/0x4e0
      
      but task is already holding lock:
      ffff8884a1808cc0 (&priv->state_lock){+.+.}-{3:3}, at: mlx5e_ethtool_set_channels+0x53/0x200 [mlx5_core]
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (&priv->state_lock){+.+.}-{3:3}:
             __mutex_lock+0x80/0xc90
             arfs_handle_work+0x4b/0x3b0 [mlx5_core]
             process_one_work+0x1dc/0x4a0
             worker_thread+0x1bf/0x3c0
             kthread+0xd7/0x100
             ret_from_fork+0x2d/0x50
             ret_from_fork_asm+0x11/0x20
      
      -> #0 ((work_completion)(&rule->arfs_work)){+.+.}-{0:0}:
             __lock_acquire+0x17b4/0x2c80
             lock_acquire+0xd0/0x2b0
             __flush_work+0x7a/0x4e0
             __cancel_work_timer+0x131/0x1c0
             arfs_del_rules+0x143/0x1e0 [mlx5_core]
             mlx5e_arfs_disable+0x1b/0x30 [mlx5_core]
             mlx5e_ethtool_set_channels+0xcb/0x200 [mlx5_core]
             ethnl_set_channels+0x28f/0x3b0
             ethnl_default_set_doit+0xec/0x240
             genl_family_rcv_msg_doit+0xd0/0x120
             genl_rcv_msg+0x188/0x2c0
             netlink_rcv_skb+0x54/0x100
             genl_rcv+0x24/0x40
             netlink_unicast+0x1a1/0x270
             netlink_sendmsg+0x214/0x460
             __sock_sendmsg+0x38/0x60
             __sys_sendto+0x113/0x170
             __x64_sys_sendto+0x20/0x30
             do_syscall_64+0x40/0xe0
             entry_SYSCALL_64_after_hwframe+0x46/0x4e
      
      other info that might help us debug this:
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(&priv->state_lock);
                                     lock((work_completion)(&rule->arfs_work));
                                     lock(&priv->state_lock);
        lock((work_completion)(&rule->arfs_work));
      
       *** DEADLOCK ***
      
      3 locks held by ethtool/386089:
       #0: ffffffff82ea7210 (cb_lock){++++}-{3:3}, at: genl_rcv+0x15/0x40
       #1: ffffffff82e94c88 (rtnl_mutex){+.+.}-{3:3}, at: ethnl_default_set_doit+0xd3/0x240
       #2: ffff8884a1808cc0 (&priv->state_lock){+.+.}-{3:3}, at: mlx5e_ethtool_set_channels+0x53/0x200 [mlx5_core]
      
      stack backtrace:
      CPU: 15 PID: 386089 Comm: ethtool Tainted: G          I        6.7.0-rc4_net_next_mlx5_5483eb2 #1
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      Call Trace:
       <TASK>
       dump_stack_lvl+0x60/0xa0
       check_noncircular+0x144/0x160
       __lock_acquire+0x17b4/0x2c80
       lock_acquire+0xd0/0x2b0
       ? __flush_work+0x74/0x4e0
       ? save_trace+0x3e/0x360
       ? __flush_work+0x74/0x4e0
       __flush_work+0x7a/0x4e0
       ? __flush_work+0x74/0x4e0
       ? __lock_acquire+0xa78/0x2c80
       ? lock_acquire+0xd0/0x2b0
       ? mark_held_locks+0x49/0x70
       __cancel_work_timer+0x131/0x1c0
       ? mark_held_locks+0x49/0x70
       arfs_del_rules+0x143/0x1e0 [mlx5_core]
       mlx5e_arfs_disable+0x1b/0x30 [mlx5_core]
       mlx5e_ethtool_set_channels+0xcb/0x200 [mlx5_core]
       ethnl_set_channels+0x28f/0x3b0
       ethnl_default_set_doit+0xec/0x240
       genl_family_rcv_msg_doit+0xd0/0x120
       genl_rcv_msg+0x188/0x2c0
       ? ethnl_ops_begin+0xb0/0xb0
       ? genl_family_rcv_msg_dumpit+0xf0/0xf0
       netlink_rcv_skb+0x54/0x100
       genl_rcv+0x24/0x40
       netlink_unicast+0x1a1/0x270
       netlink_sendmsg+0x214/0x460
       __sock_sendmsg+0x38/0x60
       __sys_sendto+0x113/0x170
       ? do_user_addr_fault+0x53f/0x8f0
       __x64_sys_sendto+0x20/0x30
       do_syscall_64+0x40/0xe0
       entry_SYSCALL_64_after_hwframe+0x46/0x4e
       </TASK>
      
      Fixes: 45bf454a ("net/mlx5e: Enabling aRFS mechanism")
      Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Link: https://lore.kernel.org/r/20240411115444.374475-7-tariqt@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      fef96576
    • Carolina Jubran's avatar
      net/mlx5e: Acquire RTNL lock before RQs/SQs activation/deactivation · fdce06bd
      Carolina Jubran authored
      netif_queue_set_napi asserts that the RTNL lock is held if
      the netdev is initialized.
      
      Acquire the RTNL lock before activating or deactivating
      RQs/SQs if the lock has not been held before in the flow.
      
      Fixes: f25e7b82 ("net/mlx5e: link NAPI instances to queues and IRQs")
      Cc: Joe Damato <jdamato@fastly.com>
      Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
      Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Link: https://lore.kernel.org/r/20240411115444.374475-6-tariqt@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      fdce06bd
    • Rahul Rameshbabu's avatar
      net/mlx5e: Use channel mdev reference instead of global mdev instance for coalescing · 6c685bdb
      Rahul Rameshbabu authored
      Channels can potentially have independent mdev instances. Do not refer to
      the global mdev instance in the mlx5e_priv instance for channel FW
      operations related to coalescing. CQ numbers that would be valid on the
      channel's mdev instance may not be correctly referenced if using the
      mlx5e_priv instance.
      
      Fixes: 67936e13 ("net/mlx5e: Let channels be SD-aware")
      Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Link: https://lore.kernel.org/r/20240411115444.374475-5-tariqt@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      6c685bdb
    • Shay Drory's avatar
      net/mlx5: Restore mistakenly dropped parts in register devlink flow · bf729988
      Shay Drory authored
      Code parts from cited commit were mistakenly dropped while rebasing
      before submission. Add them here.
      
      Fixes: c6e77aa9 ("net/mlx5: Register devlink first under devlink lock")
      Signed-off-by: Shay Drory <shayd@nvidia.com>
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Link: https://lore.kernel.org/r/20240411115444.374475-4-tariqt@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      bf729988
    • Tariq Toukan's avatar
      net/mlx5: SD, Handle possible devcom ERR_PTR · aa4ac90d
      Tariq Toukan authored
      Check if devcom holds an error pointer and return immediately.
      
      This fixes Smatch static checker warning:
      drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c:221 sd_register()
      error: 'devcom' dereferencing possible ERR_PTR()
      
      Enhance mlx5_devcom_register_component() so it stops returning NULL,
      making it easier for its callers.
      
      Fixes: d3d05766 ("net/mlx5: SD, Implement devcom communication and primary election")
      Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
      Link: https://lore.kernel.org/all/f09666c8-e604-41f6-958b-4cc55c73faf9@gmail.com/T/
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Gal Pressman <gal@nvidia.com>
      Link: https://lore.kernel.org/r/20240411115444.374475-3-tariqt@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      aa4ac90d
    • Shay Drory's avatar
      net/mlx5: Lag, restore buckets number to default after hash LAG deactivation · 37cc10da
      Shay Drory authored
      The cited patch introduces the concept of buckets in LAG in hash mode.
      However, the patch doesn't clear the number of buckets on LAG
      deactivation. This results in using the wrong number of buckets
      if a user creates a hash mode LAG and afterwards creates a
      non-hash mode LAG.
      
      Hence, restore buckets number to default after hash mode LAG
      deactivation.
      
      Fixes: 352899f3 ("net/mlx5: Lag, use buckets in hash mode")
      Signed-off-by: Shay Drory <shayd@nvidia.com>
      Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Link: https://lore.kernel.org/r/20240411115444.374475-2-tariqt@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      37cc10da
    • Asbjørn Sloth Tønnesen's avatar
      net: sparx5: flower: fix fragment flags handling · 68aba004
      Asbjørn Sloth Tønnesen authored
      I noticed that only 3 out of the 4 input bits were used,
      mt.key->flags & FLOW_DIS_IS_FRAGMENT was never checked.
      
      In order to avoid a complicated maze, I converted it to
      use a 16 byte mapping table.
      
      As shown in the table below, the old heuristic doesn't
      always do the right thing, i.e. when FLOW_DIS_IS_FRAGMENT=1/1
      it used to match only follow-up fragment packets.
      
      Here are all the combinations, and their resulting new/old
      VCAP key/mask filter:
      
        /- FLOW_DIS_IS_FRAGMENT (key/mask)
        |    /- FLOW_DIS_FIRST_FRAG (key/mask)
        |    |    /-- new VCAP fragment (key/mask)
        v    v    v    v- old VCAP fragment (key/mask)
      
       0/0  0/0  -/-  -/-     impossible (due to entry cond. on mask)
       0/0  0/1  -/-  0/3 !!  invalid (can't match non-fragment + follow-up frag)
       0/0  1/0  -/-  -/-     impossible (key > mask)
       0/0  1/1  1/3  1/3     first fragment
      
       0/1  0/0  0/3  3/3 !!  not fragmented
       0/1  0/1  0/3  3/3 !!  not fragmented (+ not first fragment)
       0/1  1/0  -/-  -/-     impossible (key > mask)
       0/1  1/1  -/-  1/3 !!  invalid (non-fragment and first frag)
      
       1/0  0/0  -/-  -/-     impossible (key > mask)
       1/0  0/1  -/-  -/-     impossible (key > mask)
       1/0  1/0  -/-  -/-     impossible (key > mask)
       1/0  1/1  -/-  -/-     impossible (key > mask)
      
       1/1  0/0  1/1  3/3 !!  some fragment
       1/1  0/1  3/3  3/3     follow-up fragment
       1/1  1/0  -/-  -/-     impossible (key > mask)
       1/1  1/1  1/3  1/3     first fragment
      
      In the datasheet the VCAP fragment values are documented as:
       0 = no fragment
       1 = initial fragment
       2 = suspicious fragment
       3 = valid follow-up fragment
      
      Result: 3 combinations match the old behavior,
              3 combinations have been corrected,
              2 combinations are now invalid, and fail,
              8 combinations are impossible.
      
      It should now be aligned with how FLOW_DIS_IS_FRAGMENT
      and FLOW_DIS_FIRST_FRAG are set in __skb_flow_dissect() in
      net/core/flow_dissector.c.
      
      Since the VCAP fragment values are not a bitfield, we have
      to ignore the suspicious fragment value, e.g. when matching
      on any kind of fragment with FLOW_DIS_IS_FRAGMENT=1/1.
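The table above can be re-checked in userspace with a short script. This is a sketch only: the dictionaries transcribe the "new" and "old" columns from the commit message, and `classify` and `matched_values` are hypothetical helpers, not driver functions. It also assumes the usual key/mask match rule, where a value matches when `(value & mask) == (key & mask)`.

```python
from collections import Counter

IMPOSSIBLE = "impossible"
INVALID = "invalid"

# (is_frag key, is_frag mask, first_frag key, first_frag mask)
#   -> new VCAP fragment (key, mask), or a marker for rows without a filter.
NEW = {
    (0, 0, 0, 0): IMPOSSIBLE,
    (0, 0, 0, 1): INVALID,
    (0, 0, 1, 0): IMPOSSIBLE,
    (0, 0, 1, 1): (1, 3),
    (0, 1, 0, 0): (0, 3),
    (0, 1, 0, 1): (0, 3),
    (0, 1, 1, 0): IMPOSSIBLE,
    (0, 1, 1, 1): INVALID,
    (1, 0, 0, 0): IMPOSSIBLE,
    (1, 0, 0, 1): IMPOSSIBLE,
    (1, 0, 1, 0): IMPOSSIBLE,
    (1, 0, 1, 1): IMPOSSIBLE,
    (1, 1, 0, 0): (1, 1),
    (1, 1, 0, 1): (3, 3),
    (1, 1, 1, 0): IMPOSSIBLE,
    (1, 1, 1, 1): (1, 3),
}

# Old VCAP fragment (key, mask) for the rows where the old code set one.
OLD = {
    (0, 0, 0, 1): (0, 3),
    (0, 0, 1, 1): (1, 3),
    (0, 1, 0, 0): (3, 3),
    (0, 1, 0, 1): (3, 3),
    (0, 1, 1, 1): (1, 3),
    (1, 1, 0, 0): (3, 3),
    (1, 1, 0, 1): (3, 3),
    (1, 1, 1, 1): (1, 3),
}


def classify(flags):
    """Label one row as impossible, invalid, unchanged or corrected."""
    new = NEW[flags]
    if new in (IMPOSSIBLE, INVALID):
        return new
    return "unchanged" if OLD.get(flags) == new else "corrected"


def matched_values(key, mask):
    """VCAP fragment values (0..3) selected by a key/mask pair (assumed rule)."""
    return [v for v in range(4) if (v & mask) == (key & mask)]


counts = Counter(classify(flags) for flags in NEW)
print(dict(counts))
print(matched_values(1, 1))  # filter for "some fragment"
```

Under that match rule, the 1/1 filter selects values 1 (initial) and 3 (valid follow-up) and skips 2 (suspicious), which is the trade-off described above.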
      
      Only compile tested, and logic tested in userspace, as I
      unfortunately don't have access to this switch chip (yet).
      
      Fixes: d6c2964d ("net: microchip: sparx5: Adding more tc flower keys for the IS2 VCAP")
      Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
      Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
      Tested-by: Daniel Machon <daniel.machon@microchip.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Link: https://lore.kernel.org/r/20240411111321.114095-1-ast@fiberby.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      68aba004
    • Jakub Kicinski's avatar
      Merge branch 'af_unix-fix-msg_oob-bugs-with-msg_peek' · 27f58f7f
      Jakub Kicinski authored
      Kuniyuki Iwashima says:
      
      ====================
      af_unix: Fix MSG_OOB bugs with MSG_PEEK.
      
      Currently, OOB data can be read without MSG_OOB accidentally
      in two cases, and this series fixes the bugs.
      
      v1: https://lore.kernel.org/netdev/20240409225209.58102-1-kuniyu@amazon.com/
      ====================
      
      Link: https://lore.kernel.org/r/20240410171016.7621-1-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      27f58f7f
    • Kuniyuki Iwashima's avatar
      af_unix: Don't peek OOB data without MSG_OOB. · 22dd70eb
      Kuniyuki Iwashima authored
      Currently, we can read OOB data without MSG_OOB by using MSG_PEEK
      when OOB data is sitting on the front row, which is apparently
      wrong.
      
        >>> from socket import *
        >>> c1, c2 = socketpair(AF_UNIX, SOCK_STREAM)
        >>> c1.send(b'a', MSG_OOB)
        1
        >>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT)
        b'a'
      
      If manage_oob() is called when no data has been copied, we only
      check if the socket enables SO_OOBINLINE or MSG_PEEK is not used.
      Otherwise, the skb is returned as is.
      
      However, here we should return NULL if MSG_PEEK is set and no data
      has been copied.
      
      Also, in such a case, we should not jump to the redo label because
      we will be caught in the loop and hog the CPU until normal data
      comes in.
      
      Then, we need to handle skb == NULL case with the if-clause below
      the manage_oob() block.
      
      With this patch:
      
        >>> from socket import *
        >>> c1, c2 = socketpair(AF_UNIX, SOCK_STREAM)
        >>> c1.send(b'a', MSG_OOB)
        1
        >>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT)
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        BlockingIOError: [Errno 11] Resource temporarily unavailable
      
      Fixes: 314001f0 ("af_unix: Add OOB support")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20240410171016.7621-3-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      22dd70eb
    • Kuniyuki Iwashima's avatar
      af_unix: Call manage_oob() for every skb in unix_stream_read_generic(). · 283454c8
      Kuniyuki Iwashima authored
      When we call recv() for an AF_UNIX socket, we first peek one skb and
      call manage_oob() to check if the skb is sent with MSG_OOB.

      However, when we fetch the next (and the following) skb, manage_oob()
      is not called now, leading to wrong behaviour.
      
      Let's say a socket send()s "hello" with MSG_OOB and the peer tries
      to recv() 5 bytes with MSG_PEEK.  Here, we should get only "hell"
      without 'o', but we actually get all 5 bytes:
      
        >>> from socket import *
        >>> c1, c2 = socketpair(AF_UNIX, SOCK_STREAM)
        >>> c1.send(b'hello', MSG_OOB)
        5
        >>> c2.recv(5, MSG_PEEK)
        b'hello'
      
      The first skb fills 4 bytes, and the next skb is peeked but not
      properly checked by manage_oob().
      
      Let's move up the again label to call manage_oob() for every skb.
      
      With this patch:
      
        >>> from socket import *
        >>> c1, c2 = socketpair(AF_UNIX, SOCK_STREAM)
        >>> c1.send(b'hello', MSG_OOB)
        5
        >>> c2.recv(5, MSG_PEEK)
        b'hell'
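The difference between checking only the first skb and checking every skb can be modelled without a kernel. The following is a toy simulation, not the kernel logic: `Skb` and `peek` are hypothetical names, and the receive queue is assumed to hold "hell" as a normal skb followed by "o" as the OOB skb, as the description above implies.

```python
from dataclasses import dataclass


@dataclass
class Skb:
    data: bytes
    oob: bool = False  # True if this buffer carries the MSG_OOB byte


def peek(queue, size, check_every_skb):
    """Collect up to `size` bytes, stopping at an skb recognised as OOB."""
    out = b""
    for index, skb in enumerate(queue):
        if len(out) >= size:
            break
        # Old behaviour: only the first skb goes through the OOB check.
        checked = check_every_skb or index == 0
        if checked and skb.oob:
            break
        out += skb.data[: size - len(out)]
    return out


queue = [Skb(b"hell"), Skb(b"o", oob=True)]
before = peek(queue, 5, check_every_skb=False)
after = peek(queue, 5, check_every_skb=True)
print(before, after)
```

In this model the unchecked second skb leaks the OOB byte into the peeked data, while checking every skb stops the read in front of it.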
      
      Fixes: 314001f0 ("af_unix: Add OOB support")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20240410171016.7621-2-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      283454c8
  6. Apr 12, 2024
    • David S. Miller's avatar
      Merge tag 'nf-24-04-11' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · 90be7a5c
      David S. Miller authored
      
      
      netfilter pull request 24-04-11
      
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      Patches #1 and #2 add missing rcu read side lock when iterating over
      expression and object type list which could race with module removal.
      
      Patch #3 prevents promisc packet from visiting the bridge/input hook
      	 to amend a recent fix to address conntrack confirmation race
      	 in br_netfilter and nf_conntrack_bridge.
      
      Patch #4 adds and uses iterate decorator type to fetch the current
      	 pipapo set backend datastructure view when netlink dumps the
      	 set elements.
      
      Patch #5 fixes removal of duplicate elements in the pipapo set backend.
      
      Patch #6 flowtable validates pppoe header before accessing it.
      
      Patch #7 fixes flowtable datapath for pppoe packets, otherwise lookup
               fails and pppoe packets follow classic path.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      90be7a5c
    • Linus Torvalds's avatar
      Merge tag 'net-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 2ae9a897
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from bluetooth.
      
        Current release - new code bugs:
      
         - netfilter: complete validation of user input
      
         - mlx5: disallow SRIOV switchdev mode when in multi-PF netdev
      
        Previous releases - regressions:
      
         - core: fix u64_stats_init() for lockdep when used repeatedly in one
           file
      
         - ipv6: fix race condition between ipv6_get_ifaddr and ipv6_del_addr
      
         - bluetooth: fix memory leak in hci_req_sync_complete()
      
         - batman-adv: avoid infinite loop trying to resize local TT
      
         - drv: geneve: fix header validation in geneve[6]_xmit_skb
      
         - drv: bnxt_en: fix possible memory leak in
           bnxt_rdma_aux_device_init()
      
         - drv: mlx5: offset comp irq index in name by one
      
         - drv: ena: avoid double-free clearing stale tx_info->xdpf value
      
         - drv: pds_core: fix pdsc_check_pci_health deadlock
      
        Previous releases - always broken:
      
         - xsk: validate user input for XDP_{UMEM|COMPLETION}_FILL_RING
      
         - bluetooth: fix setsockopt not validating user input
      
         - af_unix: clear stale u->oob_skb.
      
         - nfc: llcp: fix nfc_llcp_setsockopt() unsafe copies
      
         - drv: virtio_net: fix guest hangup on invalid RSS update
      
         - drv: mlx5e: Fix mlx5e_priv_init() cleanup flow
      
         - dsa: mt7530: trap link-local frames regardless of ST Port State"
      
      * tag 'net-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (59 commits)
        net: ena: Set tx_info->xdpf value to NULL
        net: ena: Fix incorrect descriptor free behavior
        net: ena: Wrong missing IO completions check order
        net: ena: Fix potential sign extension issue
        af_unix: Fix garbage collector racing against connect()
        net: dsa: mt7530: trap link-local frames regardless of ST Port State
        Revert "s390/ism: fix receive message buffer allocation"
        net: sparx5: fix wrong config being used when reconfiguring PCS
        net/mlx5: fix possible stack overflows
        net/mlx5: Disallow SRIOV switchdev mode when in multi-PF netdev
        net/mlx5e: RSS, Block XOR hash with over 128 channels
        net/mlx5e: Do not produce metadata freelist entries in Tx port ts WQE xmit
        net/mlx5e: HTB, Fix inconsistencies with QoS SQs number
        net/mlx5e: Fix mlx5e_priv_init() cleanup flow
        net/mlx5e: RSS, Block changing channels number when RXFH is configured
        net/mlx5: Correctly compare pkt reformat ids
        net/mlx5: Properly link new fs rules into the tree
        net/mlx5: offset comp irq index in name by one
        net/mlx5: Register devlink first under devlink lock
        net/mlx5: E-switch, store eswitch pointer before registering devlink_param
        ...
      2ae9a897
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · ab4319fd
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "The most important fix is the sg one because the regression it fixes
        (spurious warning and use after final put) is already backported to
        stable.
      
        The next biggest impact is the target fix for wrong credentials used
        to load a module because it's affecting new kernels installed on
        selinux based distributions.
      
        The other three fixes are an obvious off by one and SATA protocol
        issues"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: qla2xxx: Fix off by one in qla_edif_app_getstats()
        scsi: hisi_sas: Modify the deadline for ata_wait_after_reset()
        scsi: hisi_sas: Handle the NCQ error returned by D2H frame
        scsi: target: Fix SELinux error when systemd-modules loads the target module
        scsi: sg: Avoid race in error handling & drop bogus warn
      ab4319fd