  1. Nov 21, 2023
    • selftests: net: verify fq per-band packet limit · a0bc96c0
      Willem de Bruijn authored
      
      
      Commit 29f834aa ("net_sched: sch_fq: add 3 bands and WRR
      scheduling") introduces multiple traffic bands, and per-band maximum
      packet count.
      
      Per-band limits ensure that packets in one class cannot fill the
      entire qdisc and thereby deny service to traffic in the other classes.
      
      Verify this behavior:
        1. set the limit to 10 per band
        2. send 20 pkts on band A: verify that 10 are queued, 10 dropped
        3. send 20 pkts on band A: verify that  0 are queued, 20 dropped
        4. send 20 pkts on band B: verify that 10 are queued, 10 dropped
      
      Packets must remain queued for a period to trigger this behavior.
      Use SO_TXTIME to store packets for 100 msec.
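The expected counts in the steps above can be reproduced with a toy model. This is only a sketch of the accounting, not the sch_fq implementation: the class name, the band labels and the assumption that nothing is dequeued during the test (which is what holding packets via SO_TXTIME is meant to approximate) are all illustrative.

```python
from collections import Counter


class BandLimitedQueue:
    """Toy model of a qdisc with an independent packet limit per band."""

    def __init__(self, per_band_limit):
        self.per_band_limit = per_band_limit
        self.queued = Counter()   # packets currently held, per band
        self.dropped = Counter()  # packets rejected, per band

    def enqueue(self, band, count=1):
        """Offer `count` packets to `band`; return (queued, dropped) for this burst."""
        room = max(self.per_band_limit - self.queued[band], 0)
        accepted = min(count, room)
        rejected = count - accepted
        self.queued[band] += accepted
        self.dropped[band] += rejected
        return accepted, rejected


# Hypothetical band labels; nothing is dequeued between bursts.
q = BandLimitedQueue(per_band_limit=10)
print(q.enqueue("A", 20))  # first burst on band A
print(q.enqueue("A", 20))  # band A is already full
print(q.enqueue("B", 20))  # band B has its own budget
```

With a single shared limit, the third burst would also be dropped entirely, which is the denial of service the per-band limit is meant to prevent.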
      
      The test reuses existing upstream test infra. The script is a fork of
      cmsg_time.sh. The scripts call cmsg_sender.
      
      The test extends cmsg_sender with two arguments:
      
      * '-P' SO_PRIORITY
        There is a subtle difference between IPv4 and IPv6 stack behavior:
        PF_INET/IP_TOS        sets IP header bits and sk_priority
        PF_INET6/IPV6_TCLASS  sets IP header bits BUT NOT sk_priority
      
      * '-n' num pkts
        Send multiple packets in quick succession.
        I first attempted a for loop in the script, but this is too slow in
        virtualized environments, causing flakiness as the 100ms timeout is
        reached and packets are dequeued.
      
      Also do not wait for timestamps to be queued unless timestamps are
      requested.
      
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20231116203449.2627525-1-willemdebruijn.kernel@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: microchip: lan743x : bidirectional throughput improvement · 45933b2d
      Vishvambar Panth S authored
      
      
      The LAN743x/PCI11xxx DMA descriptors are always 4 dwords long, but the
      device supports placing the descriptors in memory back to back or
      reserving space in between them using its DMA_DESCRIPTOR_SPACE (DSPACE)
      configurable hardware setting. Currently DSPACE is unnecessarily set to
      match the host's L1 cache line size, which reserves space between
      descriptors on most platforms and causes suboptimal behavior (a single
      PCIe Mem transaction per descriptor). Changing the setting to DSPACE=16
      lets many descriptors be packed into a single PCIe Mem transaction,
      resulting in a massive performance improvement in bidirectional tests
      without any negative effects.
      Tested and verified improvements on an x64 PC and several ARM platforms
      (typical data below).
      
      Test setup 1: x64 PC with LAN7430 ---> x64 PC
      
      iperf3 UDP bidirectional with DSPACE set to L1 CACHE Size:
      - - - - - - - - - - - - - - - - - - - - - - - - -
      [ ID][Role] Interval           Transfer     Bitrate
      [  5][TX-C]   0.00-10.00  sec   170 MBytes   143 Mbits/sec  sender
      [  5][TX-C]   0.00-10.04  sec   169 MBytes   141 Mbits/sec  receiver
      [  7][RX-C]   0.00-10.00  sec  1.02 GBytes   876 Mbits/sec  sender
      [  7][RX-C]   0.00-10.04  sec  1.02 GBytes   870 Mbits/sec  receiver
      
      iperf3 UDP bidirectional with DSPACE set to 16 Bytes
      - - - - - - - - - - - - - - - - - - - - - - - - -
      [ ID][Role] Interval           Transfer     Bitrate
      [  5][TX-C]   0.00-10.00  sec  1.11 GBytes   956 Mbits/sec  sender
      [  5][TX-C]   0.00-10.04  sec  1.11 GBytes   951 Mbits/sec  receiver
      [  7][RX-C]   0.00-10.00  sec  1.10 GBytes   948 Mbits/sec  sender
      [  7][RX-C]   0.00-10.04  sec  1.10 GBytes   942 Mbits/sec  receiver
      
      Test setup 2 : RK3399 with LAN7430 ---> x64 PC
      
      RK3399 Spec:
      The SOM-RK3399 is an ARM module designed and developed by FriendlyElec.
      Cores: 64-bit Dual Core Cortex-A72 + Quad Core Cortex-A53
      Frequency: Cortex-A72(up to 2.0GHz), Cortex-A53(up to 1.5GHz)
      PCIe: PCIe x4, compatible with PCIe 2.1, Dual operation mode
      
      iperf3 UDP bidirectional with DSPACE set to L1 CACHE Size:
      - - - - - - - - - - - - - - - - - - - - - - - - -
      [ ID][Role] Interval           Transfer     Bitrate
      [  5][TX-C]   0.00-10.00  sec   534 MBytes   448 Mbits/sec  sender
      [  5][TX-C]   0.00-10.05  sec   534 MBytes   446 Mbits/sec  receiver
      [  7][RX-C]   0.00-10.00  sec  1.12 GBytes   961 Mbits/sec  sender
      [  7][RX-C]   0.00-10.05  sec  1.11 GBytes   946 Mbits/sec  receiver
      
      iperf3 UDP bidirectional with DSPACE set to 16 Bytes
      - - - - - - - - - - - - - - - - - - - - - - - - -
      [ ID][Role] Interval           Transfer     Bitrate
      [  5][TX-C]   0.00-10.00  sec   966 MBytes   810 Mbits/sec   sender
      [  5][TX-C]   0.00-10.04  sec   965 MBytes   806 Mbits/sec   receiver
      [  7][RX-C]   0.00-10.00  sec  1.11 GBytes   956 Mbits/sec   sender
      [  7][RX-C]   0.00-10.04  sec  1.07 GBytes   919 Mbits/sec   receiver
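The effect of the descriptor spacing can be illustrated with some simple layout arithmetic. The sketch below assumes a 64-byte cache line and a 64-byte contiguous read, both of which are hypothetical values chosen for illustration; the real transaction size depends on the device and the platform. Only the 16-byte descriptor size (4 dwords) comes from the commit message.

```python
DESCRIPTOR_BYTES = 4 * 4  # 4 dwords of 4 bytes each, per the commit message


def descriptor_offsets(count, dspace):
    """Byte offset of each descriptor when consecutive ones start `dspace` bytes apart."""
    if dspace < DESCRIPTOR_BYTES:
        raise ValueError("spacing must be at least one descriptor long")
    return [i * dspace for i in range(count)]


def descriptors_per_burst(burst_bytes, dspace):
    """Whole descriptors covered by one contiguous read that starts at a descriptor."""
    if burst_bytes < DESCRIPTOR_BYTES:
        return 0
    return (burst_bytes - DESCRIPTOR_BYTES) // dspace + 1


CACHE_LINE = 64  # assumed L1 cache line size (hypothetical)
BURST = 64       # assumed contiguous read size in bytes (hypothetical)

for dspace in (CACHE_LINE, DESCRIPTOR_BYTES):
    padding = 1 - DESCRIPTOR_BYTES / dspace
    print(dspace, descriptor_offsets(4, dspace),
          descriptors_per_burst(BURST, dspace), f"{padding:.0%} padding")
```

Under these assumptions a cache-line spacing yields one descriptor per read with three quarters of the ring being padding, while DSPACE=16 yields four descriptors per read with no padding.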
      
      Signed-off-by: Vishvambar Panth S <vishvambarpanth.s@microchip.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Link: https://lore.kernel.org/r/20231116054350.620420-1-vishvambarpanth.s@microchip.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. Nov 20, 2023
  3. Nov 19, 2023
  4. Nov 18, 2023
    • Merge branch 'ncsi-mac-address-command' · 4dce97b1
      David S. Miller authored
      Patrick Williams says:
      
      ====================
      net/ncsi: Add NC-SI 1.2 Get MC MAC Address command
      
      NC-SI 1.2 has now been published[1] and adds a new command for "Get MC
      MAC Address".  This is often used by BMCs to get the assigned MAC
      address for the channel used by the BMC.
      
      This change set has been tested on a Broadcom 200G NIC with updated
      firmware for NC-SI 1.2 and at least one other non-public NIC design.
      
      1. https://www.dmtf.org/sites/default/files/standards/documents/DSP0222_1.2.0.pdf
      
      
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ncsi: Add NC-SI 1.2 Get MC MAC Address command · b8291cf3
      Peter Delevoryas authored
      This change adds support for the NC-SI 1.2 Get MC MAC Address command,
      specified here:
      
      https://www.dmtf.org/sites/default/files/standards/documents/DSP0222_1.2.0.pdf
      
      
      
      It serves the exact same function as the existing OEM Get MAC Address
      commands, so if a channel reports that it supports NC-SI 1.2, we prefer
      to use the standard command rather than the OEM command.
      
      Verified with an invalid MAC address and 2 valid ones:
      
      [   55.137072] ftgmac100 1e690000.ftgmac eth0: NCSI: Received 3 provisioned MAC addresses
      [   55.137614] ftgmac100 1e690000.ftgmac eth0: NCSI: MAC address 0: 00:00:00:00:00:00
      [   55.138026] ftgmac100 1e690000.ftgmac eth0: NCSI: MAC address 1: fa:ce:b0:0c:20:22
      [   55.138528] ftgmac100 1e690000.ftgmac eth0: NCSI: MAC address 2: fa:ce:b0:0c:20:23
      [   55.139241] ftgmac100 1e690000.ftgmac eth0: NCSI: Unable to assign 00:00:00:00:00:00 to device
      [   55.140098] ftgmac100 1e690000.ftgmac eth0: NCSI: Set MAC address to fa:ce:b0:0c:20:22
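The selection visible in the log above (skip the all-zero address, take the next usable one) can be sketched as follows. This is an illustration rather than the driver's code: the function names are hypothetical, and the validity rule (not all-zero, multicast bit clear) is an assumption based on the usual Ethernet conventions, since the log only demonstrates that the all-zero address is rejected.

```python
def parse_mac(text):
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    octets = bytes(int(part, 16) for part in text.split(":"))
    if len(octets) != 6:
        raise ValueError(f"not a 48-bit MAC address: {text!r}")
    return octets


def is_valid_unicast(mac):
    """True unless the address is all-zero or has the multicast bit set."""
    return any(mac) and not (mac[0] & 0x01)


def pick_mac(candidates):
    """Return the first candidate that looks assignable, or None."""
    for text in candidates:
        if is_valid_unicast(parse_mac(text)):
            return text
    return None


# The three provisioned addresses reported in the log.
provisioned = ["00:00:00:00:00:00", "fa:ce:b0:0c:20:22", "fa:ce:b0:0c:20:23"]
print(pick_mac(provisioned))
```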
      
      Signed-off-by: Peter Delevoryas <peter@pjd.dev>
      Signed-off-by: Patrick Williams <patrick@stwcx.xyz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ncsi: Fix netlink major/minor version numbers · 3084b58b
      Peter Delevoryas authored
      The netlink interface for major and minor version numbers doesn't actually
      return the major and minor version numbers.
      
      It reports a u32 that contains the (major, minor, update, alpha1)
      components as the major version number, and then alpha2 as the minor
      version number.
      
      For whatever reason, the u32 byte order was reversed (ntohl): maybe it was
      assumed that the encoded value was a single big-endian u32, and alpha2 was
      the minor version.
      
      The correct way to get the supported NC-SI version from the network
      controller is to parse the Get Version ID response as described in 8.4.44
      of the NC-SI spec[1].
      
          Get Version ID Response Packet Format
      
                    Bits
                  +--------+--------+--------+--------+
           Bytes  | 31..24 | 23..16 | 15..8  | 7..0   |
          +-------+--------+--------+--------+--------+
          | 0..15 | NC-SI Header                      |
          +-------+--------+--------+--------+--------+
          | 16..19| Response code   | Reason code     |
          +-------+--------+--------+--------+--------+
          |20..23 | Major  | Minor  | Update | Alpha1 |
          +-------+--------+--------+--------+--------+
          |24..27 |         reserved         | Alpha2 |
          +-------+--------+--------+--------+--------+
          |            .... other stuff ....          |
      
      The major, minor, and update fields are all binary-coded decimal (BCD)
      encoded [2]. The spec provides examples below the Get Version ID response
      format in section 8.4.44.1, but for practical purposes, this is an example
      from a live network card:
      
          root@bmc:~# ncsi-util 0x15
          NC-SI Command Response:
          cmd: GET_VERSION_ID(0x15)
          Response: COMMAND_COMPLETED(0x0000)  Reason: NO_ERROR(0x0000)
          Payload length = 40
      
          20: 0xf1 0xf1 0xf0 0x00 <<<<<<<<< (major, minor, update, alpha1)
          24: 0x00 0x00 0x00 0x00 <<<<<<<<< (_, _, _, alpha2)
      
          28: 0x6d 0x6c 0x78 0x30
          32: 0x2e 0x31 0x00 0x00
          36: 0x00 0x00 0x00 0x00
          40: 0x16 0x1d 0x07 0xd2
          44: 0x10 0x1d 0x15 0xb3
          48: 0x00 0x17 0x15 0xb3
          52: 0x00 0x00 0x81 0x19
      
      This should be parsed as "1.1.0".
      
      "f" in the upper-nibble means to ignore it, contributing zero.
      
      If both nibbles are "f", I think the whole field is supposed to be ignored.
      Major and minor are "required", meaning they're not supposed to be "ff",
      but the update field is "optional" so I think it can be ff. I think the
      simplest thing to do is just set the major and minor to zero instead of
      juggling some conditional logic or something.
      
      bcd2bin() from "include/linux/bcd.h" seems to assume both nibbles are 0-9,
      so I've provided a custom BCD decoding function.
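The decoding rule described above can be sketched in a few lines. This is a userspace illustration with hypothetical function names, not the function added by the patch; it assumes that any nibble outside 0-9 (including "f") contributes zero, as the text describes, and it takes the 8 bytes at response offsets 20..27 as input.

```python
def decode_bcd(byte):
    """Decode one BCD byte, treating any nibble outside 0-9 (such as 0xf) as zero."""
    hi, lo = byte >> 4, byte & 0x0F
    hi = hi if hi < 10 else 0
    lo = lo if lo < 10 else 0
    return hi * 10 + lo


def parse_version(fields):
    """Parse response bytes 20..27: major, minor, update, alpha1, 3 reserved, alpha2."""
    if len(fields) != 8:
        raise ValueError("expected the 8 bytes at offsets 20..27")
    major, minor, update, alpha1 = fields[0:4]
    alpha2 = fields[7]
    return {
        "major": decode_bcd(major),
        "minor": decode_bcd(minor),
        "update": decode_bcd(update),
        "alpha1": alpha1,
        "alpha2": alpha2,
    }


# Bytes 20..27 from the ncsi-util dump above.
sample = bytes([0xF1, 0xF1, 0xF0, 0x00, 0x00, 0x00, 0x00, 0x00])
v = parse_version(sample)
print("{major}.{minor}.{update}".format(**v))  # prints 1.1.0
```

Mapping an "ff" field to zero keeps the decoder free of special cases, which matches the preference stated above for avoiding conditional logic.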
      
      Alpha1 and alpha2 are ISO/IEC 8859-1 encoded, which just means ASCII
      characters as far as I can tell, although the full encoding table for
      non-alphabetic characters is slightly different (I think).
      
      I imagine the alpha fields are just supposed to be alphabetic characters,
      but I haven't seen any network cards actually report a non-zero value for
      either.
      
      If people wrote software against this netlink behavior, and were parsing
      the major and minor versions themselves from the u32, then this would
      definitely break their code.
      
      [1] https://www.dmtf.org/sites/default/files/standards/documents/DSP0222_1.0.0.pdf
      [2] https://en.wikipedia.org/wiki/Binary-coded_decimal
      [3] https://en.wikipedia.org/wiki/ISO/IEC_8859-1
      
      
      
      Signed-off-by: Peter Delevoryas <peter@pjd.dev>
      Fixes: 138635cc ("net/ncsi: NCSI response packet handler")
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ncsi: Simplify Kconfig/dts control flow · c797ce16
      Peter Delevoryas authored
      Background:
      
      1. CONFIG_NCSI_OEM_CMD_KEEP_PHY
      
      If this is enabled, we send an extra OEM Intel command in the probe
      sequence immediately after discovering a channel (e.g. after "Clear
      Initial State").
      
      2. CONFIG_NCSI_OEM_CMD_GET_MAC
      
      If this is enabled, we send one of 3 OEM "Get MAC Address" commands from
      Broadcom, Mellanox (Nvidia), and Intel in the *configuration* sequence
      for a channel.
      
      3. mellanox,multi-host (or mlx,multi-host)
      
      Introduced by this patch:
      
      https://lore.kernel.org/all/20200108234341.2590674-1-vijaykhemka@fb.com/
      
      Which was actually originally from cosmo.chou@quantatw.com:
      
      https://github.com/facebook/openbmc-linux/commit/9f132a10ec48db84613519258cd8a317fb9c8f1b
      
      
      
      Cosmo claimed that the Nvidia ConnectX-4 and ConnectX-6 NICs don't
      respond to Get Version ID, et al. in the probe sequence unless you send
      the Set MC Affinity command first.
      
      Problem Statement:
      
      We've been using a combination of #ifdef code blocks and IS_ENABLED()
      conditions to conditionally send these OEM commands.
      
      It makes adding any new code around these commands hard to understand.
      
      Solution:
      
      In this patch, I just want to remove the conditionally compiled blocks
      of code, and always use IS_ENABLED(...) to do dynamic control flow.
      
      I don't think the small amount of code this adds to non-users of the OEM
      Kconfigs is a big deal.
      
      Signed-off-by: Peter Delevoryas <peter@pjd.dev>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-make-timestamping-selectable' · f9672265
      David S. Miller authored
      Kory Maincent says:
      
      ====================
      net: Make timestamping selectable
      
      Up until now, there was no way to let the user select the layer at
      which time stamping occurs. The stack assumed that PHY time stamping
      is always preferred, but some MAC/PHY combinations were buggy.
      
      This series updates the default MAC/PHY timestamping behavior and aims
      to allow the user to select the desired layer administratively.
      
      Changes in v2:
      - Move selected_timestamping_layer variable of the concerned patch.
      - Use sysfs_streq instead of strmcmp.
      - Use the PHY timestamp only if available.
      
      Changes in v3:
      - Expose the PTP choice to ethtool instead of sysfs.
        You can test it with the ethtool source on branch feature_ptp of:
        https://github.com/kmaincent/ethtool
      - Added a devicetree binding to select the preferred timestamp.
      
      Changes in v4:
      - Move on to ethtool netlink instead of ioctl.
      - Add a netdev notifier to allow packet trapping by the MAC in case of PHY
        time stamping.
      - Add a PHY whitelist to not break the old PHY default time-stamping
        preference API.
      
      Changes in v5:
      - Update to ndo_hwstamp_get/set. This brings several new patches.
      - Add a few patches to provide the glue.
      - Convert macb to ndo_hwstamp_get/set.
      - Add netlink specs description of new ethtool commands.
      - Removed netdev notifier.
      - Split the patches that expose the timestamping to userspace to separate
        the core and ethtool development.
      - Add description of software timestamping.
      - Convert PHYs hwtstamp callback to use kernel_hwtstamp_config.
      
      Changes in v6:
      - A few fixes from the reviews.
      - Replace the allowlist with a default_timestamp flag to know which PHY
        is using the old API behavior.
      - Rename the timestamping layer enum values.
      - Move to a simple enum instead of the mix between enum and bitfield.
      - Update ts_info and ts-set in software timestamping case.
      
      Changes in v7:
      - Fix a temporary build error.
      - Link to v6: https://lore.kernel.org/r/20231019-feature_ptp_netnext-v6-0-71affc27b0e5@bootlin.com
      
      
      ====================
      
      Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netlink: specs: Introduce time stamping set command · ee60ea6b
      Kory Maincent authored
      
      
      Add new commands that allow setting the time stamping layer.

      Example usage:
      ./ynl/cli.py --spec netlink/specs/ethtool.yaml --no-schema \
      	     --do ts-list-get \
      	     --json '{"header":{"dev-name":"eth0"}}'
      {'header': {'dev-index': 3, 'dev-name': 'eth0'},
       'ts-list-layer': b'\x02\x00\x00\x00\x01\x00\x00\x00\x05\x00\x00\x00'}
      
      ./ynl/cli.py --spec netlink/specs/ethtool.yaml --no-schema --do ts-set \
      	     --json '{"header":{"dev-name":"eth0"}, "ts-layer":5}'
      none
      
      ./ynl/cli.py --spec netlink/specs/ethtool.yaml --no-schema --do ts-get \
      	     --json '{"header":{"dev-name":"eth0"}}'
      {'header': {'dev-index': 3, 'dev-name': 'eth0'}, 'ts-layer': 5}
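The 'ts-list-layer' value in the first reply above is printed as raw bytes because the command was run with --no-schema. A minimal sketch for reading it follows, assuming the attribute is a packed array of 32-bit unsigned integers in the byte order of a little-endian host. That layout is an assumption and is not stated in the commit; it is merely consistent with the value 5 later passed as "ts-layer". The helper name is hypothetical.

```python
import struct


def decode_u32_list(blob, byteorder="<"):
    """Split a byte string into unsigned 32-bit integers ('<' little, '>' big endian)."""
    if len(blob) % 4:
        raise ValueError("length is not a multiple of 4 bytes")
    count = len(blob) // 4
    return list(struct.unpack(f"{byteorder}{count}I", blob))


# Raw attribute value copied from the ts-list-get reply above.
raw = b'\x02\x00\x00\x00\x01\x00\x00\x00\x05\x00\x00\x00'
print(decode_u32_list(raw))  # prints [2, 1, 5]
```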
      
      Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethtool: ts: Let the active time stamping layer be selectable · 152c75e1
      Kory Maincent authored
      
      
      Now that the current timestamp is saved in a variable, let's add the
      ETHTOOL_MSG_TS_SET ethtool netlink message to make it selectable.
      
      Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>