  1. Nov 01, 2021
    • netfilter: nft_meta: add NFT_META_IFTYPE · 56fa9501
      Pablo Neira Ayuso authored
      
      
      Generalize NFT_META_IIFTYPE to NFT_META_IFTYPE which allows you to match
      on the interface type of the skb->dev field. This field is used by the
      netdev family to add an implicit dependency to skip non-ethernet packets
      when matching on layer 3 and 4 TCP/IP header fields.
      
      For backward compatibility, add the NFT_META_IIFTYPE alias to
      NFT_META_IFTYPE.
      
      Add __NFT_META_IIFTYPE, to be used by userspace in the future to match
      specifically on the iiftype.
      
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      56fa9501
    • netfilter: conntrack: set on IPS_ASSURED if flows enters internal stream state · b7b1d02f
      Pablo Neira Ayuso authored
      
      
      The internal stream state sets the timeout to 120 seconds once 2 seconds
      have passed since the creation of the flow. Tie the IPS_ASSURED flag to
      this internal stream state for consistent event reporting.
      
      Before this patch:
      
            [NEW] udp      17 30 src=10.246.11.13 dst=216.239.35.0 sport=37282 dport=123 [UNREPLIED] src=216.239.35.0 dst=10.246.11.13 sport=123 dport=37282
         [UPDATE] udp      17 30 src=10.246.11.13 dst=216.239.35.0 sport=37282 dport=123 src=216.239.35.0 dst=10.246.11.13 sport=123 dport=37282
         [UPDATE] udp      17 30 src=10.246.11.13 dst=216.239.35.0 sport=37282 dport=123 src=216.239.35.0 dst=10.246.11.13 sport=123 dport=37282 [ASSURED]
        [DESTROY] udp      17 src=10.246.11.13 dst=216.239.35.0 sport=37282 dport=123 src=216.239.35.0 dst=10.246.11.13 sport=123 dport=37282 [ASSURED]
      
      Note that IPS_ASSURED is set although the flow is not yet in the
      internal stream state (the timeout is still 30 seconds).
      
      After this update:
      
            [NEW] udp      17 30 src=10.246.11.13 dst=216.239.35.0 sport=37282 dport=123 [UNREPLIED] src=216.239.35.0 dst=10.246.11.13 sport=123 dport=37282
         [UPDATE] udp      17 30 src=10.246.11.13 dst=216.239.35.0 sport=37282 dport=123 src=216.239.35.0 dst=10.246.11.13 sport=123 dport=37282
         [UPDATE] udp      17 120 src=10.246.11.13 dst=216.239.35.0 sport=37282 dport=123 src=216.239.35.0 dst=10.246.11.13 sport=123 dport=37282 [ASSURED]
        [DESTROY] udp      17 src=10.246.11.13 dst=216.239.35.0 sport=37282 dport=123 src=216.239.35.0 dst=10.246.11.13 sport=123 dport=37282 [ASSURED]
      
      Before this patch, short-lived UDP flows never entered IPS_ASSURED, so
      they were already candidates for deletion by early_drop under stress.
      
      Before this patch, IPS_ASSURED was set regardless of the internal stream
      state; this patch ties IPS_ASSURED to the internal stream state.
      
      packet #1 (original direction) enters NEW state
      packet #2 (reply direction) enters ESTABLISHED state, sets IPS_SEEN_REPLY
      packet #3 (any direction) sets IPS_ASSURED (if 2 seconds have passed
                since the creation of the flow).
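      
      The three-packet sequence above can be sketched as a small state
      model. This is only an illustration under simplifying assumptions:
      flow_new(), flow_packet(), struct flow and the constants are
      hypothetical names, not the kernel's conntrack API, and time is
      counted in whole seconds. The 30 and 120 second timeouts are taken
      from the event logs above.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the UDP flow handling described
 * above. Names and layout are illustrative, not the kernel's. */
enum { DIR_ORIGINAL, DIR_REPLY };

#define TIMEOUT_INITIAL 30   /* seconds, as shown in the event log */
#define TIMEOUT_STREAM  120  /* seconds, internal stream state */
#define STREAM_DELAY    2    /* seconds since creation of the flow */

struct flow {
    unsigned long created;  /* creation time, in seconds */
    unsigned int timeout;   /* current timeout, in seconds */
    bool seen_reply;        /* models IPS_SEEN_REPLY */
    bool assured;           /* models IPS_ASSURED */
};

/* Packet #1: the flow is created in the NEW state. */
void flow_new(struct flow *f, unsigned long now)
{
    f->created = now;
    f->timeout = TIMEOUT_INITIAL;
    f->seen_reply = false;
    f->assured = false;
}

/* Every later packet of the flow goes through here. */
void flow_packet(struct flow *f, int dir, unsigned long now)
{
    if (!f->seen_reply) {
        /* Packet #2: only a reply moves the flow forward. */
        if (dir == DIR_REPLY)
            f->seen_reply = true;
        return;
    }

    /* Packet #3 and later: enter the stream state, and only then
     * mark the flow as assured, once the delay has elapsed. */
    if (now - f->created >= STREAM_DELAY) {
        f->timeout = TIMEOUT_STREAM;
        f->assured = true;
    }
}
```

      In this model a third packet that arrives less than 2 seconds after
      creation leaves both the timeout and the assured flag unchanged, so
      the two always change together.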
      
      Reported-by: Maciej Żenczykowski <zenczykowski@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      b7b1d02f
  2. Oct 22, 2021
  3. Oct 21, 2021
    • net/core: Remove unused assignment operations and variable · 50af5969
      luo penghao authored
      
      
      Although if_info_size is assigned, it is never used, so remove both the
      assignment and the variable.
      
      The clang_analyzer complains as follows:
      
      net/core/rtnetlink.c:3806: warning:
      
      Although the value stored to 'if_info_size' is used in the enclosing
      expression, the value is never actually read from 'if_info_size'.
      
      Reported-by: Zeal Robot <zealci@zte.com.cn>
      Signed-off-by: luo penghao <luo.penghao@zte.com.cn>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      50af5969
    • net: stats: Read the statistics in ___gnet_stats_copy_basic() instead of adding. · c5c6e589
      Sebastian Andrzej Siewior authored
      
      
      Since the rework, the statistics code always adds up the byte and packet
      value(s). On 32bit architectures a seqcount_t is used in
      gnet_stats_basic_sync to ensure that the 64bit values are not modified
      during the read since two 32bit loads are required. The usage of a
      seqcount_t requires a lock to ensure that only one writer is active at a
      time. This lock leads to disabled preemption during the update.
      
      The lack of disabling preemption is now creating a warning as reported
      by Naresh since the query done by gnet_stats_copy_basic() is in
      preemptible context.
      
      For ___gnet_stats_copy_basic() there is no need to disable preemption
      since the update is performed on the stack and can't be modified by
      another writer. Instead of disabling preemption to avoid the warning,
      simply add a read function that just reads the values and returns them
      as u64.
      
      Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Fixes: 67c9e627 ("net: sched: Protect Qdisc::bstats with u64_stats")
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c5c6e589
    • Merge branch 'dsa_to_port-loops' · ce272973
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      Remove the "dsa_to_port in a loop" antipattern
      
      v1->v2: more patches
      v2->v3: less patches
      
      As opposed to previous series, I would now like to first refactor the
      DSA core, since that sees fewer patches than drivers, and make the
      helpers available. Since the refactoring is fairly noisy, I don't want
      to force it on driver maintainers right away, patches can be submitted
      independently.
      
      The original cover letter is below:
      
      The DSA core and drivers currently iterate too much through the port
      list of a switch. For example, this snippet:
      
      	for (port = 0; port < ds->num_ports; port++) {
      		if (!dsa_is_cpu_port(ds, port))
      			continue;
      
      		ds->ops->change_tag_protocol(ds, port, tag_ops->proto);
      	}
      
      iterates through ds->num_ports once, and then calls dsa_is_cpu_port to
      filter out the other types of ports. But that function has a hidden call
      to dsa_to_port() in it, which contains:
      
      	list_for_each_entry(dp, &dst->ports, list)
      		if (dp->ds == ds && dp->index == p)
      			return dp;
      
      where the only thing we wanted to know in the first place was whether
      dp->type == DSA_PORT_TYPE_CPU or not.
      
      So it seems that the problem is that we are not iterating with the right
      variable. We have an "int port" but in fact need a "struct dsa_port *dp".
      
      This has started being an issue since this patch series:
      https://patchwork.ozlabs.org/project/netdev/cover/20191020031941.3805884-1-vivien.didelot@gmail.com/
      
      
      
      The currently proposed set of changes iterates like this:
      
      	dsa_switch_for_each_cpu_port(cpu_dp, ds)
      		err = ds->ops->change_tag_protocol(ds, cpu_dp->index,
      						   tag_ops->proto);
      
      which iterates directly over ds->dst->ports, which is a list of struct
      dsa_port *dp. This makes it much easier and more efficient to check
      dp->type.
      
      As a nice side effect, with the proposed driver API, driver writers are
      now encouraged to use more efficient patterns, and not only due to fewer
      iterations through the port list. For example, something like this:
      
      	for (port = 0; port < ds->num_ports; port++)
      		do_something();
      
      probably does not need to do_something() for the ports that are disabled
      in the device tree. But adding extra code for that would look like this:
      
      	for (port = 0; port < ds->num_ports; port++) {
      		if (dsa_is_unused_port(ds, port))
      			continue;
      
      		do_something();
      	}
      
      and therefore, it is understandable that some driver writers may decide
      to not bother. This patch series introduces a "dsa_switch_for_each_available_port"
      macro which comes at no extra cost in terms of lines of code / number of
      braces to the driver writer, but it has the "dsa_is_unused_port" check
      embedded within it.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ce272973
    • net: dsa: tag_8021q: make dsa_8021q_{rx,tx}_vid take dp as argument · 992e5cc7
      Vladimir Oltean authored
      
      
      Pass a single argument to dsa_8021q_rx_vid and dsa_8021q_tx_vid that
      contains the necessary information from the two arguments that are
      currently provided: the switch and the port number.
      
      Also rename those functions so that they have a dsa_port_* prefix, since
      they operate on a struct dsa_port *.
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      992e5cc7
    • net: dsa: tag_sja1105: do not open-code dsa_switch_for_each_port · 5068887a
      Vladimir Oltean authored
      
      
      Find the remaining iterators over dst->ports that only filter for the
      ports belonging to a certain switch, and replace those with the
      dsa_switch_for_each_port helper that we have now.
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5068887a
    • net: dsa: convert cross-chip notifiers to iterate using dp · fac6abd5
      Vladimir Oltean authored
      
      
      The majority of cross-chip switch notifiers need to filter in some way
      over the type of ports: some install VLANs etc on all cascade ports.
      
      The difference is that the matching function, which filters by port
      type, is separate from the function where the iteration happens. So this
      patch needs to refactor the matching functions' prototypes as well, to
      take the dp as argument.
      
      In a future patch/series, I might convert dsa_towards_port to return a
      struct dsa_port *dp too, but at the moment it is a bit entangled with
      dsa_routing_port which is also used by mv88e6xxx and they both return an
      int port. So keep dsa_towards_port the way it is and convert it into a
      dp using dsa_to_port.
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fac6abd5
    • net: dsa: remove gratuitous use of dsa_is_{user,dsa,cpu}_port · 57d77986
      Vladimir Oltean authored
      
      
      Find the occurrences of dsa_is_{user,dsa,cpu}_port where a struct
      dsa_port *dp was already available in the function scope, and replace
      them with the dsa_port_is_{user,dsa,cpu} equivalent function which uses
      that dp directly and does not perform another hidden dsa_to_port().
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      57d77986
    • net: dsa: do not open-code dsa_switch_for_each_port · 65c563a6
      Vladimir Oltean authored
      
      
      Find the remaining iterators over dst->ports that only filter for the
      ports belonging to a certain switch, and replace those with the
      dsa_switch_for_each_port helper that we have now.
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      65c563a6
    • net: dsa: remove the "dsa_to_port in a loop" antipattern from the core · d0004a02
      Vladimir Oltean authored
      
      
      Ever since Vivien's conversion of the ds->ports array into a dst->ports
      list, and the introduction of dsa_to_port, iterations through the ports
      of a switch became quadratic whenever dsa_to_port was needed.
      
      dsa_to_port can either be called directly, or indirectly through the
      dsa_is_{user,cpu,dsa,unused}_port helpers.
      
      Use the newly introduced dsa_switch_for_each_port() iteration macro
      that works with the iterator variable being a struct dsa_port *dp
      directly, and not an int i. It is expensive to go from i to dp, but
      cheap to go from dp to i.
      
      This macro iterates through the entire ds->dst->ports list and filters
      by the ports belonging just to the switch provided as argument.
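      
      A rough userspace sketch of the difference between the two iteration
      styles, assuming heavily simplified stand-ins for the DSA structures.
      struct port, struct sw, struct tree, switch_for_each_port() and the
      counting functions are hypothetical names chosen for illustration;
      the real definitions live in the DSA core and differ in detail.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the DSA structures. */
enum port_type { PORT_UNUSED, PORT_USER, PORT_CPU, PORT_DSA };

struct sw;  /* a switch */

struct port {
    struct sw *ds;        /* switch this port belongs to */
    int index;            /* port number within that switch */
    enum port_type type;
    struct port *next;    /* next port in the tree-wide list */
};

struct tree { struct port *ports; };
struct sw { struct tree *dst; };

/* Walk the tree-wide list once, visiting only the ports of 'ds'. */
#define switch_for_each_port(dp, ds) \
    for ((dp) = (ds)->dst->ports; (dp); (dp) = (dp)->next) \
        if ((dp)->ds == (ds))

/* Index-based lookup: up to one full list walk for every call. */
struct port *to_port(struct sw *ds, int index)
{
    struct port *dp;

    switch_for_each_port(dp, ds)
        if (dp->index == index)
            return dp;
    return NULL;
}

/* Old pattern: loop over indices and look each one up again, which
 * makes the whole loop quadratic in the number of ports. */
int count_cpu_ports_by_index(struct sw *ds, int num_ports)
{
    int port, n = 0;

    for (port = 0; port < num_ports; port++) {
        struct port *dp = to_port(ds, port);

        if (dp && dp->type == PORT_CPU)
            n++;
    }
    return n;
}

/* New pattern: a single pass with dp itself as the iterator. */
int count_cpu_ports_by_dp(struct sw *ds)
{
    struct port *dp;
    int n = 0;

    switch_for_each_port(dp, ds)
        if (dp->type == PORT_CPU)
            n++;
    return n;
}
```

      Both counting functions return the same result; the second one never
      needs the index-to-pointer lookup because dp->index is available for
      free whenever the port number is required.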
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d0004a02
    • net: dsa: introduce helpers for iterating through ports using dp · 82b31898
      Vladimir Oltean authored
      
      
      Since the DSA conversion from the ds->ports array into the dst->ports
      list, the DSA API has encouraged driver writers, as well as the core
      itself, to write inefficient code.
      
      Currently, code that wants to filter by a specific type of port when
      iterating, like {!unused, user, cpu, dsa}, uses the dsa_is_*_port helper.
      Under the hood, this uses dsa_to_port which iterates again through
      dst->ports. But the driver iterates through the port list already, so
      the complexity is quadratic for the typical case of a single-switch
      tree.
      
      This patch introduces some iteration helpers where the iterator is
      already a struct dsa_port *dp, so that the other variant of the
      filtering functions, dsa_port_is_{unused,user,cpu,dsa}, can be used
      directly on the iterator. This eliminates the second lookup.
      
      These functions can be used both by the core and by drivers.
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      82b31898
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue · dedb0809
      David S. Miller authored
      
      
      Tony Nguyen says:
      
      ====================
      100GbE Intel Wired LAN Driver Updates 2021-10-20
      
      Sudheer Mogilappagari says:
      
      This series introduces initial support for Application Device Queues
      (ADQ) in the ice driver. ADQ provides traffic isolation for application
      flows in hardware and the ability to steer traffic to a given traffic
      class. This helps in aligning NIC queues to application threads.
      
      Traffic classes are configured using the mqprio framework of the tc
      command and mapped to HW channels (VSIs) in the driver. The queue set of
      each traffic class is managed by the corresponding VSI. Each traffic
      channel can be configured with bandwidth rate limits and is offloaded
      to the hardware through the mqprio framework by specifying the mode
      option as 'channel' and the shaper option as 'bw_rlimit'.
      
      Next, the flows of an application can be steered into a given traffic
      class using the "tc filter" command. The option "skip_sw hw_tc x"
      indicates hw-offload of filtering and steering of the filtered traffic
      into the specified TC. Non-matching traffic flows through TC0.
      
      When the channel configuration is removed, the queue configuration is
      reset to default and the filters configured on individual traffic
      classes are deleted.
      
      example:
      $ ethtool -K eth0 hw-tc-offload on
      
      Configure 3 traffic classes and map priority 0,1,2 to TC0, TC1 and TC2
      respectively. TC0 has 2 queues from offset 0 & TC1 has 8 queues from
      offset 2 and TC2 has 4 queues from offset 10. Enable hardware offload
      of channels.
      
      $ tc qdisc add dev eth0 root mqprio num_tc 3 map 0 1 2 queues \
              2@0 8@2 4@10 hw 1 mode channel
      
      $ tc qdisc show dev eth0
      qdisc mqprio 8001: root  tc 2 map 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0
                   queues:(0:1) (2:9) (10:13)
                   mode:channel
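      
      The count@offset notation used above (2@0 8@2 4@10) can be read as
      "this many queues, starting at this queue index". A minimal sketch of
      that mapping follows; struct tc_queues and tc_for_queue() are
      hypothetical helpers for illustration only and are not part of tc or
      of the ice driver.

```c
/* Hypothetical helper: a traffic class owns 'count' queues starting at
 * 'offset', mirroring the count@offset notation of the mqprio command. */
struct tc_queues {
    unsigned int count;
    unsigned int offset;
};

/* Return the traffic class that owns queue 'q', or -1 if none does. */
int tc_for_queue(const struct tc_queues *tcs, int num_tc, unsigned int q)
{
    int tc;

    for (tc = 0; tc < num_tc; tc++)
        if (q >= tcs[tc].offset && q < tcs[tc].offset + tcs[tc].count)
            return tc;
    return -1;
}
```

      With the table { {2, 0}, {8, 2}, {4, 10} }, queues 0-1 belong to TC0,
      queues 2-9 to TC1 and queues 10-13 to TC2, which matches the
      (0:1) (2:9) (10:13) ranges printed by "tc qdisc show".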
      
      Configure two filters to match based on dst ipaddr, dst tcp port and
      redirect to TC1 and TC2.
      $ tc qdisc add dev eth0 clsact
      
      $ tc filter add dev eth0 protocol ip ingress prio 1 flower\
        dst_ip 192.168.1.1/32 ip_proto tcp dst_port 80\
        skip_sw hw_tc 1
      $ tc filter add dev eth0 protocol ip ingress prio 1 flower\
        dst_ip 192.168.1.1/32 ip_proto tcp dst_port 5001\
        skip_sw hw_tc 2
      
      $ tc filter show dev eth0 ingress
      
      Delete traffic classes configuration:
      $ sudo tc qdisc del dev eth0 root
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dedb0809
    • net: mscc: ocelot: track the port pvid using a pointer · d4004422
      Vladimir Oltean authored
      
      
      Now that we have a list of struct ocelot_bridge_vlan entries, we can
      rewrite the pvid logic to simply point to one of those structures,
      instead of having a separate structure with a "bool valid".
      The NULL pointer will represent the lack of a bridge pvid (not to be
      confused with the lack of a hardware pvid on the port, that is present
      at all times).
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d4004422
    • net: mscc: ocelot: add the local station MAC addresses in VID 0 · bfbab310
      Vladimir Oltean authored
      
      
      The ocelot switchdev driver does not include the CPU port in the list of
      flooding destinations for unknown traffic, instead that traffic is
      supposed to match FDB entries to reach the CPU.
      
      The addresses it installs are:
      (a) the station MAC address, in ocelot_probe_port() and later during
          runtime in ocelot_port_set_mac_address(). These are the VLAN-unaware
          addresses. The VLAN-aware addresses are in ocelot_vlan_vid_add().
      (b) multicast addresses added with dev_mc_add() (not bridge host MDB
          entries) in ocelot_mc_sync()
      (c) multicast destination MAC addresses for MRP in ocelot_mrp_save_mac(),
          to make sure those are dropped (not forwarded) by the bridging
          service, just trapped to the CPU
      
      So we can see that the logic is slightly buggy ever since the initial
      commit a556c76a ("net: mscc: Add initial Ocelot switch support").
      This is because, when ocelot_probe_port() runs, the port pvid is 0.
      Then we join a VLAN-aware bridge, the pvid becomes 1, we call
      ocelot_port_set_mac_address(), this learns the new MAC address in VID 1
      (also fails to forget the old one, since it thinks it's in VID 1, but
      that's not so important). Then when we leave the VLAN-aware bridge,
      outside world is unable to ping our new MAC address because it isn't
      learned in VID 0, the VLAN-unaware pvid.
      
      [ note: this is strictly based on static analysis, I don't have hardware
        to test. But there are also many more corner cases ]
      
      The basic idea is that we should have a separation of concerns, and the
      FDB entries used for standalone operation should be managed by the
      driver, and the FDB entries used by the bridging service should be
      managed by the bridge. So the standalone and VLAN-unaware bridge FDB
      entries should not follow the bridge PVID, because that will only be
      active when the bridge is VLAN-aware. So since the port pvid is
      coincidentally zero during probe time, just make those entries
      statically go to VID 0.
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bfbab310
    • net: mscc: ocelot: allow a config where all bridge VLANs are egress-untagged · 0da1a1c4
      Vladimir Oltean authored
      
      
      At present, the ocelot driver accepts a single egress-untagged bridge
      VLAN, meaning that this sequence of operations:
      
      ip link add br0 type bridge vlan_filtering 1
      ip link set swp0 master br0
      bridge vlan add dev swp0 vid 2 pvid untagged
      
      fails because the bridge automatically installs VID 1 as a pvid & untagged
      VLAN, and vid 2 would be the second untagged VLAN on this port. It is
      necessary to delete VID 1 before proceeding to add VID 2.
      
      This limitation comes from the fact that we operate the port tag, when
      it has an egress-untagged VID, in the OCELOT_PORT_TAG_NATIVE mode.
      The ocelot switches do not have full flexibility and can either have one
      single VID as egress-untagged, or all of them.
      
      There are use cases for having all VLANs as egress-untagged as well, and
      this patch adds support for that.
      
      The change rewrites ocelot_port_set_native_vlan() into a more generic
      ocelot_port_manage_port_tag() function. Because the software bridge's
      state, transmitted to us via switchdev, can become very complex, we
      don't attempt to track all possible state transitions, but instead take
      a more declarative approach and just make ocelot_port_manage_port_tag()
      figure out which mode to operate in:
      
      - port is VLAN-unaware: the classified VLAN (internal, unrelated to the
                              802.1Q header) is not inserted into packets on egress
      - port is VLAN-aware:
        - port has tagged VLANs:
          -> port has no untagged VLAN: set up as pure trunk
          -> port has one untagged VLAN: set up as trunk port + native VLAN
          -> port has more than one untagged VLAN: this is an invalid config
             which is rejected by ocelot_vlan_prepare
        - port has no tagged VLANs
          -> set up as pure egress-untagged port
      
      We don't keep the number of tagged and untagged VLANs, we just count the
      structures we keep.
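      
      The decision tree above can be sketched as a pure function of the
      port's VLAN awareness and the two counts. The enum values and
      choose_port_tag() below are hypothetical names for illustration; the
      driver's own identifiers and register values may differ.

```c
#include <stdbool.h>

/* Hypothetical mode names mirroring the cases listed above. */
enum port_tag_mode {
    TAG_UNAWARE,       /* VLAN-unaware: classified VLAN not inserted */
    TAG_ALL_UNTAGGED,  /* VLAN-aware, no tagged VLANs */
    TAG_TRUNK,         /* tagged VLANs only: pure trunk */
    TAG_TRUNK_NATIVE,  /* tagged VLANs plus one native (untagged) VLAN */
    TAG_INVALID,       /* tagged VLANs plus several untagged VLANs */
};

/* Derive the mode from the current state instead of tracking every
 * possible transition that led to it. */
enum port_tag_mode choose_port_tag(bool vlan_aware, int num_tagged,
                                   int num_untagged)
{
    if (!vlan_aware)
        return TAG_UNAWARE;
    if (num_tagged == 0)
        return TAG_ALL_UNTAGGED;
    if (num_untagged == 0)
        return TAG_TRUNK;
    if (num_untagged == 1)
        return TAG_TRUNK_NATIVE;
    return TAG_INVALID;
}
```

      Because the function only looks at the current counts, it can be
      called again after every VLAN add or delete without remembering what
      the previous mode was.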
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0da1a1c4
    • net: mscc: ocelot: convert the VLAN masks to a list · 90e0aa8d
      Vladimir Oltean authored
      
      
      First and foremost, the driver currently allocates a constant sized
      4K * u32 (16KB memory) array for the VLAN masks. However, a typical
      application might not need so many VLANs, so if we dynamically allocate
      the memory as needed, we might actually save some space.
      
      Secondly, we'll need to keep more advanced bookkeeping of the VLANs we
      have, notably we'll have to check how many untagged and how many tagged
      VLANs we have. This will have to stay in a structure, and allocating
      another 16 KB array for that is again a bit too much.
      
      So refactor the bridge VLANs in a linked list of structures.
      
      The hook points inside the driver are ocelot_vlan_member_add() and
      ocelot_vlan_member_del(), which previously used to operate on the
      ocelot->vlan_mask[vid] array element.
      
      ocelot_vlan_member_add() and ocelot_vlan_member_del() used to call
      ocelot_vlan_member_set() to commit to the ocelot->vlan_mask.
      Additionally, we had two calls to ocelot_vlan_member_set() from outside
      those callers, and those were directly from ocelot_vlan_init().
      Those calls do not set up bridging service VLANs, instead they:
      
      - clear the VLAN table on reset
      - set the port pvid to the value used by this driver for VLAN-unaware
        standalone port operation (VID 0)
      
      So now, when we have a structure which represents actual bridge VLANs,
      VID 0 doesn't belong in that structure, since it is not part of the
      bridging layer.
      
      So delete the middle man, ocelot_vlan_member_set(), and let
      ocelot_vlan_init() call directly ocelot_vlant_set_mask() which forgoes
      any data structure and writes directly to hardware, which is all that we
      need.
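      
      A minimal userspace sketch of keeping bridge VLANs in a linked list
      that grows on demand, instead of a fixed 4096-entry array of u32 masks
      (4096 * 4 bytes = 16 KB). struct bridge_vlan, vlan_find(),
      vlan_member_add() and vlan_member_del() are hypothetical names; the
      driver's structures and locking are not modelled here.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified bridge VLAN entry. */
struct bridge_vlan {
    uint16_t vid;
    uint32_t portmask;          /* bit N set: port N is a member */
    struct bridge_vlan *next;
};

struct bridge_vlan *vlan_find(struct bridge_vlan *head, uint16_t vid)
{
    for (; head; head = head->next)
        if (head->vid == vid)
            return head;
    return NULL;
}

/* Add 'port' (0..31) to 'vid', allocating the entry on first use.
 * Returns 0 on success or -1 if the allocation fails. */
int vlan_member_add(struct bridge_vlan **head, uint16_t vid, int port)
{
    struct bridge_vlan *v = vlan_find(*head, vid);

    if (!v) {
        v = calloc(1, sizeof(*v));
        if (!v)
            return -1;
        v->vid = vid;
        v->next = *head;
        *head = v;
    }
    v->portmask |= UINT32_C(1) << port;
    return 0;
}

/* Remove 'port' from 'vid', freeing the entry once it has no members. */
void vlan_member_del(struct bridge_vlan **head, uint16_t vid, int port)
{
    struct bridge_vlan **pp, *v;

    for (pp = head; (v = *pp) != NULL; pp = &v->next) {
        if (v->vid != vid)
            continue;
        v->portmask &= ~(UINT32_C(1) << port);
        if (!v->portmask) {
            *pp = v->next;  /* unlink before freeing */
            free(v);
        }
        return;
    }
}
```

      Memory use is then proportional to the number of VLANs actually
      configured, and each entry is a natural place to attach further
      per-VLAN bookkeeping such as tagged and untagged membership.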
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      90e0aa8d
    • net: mscc: ocelot: add a type definition for REW_TAG_CFG_TAG_CFG · 62a22bcb
      Vladimir Oltean authored
      
      
      This is a cosmetic patch which clarifies what the port tagging options
      for Ocelot switches are.
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      62a22bcb
    • ice: Add tc-flower filter support for channel · 9fea7498
      Kiran Patil authored
      
      
      Add support for adding and deleting channel-specific filters using
      tc-flower. For now, the only supported action is "skip_sw hw_tc <tc_num>".
      
      Filter criteria are specific to the channel and can be a combination
      of L3, L3+L4, L2+L4.
      
      Example:
      MATCH criteria       Action
      ---------------------------
      src and/or dest IPv4[6]/mask -> Forward to "hw_tc <tc_num>"
      dest IPv4[6]/mask + dest L4 port -> Forward to "hw_tc <tc_num>"
      dest MAC + dest L4 port -> Forward to "hw_tc <tc_num>"
      src IPv4[6]/mask + src L4 port -> Forward to "hw_tc <tc_num>"
      src MAC + src L4 port -> Forward to "hw_tc <tc_num>"
      
      Adding tc-flower filter for channel using "hw_tc"
      -------------------------------------------------
      tc qdisc add dev <ethX> clsact
      
      Above two steps are only needed the first time when adding
      tc-flower filter.
      
      tc filter add dev <ethX> protocol ip ingress prio 1 flower \
           dst_ip 192.168.0.1/32 ip_proto tcp dst_port 5001 \
           skip_sw hw_tc 1
      
      tc filter show dev <ethX> ingress
      filter protocol ip pref 1 flower chain 0
      filter protocol ip pref 1 flower chain 0 handle 0x1 hw_tc 1
        eth_type ipv4
        ip_proto tcp
        dst_ip 192.168.0.1
        dst_port 5001
        skip_sw
        in_hw in_hw_count 1
      
      Delete specific filter:
      -------------------------
      tc filter del dev <ethX> ingress pref 1 handle 0x1 flower
      
      Delete All filters:
      ------------------
      tc filter del dev <ethX> ingress
      
      Co-developed-by: Amritha Nambiar <amritha.nambiar@intel.com>
      Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
      Signed-off-by: Kiran Patil <kiran.patil@intel.com>
      Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
      Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      9fea7498
    • ice: enable ndo_setup_tc support for mqprio_qdisc · fbc7b27a
      Kiran Patil authored
      
      
      Add support in driver for TC_QDISC_SETUP_MQPRIO. This support
      enables instantiation of channels in HW using existing MQPRIO
      infrastructure which is extended to be offloadable. This
      provides a mechanism to configure dedicated set of queues for
      each TC.
      
      Configuring channels using "tc mqprio":
      --------------------------------------
      tc qdisc add dev <ethX> root mqprio num_tc 3 map 0 1 2 \
      	queues 4@0 4@4 4@8  hw 1 mode channel
      
      Above command configures 3 TCs having 4 queues each. "hw 1 mode channel"
      implies offload of channel configuration to HW. When driver processes
      configuration received via "ndo_setup_tc: QDISC_SETUP_MQPRIO", each
      TC maps to HW VSI with specified queues.
      
      User can optionally specify bandwidth min and max rate limit per TC
      (see example below). If shaper params like min and/or max bandwidth
      rate limit are specified, driver configures VSI specific rate limiter
      in HW.
      
      Configuring channels and bandwidth shaper parameters using "tc mqprio":
      ----------------------------------------------------------------
      tc qdisc add dev <ethX> root mqprio \
      	num_tc 4 map 0 1 2 3 queues 4@0 4@4 4@8 4@12 hw 1 mode channel \
      	shaper bw_rlimit min_rate 1Gbit 2Gbit 3Gbit 4Gbit \
      	max_rate 4Gbit 5Gbit 6Gbit 7Gbit
      
      Command to view configured TCs:
      -----------------------------
      tc qdisc show dev <ethX>
      
      Deleting TCs:
      ------------
      tc qdisc del dev <ethX> root mqprio
      
      Signed-off-by: Kiran Patil <kiran.patil@intel.com>
      Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
      Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
      Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      fbc7b27a
    • ice: Add infrastructure for mqprio support via ndo_setup_tc · 0754d65b
      Kiran Patil authored
      
      
      Add infrastructure required for "ndo_setup_tc:qdisc_mqprio".
      ice_vsi_setup is modified to configure traffic classes based
      on mqprio data received from the stack. This includes low-level
      functions to configure min, max rate-limit parameters in hardware
      for traffic classes. Each traffic class gets mapped to a hardware
      channel (VSI) which can be individually configured with different
      bandwidth parameters.
      
      Co-developed-by: Tarun Singh <tarun.k.singh@intel.com>
      Signed-off-by: Tarun Singh <tarun.k.singh@intel.com>
      Signed-off-by: Kiran Patil <kiran.patil@intel.com>
      Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
      Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
      Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      0754d65b
    • fq_codel: generalise ce_threshold marking for subset of traffic · dfcb63ce
      Toke Høiland-Jørgensen authored
      
      
      Commit e72aeb9e ("fq_codel: implement L4S style ce_threshold_ect1
      marking") expanded the ce_threshold feature of FQ-CoDel so it can
      be applied to a subset of the traffic, using the ECT(1) bit of the ECN
      field as the classifier. However, hard-coding ECT(1) as the only
      classifier for this feature seems limiting, so let's expand it to be more
      general.
      
      To this end, change the parameter from a ce_threshold_ect1 boolean, to a
      one-byte selector/mask pair (ce_threshold_{selector,mask}) which is applied
      to the whole diffserv/ECN field in the IP header. This makes it possible to
      classify packets by any value in either the ECN field or the diffserv
      field. In particular, setting a selector of INET_ECN_ECT_1 and a mask of
      INET_ECN_MASK corresponds to the functionality before this patch, and a
      mask of ~INET_ECN_MASK allows using the selector as a straight-forward
      match against a diffserv code point:
      
       # apply ce_threshold to ECT(1) traffic
       tc qdisc replace dev eth0 root fq_codel ce_threshold 1ms ce_threshold_selector 0x1/0x3
      
       # apply ce_threshold to ECN-capable traffic marked as diffserv AF22
       tc qdisc replace dev eth0 root fq_codel ce_threshold 1ms ce_threshold_selector 0x50/0xfc
      
      Regardless of the selector chosen, the normal rules for ECN-marking of
      packets still apply, i.e., the flow must still declare itself ECN-capable
      by setting one of the bits in the ECN field to get marked at all.
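      
      The selector/mask comparison described above amounts to a single
      masked equality test on the diffserv/ECN byte. A minimal sketch,
      assuming the byte has already been extracted from the IP header;
      ce_threshold_applies() and the two macros are names local to this
      example, not the qdisc's own identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local names for this example: the ECN field is the low two bits of
 * the diffserv/ECN byte, and ECT(1) is the value 0x1 within it. */
#define ECN_MASK  0x03
#define ECN_ECT_1 0x01

/* True if ce_threshold marking should be considered for this packet. */
bool ce_threshold_applies(uint8_t dsfield, uint8_t selector, uint8_t mask)
{
    return (dsfield & mask) == selector;
}
```

      With selector 0x1 and mask 0x3 only ECT(1) packets match, whatever
      their diffserv code point. With selector 0x50 and mask 0xfc any packet
      carrying the AF22 code point matches, whatever its ECN bits, so the
      usual ECN-capability check still has to be applied separately.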
      
      v2:
      - Add tc usage examples to patch description
      
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20211019174709.69081-1-toke@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      dfcb63ce
  4. Oct 20, 2021