  1. Dec 20, 2019
  2. Dec 19, 2019
    • net: stmmac: tc: Fix TAPRIO division operation · a1ec57c0
      Jose Abreu authored
      For architectures that don't support 64-bit division natively, we need
      to use the division helpers.
      
      Fixes: b60189e0 ("net: stmmac: Integrate EST with TAPRIO scheduler API")
      Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ETS-qdisc' · 6bff0017
      David S. Miller authored
      
      
      Petr Machata says:
      
      ====================
      Add a new Qdisc, ETS
      
      The IEEE standard 802.1Qaz (and 802.1Q-2014) specifies four principal
      transmission selection algorithms: strict priority, credit-based shaper,
      ETS (bandwidth sharing), and vendor-specific. All these have their
      corresponding knobs in DCB. But DCB does not have interfaces to configure
      RED and ECN, unlike Qdiscs.
      
      In the Qdisc land, strict priority is implemented by PRIO. Credit-based
      transmission selection algorithm can then be modeled by having e.g. TBF or
      CBS Qdisc below some of the PRIO bands. ETS would then be modeled by
      placing a DRR Qdisc under the last PRIO band.
      
      The problem with this approach is that DRR on its own, as well as the
      combination of PRIO and DRR, are tricky to configure and tricky to offload
      to 802.1Qaz-compliant hardware. This is due to several reasons:
      
      - Like any classful Qdisc, DRR supports adding classifiers to decide in
        which class to enqueue packets. Unlike PRIO, however, it has no fallback
        in the form of priomap. One way to achieve classification based on
        packet priority is like this:
      
          # tc filter add dev swp1 root handle 1: \
      		basic match 'meta(priority eq 0)' flowid 1:10
      
        Expressing the priomap in this manner however forces drivers to deep dive
        into the classifier block to parse the individual rules.
      
        A possible solution would be to extend the classes with a "defmap" a la
        split / defmap mechanism of CBQ, and introduce this as a last resort
        classification. However, unlike priomap, this doesn't have the guarantee
        of covering all priorities. Traffic whose priority is not covered is
        dropped by DRR as unclassified. But ASICs tend to implement dropping in
        the ACL block, not in scheduling pipelines. The need to treat these
        configurations correctly (if only to decide to not offload at all)
        complicates a driver.
      
        It's not clear how to retrofit priomap with all its benefits to DRR
        without changing it beyond recognition.
      
      - The interplay between PRIO and DRR is also causing problems. 802.1Qaz has
        all ETS TCs as a last resort. Switch ASICs that support ETS at all are
        likely to handle ETS traffic this way as well. However, the Linux model
        is more generic, allowing the DRR block in any band. Drivers would need
        to be careful to handle this case correctly, otherwise the offloaded
        model might not match the slow-path one.
      
        In a similar vein, PRIO and DRR need to agree on the list of priorities
        assigned to DRR. This is doubly problematic--the user needs to take care
        to keep the two in sync, and the driver needs to watch for any holes in
        DRR coverage and treat the traffic correctly, as discussed above.
      
        Note that at the time that DRR Qdisc is added, it has no classes, and
        thus any priorities assigned to that PRIO band are not covered. Thus this
        case is surprisingly rather common, and needs to be handled gracefully by
        the driver.
      
      - Similarly due to DRR flexibility, when a Qdisc (such as RED) is attached
        below it, it is not immediately clear which TC the class represents. This
        is unlike PRIO with its straightforward classid scheme. When DRR is
        combined with PRIO, the relationship between classes and TCs gets even
        more murky.
      
        This is a problem for users as well: the TC mapping is rather important
        for (devlink) shared buffer configuration and (ethtool) counters.
      
      So instead, this patch set introduces a new Qdisc, which is based on
      802.1Qaz wording. It is PRIO-like in how it is configured, meaning one
      needs to specify how many bands there are, how many are strict and how many
      are ETS, quanta for the latter, and priomap.
      
      The new Qdisc operates like the PRIO / DRR combo would when configured as
      per the standard. The strict classes, if any, are tried for traffic first.
      When there's no traffic in any of the strict queues, the ETS ones (if any)
      are treated in the same way as in DRR.
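
The dequeue order described above can be sketched as a small simulation. This is a simplified illustration, not the sch_ets.c code: the function name `ets_dequeue_order` and its data layout are hypothetical, the queues are static (no new arrivals), and every ETS quantum is assumed to be positive.

```python
from collections import deque


def ets_dequeue_order(bands, nstrict, quanta):
    """Drain `bands` and return the resulting (band, packet_length) order.

    bands   -- list of lists of packet lengths, one list per band
    nstrict -- the first `nstrict` bands are strict, the rest are ETS
    quanta  -- one positive DRR quantum per ETS band (assumption: > 0)
    """
    queues = [deque(b) for b in bands]
    deficits = [0] * len(bands)
    # ETS bands that currently hold packets, in round-robin order.
    active = deque()
    for i in range(nstrict, len(bands)):
        if queues[i]:
            active.append(i)
            deficits[i] = quanta[i - nstrict]

    out = []
    while True:
        # Strict bands are always tried first, lowest band number first.
        for i in range(nstrict):
            if queues[i]:
                out.append((i, queues[i].popleft()))
                break
        else:
            # No strict traffic: serve the ETS bands DRR-style.
            if not active:
                return out
            band = active[0]
            length = queues[band][0]
            if length <= deficits[band]:
                deficits[band] -= length
                queues[band].popleft()
                out.append((band, length))
                if not queues[band]:
                    active.popleft()
                    deficits[band] = 0
            else:
                # Not enough deficit: top it up and move to the tail.
                deficits[band] += quanta[band - nstrict]
                active.rotate(-1)
```

With one strict band and two ETS bands whose quanta are 2000 and 1000, the strict packets leave first, and the first ETS band then sends roughly two packets for each packet of the second while both are backlogged.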
      
      The chosen interface makes the overall system both reasonably easy to
      configure, and reasonably easy to offload. The extra code to support ETS in
      mlxsw (which already supports PRIO) is about 150 lines, of which perhaps 20
      lines is bona fide new business logic.
      
      The credit-based shaping transmission selection algorithm can be
      configured by adding a CBS Qdisc under one of the strict bands (TBF can
      be used to similar effect as well). As a non-work-conserving Qdisc, CBS
      can't be hooked under the ETS bands. This is detected and handled
      identically to DRR Qdisc at runtime. Note that offloading CBS is not a
      subject of this patchset.
      
      The patchset proceeds in four stages:
      
      - Patches #1-#3 are cleanups.
      - Patches #4 and #5 contain the new Qdisc.
      - Patches #6 and #7 update mlxsw to offload the new Qdisc.
      - Patches #8-#10 add selftests for ETS.
      
      Examples:
      
      - Add a Qdisc with 6 bands, 3 strict and 3 ETS with 45%-30%-25% weights:
      
          # tc qdisc add dev swp1 root handle 1: \
      	ets strict 3 quanta 4500 3000 2500 priomap 0 1 1 1 2 3 4 5
          # tc qdisc sh dev swp1
          qdisc ets 1: root refcnt 2 bands 6 strict 3 quanta 4500 3000 2500 priomap 0 1 1 1 2 3 4 5 5 5 5 5 5 5 5 5
      
      - Tweak quantum of one of the classes of the previous Qdisc:
      
          # tc class ch dev swp1 classid 1:4 ets quantum 1000
          # tc qdisc sh dev swp1
          qdisc ets 1: root refcnt 2 bands 6 strict 3 quanta 1000 3000 2500 priomap 0 1 1 1 2 3 4 5 5 5 5 5 5 5 5 5
          # tc class ch dev swp1 classid 1:3 ets quantum 1000
          Error: Strict bands do not have a configurable quantum.
      
      - Purely strict Qdisc with 1:1 mapping between priorities and TCs:
      
          # tc qdisc add dev swp1 root handle 1: \
      	ets strict 8 priomap 7 6 5 4 3 2 1 0
          # tc qdisc sh dev swp1
          qdisc ets 1: root refcnt 2 bands 8 strict 8 priomap 7 6 5 4 3 2 1 0 7 7 7 7 7 7 7 7
      
      - Use "bands" to specify number of bands explicitly. Underspecified bands
        are implicitly ETS and their quantum is taken from MTU. The following
        thus gives each band the same weight:
      
          # tc qdisc add dev swp1 root handle 1: \
      	ets bands 8 priomap 7 6 5 4 3 2 1 0
          # tc qdisc sh dev swp1
          qdisc ets 1: root refcnt 2 bands 8 quanta 1514 1514 1514 1514 1514 1514 1514 1514 priomap 7 6 5 4 3 2 1 0 7 7 7 7 7 7 7 7
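
The way the examples above fill in defaults can be sketched as follows. This is an illustration only: `ets_expand` is a hypothetical helper, not part of iproute2 or the kernel, and the default quantum of 1514 is simply the value shown in the last example's output (the text says it is derived from the MTU).

```python
TC_PRIO_MAX = 15  # priorities 0..15, i.e. the 16 priomap entries shown by tc


def ets_expand(nstrict=0, quanta=(), priomap=(), bands=None,
               default_quantum=1514):
    """Return (nbands, quanta, priomap, weights_pct) with defaults applied.

    default_quantum is an assumption taken from the example output above.
    """
    nbands = bands if bands is not None else nstrict + len(quanta)
    if nbands < nstrict + len(quanta):
        raise ValueError("more strict/ETS bands listed than 'bands' allows")
    if any(band >= nbands for band in priomap):
        raise ValueError("priomap refers to a band that does not exist")

    # Bands not covered by "strict" or "quanta" are ETS with a default quantum.
    missing = nbands - nstrict - len(quanta)
    all_quanta = list(quanta) + [default_quantum] * missing

    # Priorities not mentioned in priomap go to the last band.
    full_priomap = list(priomap) + [nbands - 1] * (TC_PRIO_MAX + 1 - len(priomap))

    # Each ETS band's bandwidth share is its quantum over the sum of quanta.
    total = sum(all_quanta)
    weights_pct = [100 * q / total for q in all_quanta]
    return nbands, all_quanta, full_priomap, weights_pct
```

For the first example, `ets_expand(nstrict=3, quanta=[4500, 3000, 2500], priomap=[0, 1, 1, 1, 2, 3, 4, 5])` yields 6 bands, a priomap padded with band 5, and weights of 45, 30 and 25 percent.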
      
      v2:
      - This addresses points raised by David Miller.
      - Patch #4:
          - sch_ets.c: Add a comment with description of the Qdisc and the
            dequeuing algorithm.
          - Kconfig: Add a high-level description to the help blurb.
      
      v1:
      - No changes, first upstream submission after RFC.
      
      v3 (internal):
      - This addresses review from Jiri Pirko.
      - Patch #3:
          - Rename to _HR_ instead of to _HIERARCHY_.
      - Patch #4:
          - pkt_sched.h: Keep all the TCA_ETS_ constants in one enum.
          - pkt_sched.h: Rename TCA_ETS_BANDS to _NBANDS, _STRICT to _NSTRICT,
            _BAND_QUANTUM to _QUANTA_BAND and _PMAP_BAND to _PRIOMAP_BAND.
          - sch_ets.c: Update to reflect the above changes. Add a new policy,
            ets_class_policy, which is used when parsing class changes.
            Currently that policy is the same as the quanta policy, but that
            might change.
          - sch_ets.c: Move MTU handling from ets_quantum_parse() to the one
            caller that makes use of it.
          - sch_ets.c: ets_qdisc_priomap_parse(): WARN_ON_ONCE on invalid
            attribute instead of returning an extack.
      - Patch #6:
          - __mlxsw_sp_qdisc_ets_replace(): Pass the weights argument to this
            function in this patch already. Drop the weight computation.
          - mlxsw_sp_qdisc_prio_replace(): Rename "quanta" to "zeroes" and
            pass for the abovementioned "weights".
          - mlxsw_sp_qdisc_prio_graft(): Convert to a wrapper around
            __mlxsw_sp_qdisc_ets_graft(), instead of invoking the latter
            directly from mlxsw_sp_setup_tc_prio().
          - Update to follow the _HIERARCHY_ -> _HR_ renaming.
      - Patch #7:
          - __mlxsw_sp_qdisc_ets_replace(): The "weights" argument passing and
            weight computation removal are now done in a previous patch.
          - mlxsw_sp_setup_tc_ets(): Drop case TC_ETS_REPLACE, which is handled
            earlier in the function.
      - Patch #3 (iproute2):
          - Add an example output to the commit message.
          - tc-ets.8: Fix output of two examples.
          - tc-ets.8: Describe default values of "bands", "quanta".
          - q_ets.c: A number of fixes in error messages.
          - q_ets.c: Comment formatting: /*padding*/ -> /* padding */
          - q_ets.c: parse_nbands: Move duplicate checking to callers.
          - q_ets.c: Don't accept both "quantum" and "quanta" as equivalent.
      
      v2 (internal):
      - This addresses review from Ido Schimmel and comments from Alexander
        Kushnarov.
      - Patch #2:
          - s/coment/comment in the commit message.
      - Patch #4:
          - sch_ets: ets_class_is_strict(), ets_class_id(): Constify an argument
          - ets_class_find(): RXTify
      - Patch #3 (iproute2):
          - tc-ets.8: some spelling fixes
          - tc-ets.8: add another example
          - tc.8: add an ETS to "CLASSFUL QDISCS" section
      
      v1 (internal):
      - This addresses RFC reviews from Ido Schimmel and Roman Mashak, bugs found
        by Alexander Petrovskiy and myself, and other improvements.
      - Patch #2:
          - Expand the explanation with an explicit example.
      - Patch #4:
          - Kconfig: s/sch_drr/sch_ets/
          - sch_ets: Reorder includes to be in alphabetical order
          - sch_ets: ets_quantum_parse(): Rename the return-pointer argument
            from pquantum to quantum, and use it directly, not going through a
            local temporary.
          - sch_ets: ets_qdisc_quanta_parse(): Convert syntax of function
            argument "quanta" from an array to a pointer.
          - sch_ets: ets_qdisc_priomap_parse(): Likewise with "priomap".
          - sch_ets: ets_qdisc_quanta_parse(), ets_qdisc_priomap_parse(): Invoke
            __nla_validate_nested directly instead of nl80211_validate_nested().
          - sch_ets: ets_qdisc_quanta_parse(): WARN_ON_ONCE on invalid attribute
            instead of returning an extack.
          - sch_ets: ets_qdisc_change(): Make the last band the default one for
            unmentioned priomap priorities.
          - sch_ets: Fix a panic when an offloaded child in a bandwidth-sharing
            band notified its ETS parent.
          - sch_ets: When ungrafting, add the newly-created invisible FIFO to
            the Qdisc hash
      - Patch #5:
          - pkt_cls.h: Note that quantum=0 signifies a strict band.
          - Fix error path handling when ets_offload_dump() fails.
      - Patch #6:
          - __mlxsw_sp_qdisc_ets_replace(): Convert syntax of function arguments
            "quanta" and "priomap" from arrays to pointers.
      - Patch #7:
          - __mlxsw_sp_qdisc_ets_replace(): Convert syntax of function argument
            "weights" from an array to a pointer.
      - Patch #9:
          - mlxsw/sch_ets.sh: Add a comment explaining packet prioritization.
          - Adjust the whole suite to allow testing of traffic classifiers
            in addition to testing priomap.
      - Patch #10:
          - Add a number of new tests to test default priomap band, overlarge
            number of bands, zeroes in quanta, and altogether missing quanta.
      - Patch #1 (iproute2):
          - State motivation for inclusion of this patch in the patchset in the
            commit message.
      - Patch #3 (iproute2):
          - tc-ets.8: it is now December
          - tc-ets.8: explain inactivity WRT using non-WC Qdiscs under ETS band
          - tc-ets.8: s/flow/band in explanation of quantum
          - tc-ets.8: explain what happens with priorities not covered by priomap
          - tc-ets.8: default priomap band is now the last one
          - q_ets.c: ets_parse_opt(): Remove unnecessary initialization of
            priomap and quanta.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: qdiscs: Add test coverage for ETS Qdisc · 82c664b6
      Petr Machata authored
      
      
      Add TDC coverage for the new ETS Qdisc.
      
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: forwarding: sch_ets: Add test coverage for ETS Qdisc · ddd3fd75
      Petr Machata authored
      
      
      This tests the newly-added ETS Qdisc. It runs two to three streams of
      traffic, each with a different priority. ETS Qdisc is supposed to allocate
      bandwidth according to the DRR algorithm and given weights. After running
      the traffic for a while, counters are compared for each stream to check
      that the expected ratio is in fact observed.
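
The counter comparison described above might look roughly like the sketch below. The actual selftest is a shell script. The function name `check_ratios`, the choice of the first stream as the reference, and the 10% tolerance are all assumptions made for illustration.

```python
def check_ratios(counters, weights, tolerance_pct=10):
    """Check that per-stream counters follow the configured weights.

    counters      -- bytes (or packets) observed per stream
    weights       -- configured weight (quantum) per stream, same order
    tolerance_pct -- allowed deviation in percent (assumed value)
    """
    base_count, base_weight = counters[0], weights[0]
    for count, weight in zip(counters[1:], weights[1:]):
        # Scale the reference stream's counter by the weight ratio.
        expected = base_count * weight / base_weight
        deviation_pct = abs(count - expected) / expected * 100
        if deviation_pct > tolerance_pct:
            return False
    return True
```

For instance, counters of 4500, 3050 and 2400 against weights 45, 30 and 25 deviate by under 5% from the expected 3000 and 2500, so the check passes at the assumed tolerance.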
      
      In order for the DRR process to kick in, a traffic bottleneck must exist in
      the first place. In slow path, such bottleneck can be implemented by
      wrapping the ETS Qdisc inside a TBF or other shaper. This might however
      make the configuration unoffloadable. Instead, on HW datapath, the
      bottleneck would be set up by lowering port speed and configuring shared
      buffer suitably.
      
      Therefore the test is structured as a core component that implements the
      testing, with two wrapper scripts that implement the details of slow-path
      and fast-path configuration, respectively.
      
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: forwarding: Move start_/stop_traffic from mlxsw to lib.sh · 4cf9b8f9
      Petr Machata authored
      
      
      These two functions are used for starting several streams of traffic, and
      then stopping them later. They will be handy for the test coverage of ETS
      Qdisc. Move them from mlxsw-specific qos_lib.sh to the generic lib.sh.
      
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_qdisc: Support offloading of ETS Qdisc · 19f405b9
      Petr Machata authored
      
      
      Handle TC_SETUP_QDISC_ETS and add a new ops structure for the ETS Qdisc.
      Invoke the extended prio handlers implemented in the previous patch. For
      stats ops, invoke the prio callbacks directly, as they are not sensitive
      to differences between PRIO and ETS.
      
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_qdisc: Generalize PRIO offload to support ETS · 7917f52a
      Petr Machata authored
      
      
      Thanks to the similarity between PRIO and ETS it is possible to simply
      reuse most of the code for offloading PRIO Qdisc. Extract the common
      functionality into separate functions, making the current PRIO handlers
      thin API adapters.
      
      Extend the new functions to pass quanta for individual bands, which allows
      configuring a subset of bands as WRR. Invoke mlxsw_sp_port_ets_set() as
      appropriate to de/configure WRR-ness and weight of individual bands.
      
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>