  1. Jan 27, 2020
    • selftests: netfilter: Introduce tests for sets with range concatenation · 611973c1
      Stefano Brivio authored
      
      
      This test covers functionality and stability of the newly added
      nftables set implementation supporting concatenation of ranged
      fields.
      
      For some selected set expression types, test:
      - correctness, by checking that packets match or don't
      - concurrency, by attempting races between insertion, deletion, lookup
      - timeout feature, checking that packets don't match expired entries
      
      and (roughly) estimate matching rates, comparing to baselines for
      simple drop on netdev ingress hook and for hash and rbtree sets.
      
      In order to send packets, this needs one of sendip, netcat or bash.
      To flood with traffic, iperf3, iperf and netperf are supported. For
      performance measurements, this relies on the sample pktgen script
      pktgen_bench_xmit_mode_netif_receive.sh.
      
      If none of the tools suitable for a given test are available, specific
      tests will be skipped.
      
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • nf_tables: Add set type for arbitrary concatenation of ranges · 3c4287f6
      Stefano Brivio authored
      This new set type allows for intervals in concatenated fields,
      which are expressed in the usual way, that is, simple byte
      concatenation with padding to 32 bits for single fields, and
      given as ranges by specifying start and end elements containing,
      each, the full concatenation of start and end values for the
      single fields.
      
      Ranges are expanded to composing netmasks, for each field: these
      are inserted as rules in per-field lookup tables. Bits to be
      classified are divided in 4-bit groups, and for each group, the
      lookup table contains 2^4 = 16 buckets, representing all the possible
      values of a bit group. This approach was inspired by the Grouper
      algorithm:
      	http://www.cse.usf.edu/~ligatti/projects/grouper/
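As a rough illustration of the expansion step described above, the sketch below splits an inclusive integer range into aligned power-of-two blocks, each equivalent to one netmask. The function name `range_to_prefixes` and the `(base, prefix_len)` representation are assumptions made for this example; they are not taken from nft_set_pipapo.c, and the kernel code may perform the expansion differently.

```python
def range_to_prefixes(start, end, width):
    """Split the inclusive range [start, end] of width-bit values into
    (base, prefix_len) pairs. Hypothetical helper for illustration only."""
    prefixes = []
    while start <= end:
        # Largest power-of-two block that is aligned at `start`...
        size = (1 << width) if start == 0 else (start & -start)
        # ...shrunk until it also fits in what is left of the range.
        while size > end - start + 1:
            size >>= 1
        prefixes.append((start, width - (size.bit_length() - 1)))
        start += size
    return prefixes


# Destination ports 22-25 become two /15 masks on a 16-bit field.
print(range_to_prefixes(22, 25, 16))
```

Each resulting pair stands for one rule to be inserted in the lookup table of that field.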
      
      Matching is performed by a sequence of AND operations between
      bucket values, with buckets selected according to the value of
      packet bits, for each group. The result of this sequence tells
      us which rules matched for a given field.
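The bucket lookup for a single field can be sketched as follows. This is a simplified model under stated assumptions: Python integers stand in for the rule bitmaps, rules are given as `(base, prefix_len)` pairs, and the names `build_table` and `lookup` are made up for the example. It is not the kernel implementation.

```python
GROUP_BITS = 4
BUCKETS = 1 << GROUP_BITS  # 16 possible values per 4-bit group


def build_table(rules, width):
    """Return table[group][value] = bitmap of rules accepting that value."""
    groups = width // GROUP_BITS
    table = [[0] * BUCKETS for _ in range(groups)]
    for idx, (base, plen) in enumerate(rules):
        for g in range(groups):
            shift = width - GROUP_BITS * (g + 1)
            want = (base >> shift) & (BUCKETS - 1)
            # How many bits of this group are still covered by the mask.
            covered = min(max(plen - GROUP_BITS * g, 0), GROUP_BITS)
            gmask = ((BUCKETS - 1) << (GROUP_BITS - covered)) & (BUCKETS - 1)
            for v in range(BUCKETS):
                if (v & gmask) == (want & gmask):
                    table[g][v] |= 1 << idx
    return table


def lookup(table, width, key):
    """AND together one bucket per bit group, selected by the key bits."""
    result = -1  # all bits set
    for g, buckets in enumerate(table):
        shift = width - GROUP_BITS * (g + 1)
        result &= buckets[(key >> shift) & (BUCKETS - 1)]
    return result


# Two rules covering ports 22-23 and 24-25 on a 16-bit field.
table = build_table([(22, 15), (24, 15)], 16)
print(bin(lookup(table, 16, 24)))
```

A set bit at position i in the result means that rule i matched the key for this field.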
      
      In order to concatenate several ranged fields, per-field rules
      are mapped using mapping arrays, one per field, that specify
      which rules should be considered while matching the next field.
      The mapping array for the last field contains a reference to
      the element originally inserted.
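To show why the mapping arrays are needed, here is a deliberately simplified sketch. It assumes each field keeps plain `(low, high)` ranges as rules instead of bucket tables, and the names `lookup_concat`, `FIELDS`, `"rules"` and `"map"` are hypothetical. Only the chaining between fields is meant to reflect the description above.

```python
def lookup_concat(fields, key):
    """Match key parts field by field, narrowing the candidate rules."""
    # Every rule of the first field is a candidate to begin with.
    allowed = set(range(len(fields[0]["rules"])))
    last = len(fields) - 1
    for pos, (field, value) in enumerate(zip(fields, key)):
        matched = {i for i in allowed
                   if field["rules"][i][0] <= value <= field["rules"][i][1]}
        if not matched:
            return None
        if pos == last:
            # The last mapping array refers to the inserted element.
            return field["map"][min(matched)]
        # Otherwise it selects the rules to consider in the next field.
        allowed = set().union(*(field["map"][i] for i in matched))
    return None


# Element "A": 10-20 . 22-25, element "B": 30-40 . 80
FIELDS = [
    {"rules": [(10, 20), (30, 40)], "map": [{0}, {1}]},
    {"rules": [(22, 25), (80, 80)], "map": ["A", "B"]},
]
print(lookup_concat(FIELDS, (15, 23)), lookup_concat(FIELDS, (15, 80)))
```

The key `(15, 80)` matches a rule in each field taken separately, but the two rules belong to different elements, so the mapping from the first field excludes the second rule and the lookup fails.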
      
      The notes in nft_set_pipapo.c cover the algorithm in deeper
      detail.
      
      A pure hash-based approach is of no use here, as ranges need
      to be classified. An implementation based on "proxying" the
      existing red-black tree set type, creating a tree for each
      field, was considered, but deemed impractical due to the fact
      that elements would need to be shared between trees, at least
      as long as we want to keep UAPI changes to a minimum.
      
      A stand-alone implementation of this algorithm is available at:
      	https://pipapo.lameexcu.se
      
      together with notes about possible future optimisations
      (in pipapo.c).
      
      This algorithm was designed with data locality in mind, and can
      be highly optimised for SIMD instruction sets, as the bulk of
      the matching work is done with repetitive, simple bitwise
      operations.
      
      At this point, without further optimisations, nft_concat_range.sh
      reports, for one AMD Epyc 7351 thread (2.9GHz, 512 KiB L1D$, 8 MiB
      L2$):
      
      TEST: performance
        net,port                                                      [ OK ]
          baseline (drop from netdev hook):              10190076pps
          baseline hash (non-ranged entries):             6179564pps
          baseline rbtree (match on first field only):    2950341pps
          set with  1000 full, ranged entries:            2304165pps
        port,net                                                      [ OK ]
          baseline (drop from netdev hook):              10143615pps
          baseline hash (non-ranged entries):             6135776pps
          baseline rbtree (match on first field only):    4311934pps
          set with   100 full, ranged entries:            4131471pps
        net6,port                                                     [ OK ]
          baseline (drop from netdev hook):               9730404pps
          baseline hash (non-ranged entries):             4809557pps
          baseline rbtree (match on first field only):    1501699pps
          set with  1000 full, ranged entries:            1092557pps
        port,proto                                                    [ OK ]
          baseline (drop from netdev hook):              10812426pps
          baseline hash (non-ranged entries):             6929353pps
          baseline rbtree (match on first field only):    3027105pps
          set with 30000 full, ranged entries:             284147pps
        net6,port,mac                                                 [ OK ]
          baseline (drop from netdev hook):               9660114pps
          baseline hash (non-ranged entries):             3778877pps
          baseline rbtree (match on first field only):    3179379pps
          set with    10 full, ranged entries:            2082880pps
        net6,port,mac,proto                                           [ OK ]
          baseline (drop from netdev hook):               9718324pps
          baseline hash (non-ranged entries):             3799021pps
          baseline rbtree (match on first field only):    1506689pps
          set with  1000 full, ranged entries:             783810pps
        net,mac                                                       [ OK ]
          baseline (drop from netdev hook):              10190029pps
          baseline hash (non-ranged entries):             5172218pps
          baseline rbtree (match on first field only):    2946863pps
          set with  1000 full, ranged entries:            1279122pps
      
      v4:
       - fix build for 32-bit architectures: 64-bit division needs
         div_u64() (kbuild test robot <lkp@intel.com>)
      v3:
       - rework interface for field length specification,
         NFT_SET_SUBKEY disappears and information is stored in
         description
       - remove scratch area to store closing element of ranges,
         as elements now come with an actual attribute to specify
         the upper range limit (Pablo Neira Ayuso)
       - also remove pointer to 'start' element from mapping table,
         closing key is now accessible via extension data
       - use bytes right away instead of bits for field lengths,
         this way we can also double the inner loop of the lookup
         function to take care of upper and lower bits in a single
         iteration (minor performance improvement)
       - make it clearer that set operations are actually atomic
         API-wise, but we can't e.g. implement flush() as one-shot
         action
       - fix type for 'dup' in nft_pipapo_insert(), check for
         duplicates only in the next generation, and in general take
         care of differentiating generation mask cases depending on
         the operation (Pablo Neira Ayuso)
       - report C implementation matching rate in commit message, so
         that AVX2 implementation can be compared (Pablo Neira Ayuso)
      v2:
       - protect access to scratch maps in nft_pipapo_lookup() with
         local_bh_disable/enable() (Florian Westphal)
       - drop rcu_read_lock/unlock() from nft_pipapo_lookup(), it's
         already implied (Florian Westphal)
       - explain why partial allocation failures don't need handling
         in pipapo_realloc_scratch(), rename 'm' to clone and update
         related kerneldoc to make it clear we're not operating on
         the live copy (Florian Westphal)
       - add explicit check for priv->start_elem in
         nft_pipapo_insert() to avoid ending up in nft_pipapo_walk()
         with a NULL start element, and also zero it out in every
         operation that might make it invalid, so that insertion
         doesn't proceed with an invalid element (Florian Westphal)
      
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • bitmap: Introduce bitmap_cut(): cut bits and shift remaining · 20927671
      Stefano Brivio authored
      
      
      The new bitmap function bitmap_cut() copies bits from source to
      destination by removing the region specified by parameters first
      and cut, and remapping the bits above the cut region by right
      shifting them.
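The effect of the operation can be modelled with a Python integer standing in for the bitmap. This is only a sketch of the semantics described above: the `nbits` parameter and the return-by-value form are assumptions made for the example, while the kernel function works on arrays of unsigned long.

```python
def bitmap_cut(src, first, cut, nbits):
    """Drop `cut` bits starting at bit `first`, shifting higher bits down."""
    mask = (1 << nbits) - 1
    src &= mask
    low = src & ((1 << first) - 1)   # bits below the cut region stay put
    high = src >> (first + cut)      # bits above it move right by `cut`
    return (low | (high << first)) & mask


# Cutting the middle nibble of 0b1101_0110_1011 leaves 0b1101_1011.
print(bin(bitmap_cut(0b110101101011, 4, 4, 12)))
```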
      
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: Support for sets with multiple ranged fields · f3a2181e
      Stefano Brivio authored
      
      
      Introduce a new nested netlink attribute, NFTA_SET_DESC_CONCAT, used
      to specify the length of each field in a set concatenation.
      
      This allows set implementations to support concatenation of multiple
      ranged items, as they can divide the input key into matching data for
      every single field. Such set implementations would be selected as
      they specify support for NFT_SET_INTERVAL and allow desc->field_count
      to be greater than one. Explicitly disallow this for nft_set_rbtree.
      
      In order to specify the interval for a set entry, userspace would
      include field lengths in NFTA_SET_DESC_CONCAT attributes, and pass
      range endpoints as two separate keys, represented by attributes
      NFTA_SET_ELEM_KEY and NFTA_SET_ELEM_KEY_END.
      
      While at it, export the number of 32-bit registers available for
      packet matching, as nftables will need this to know the maximum
      number of field lengths that can be specified.
      
      For example, "packets with an IPv4 address between 192.0.2.0 and
      192.0.2.42, with destination port between 22 and 25", can be
      expressed as two concatenated elements:
      
        NFTA_SET_ELEM_KEY:            192.0.2.0 . 22
        NFTA_SET_ELEM_KEY_END:        192.0.2.42 . 25
      
      and NFTA_SET_DESC_CONCAT attribute would contain:
      
        NFTA_LIST_ELEM
          NFTA_SET_FIELD_LEN:		4
        NFTA_LIST_ELEM
          NFTA_SET_FIELD_LEN:		2
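The example above can be modelled in a few lines to show how the field lengths let an implementation split a concatenated key. The sketch assumes network byte order and trailing zero padding of each field to 32 bits; the helper names `pad32`, `concat_key` and `split_key` are hypothetical and do not correspond to kernel or libnftnl functions.

```python
import ipaddress
import struct


def pad32(raw):
    """Pad a field with trailing zero bytes to a multiple of 4 bytes."""
    return raw + b"\x00" * (-len(raw) % 4)


def concat_key(addr, port):
    """Build a concatenated key plus the list of unpadded field lengths."""
    fields = [ipaddress.IPv4Address(addr).packed, struct.pack("!H", port)]
    return b"".join(pad32(f) for f in fields), [len(f) for f in fields]


def split_key(key, field_lens):
    """Recover the single fields from a key using the field lengths."""
    out, off = [], 0
    for n in field_lens:
        out.append(key[off:off + n])
        off += n + (-n % 4)  # skip the padding of this field
    return out


start, lens = concat_key("192.0.2.0", 22)   # like NFTA_SET_ELEM_KEY
end, _ = concat_key("192.0.2.42", 25)       # like NFTA_SET_ELEM_KEY_END
print(lens, split_key(start, lens), split_key(end, lens))
```

Here `lens` plays the role of the two NFTA_SET_FIELD_LEN values, 4 and 2.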
      
      v4: No changes
      v3: Complete rework, NFTA_SET_DESC_CONCAT instead of NFTA_SET_SUBKEY
      v2: No changes
      
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: add NFTA_SET_ELEM_KEY_END attribute · 7b225d0b
      Pablo Neira Ayuso authored
      
      
      Add NFTA_SET_ELEM_KEY_END attribute to convey the closing element of the
      interval between kernel and userspace.
      
      This patch also adds the NFT_SET_EXT_KEY_END extension to store the
      closing element value in this interval.
      
      v4: No changes
      v3: New patch
      
      [sbrivio: refactor error paths and labels; add corresponding
        nft_set_ext_type for new key; rebase]
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: add nft_setelem_parse_key() · 20a1452c
      Pablo Neira Ayuso authored
      
      
      Add helper function to parse the set element key netlink attribute.
      
      v4: No changes
      v3: New patch
      
      [sbrivio: refactor error paths and labels; use NFT_DATA_VALUE_MAXLEN
        instead of sizeof(*key) in helper, value can be longer than that;
        rebase]
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  2. Jan 26, 2020
  3. Jan 25, 2020