  1. Aug 19, 2015
  2. Aug 18, 2015
    • Merge branch 'Identifier-Locator-Addressing' · 0b233dc7
      David S. Miller authored
      Tom Herbert says:
      
      ====================
      net: Identifier Locator Addressing - Part I
      
      This patch set provides rudimentary support for Identifier Locator
      Addressing or ILA. The basic concept of ILA is that we split an IPv6
      address into a 64 bit locator and 64 bit identifier. The identifier is
      the identity of an entity in communication ("who"), and the locator
      expresses the location of the entity ("where"). Applications
      use an externally visible address that contains the identifier.
      When a packet is actually sent, a translation is done that
      overwrites the first 64 bits of the address with a locator.
      The packet can then be forwarded over the network to the host where
      the addressed entity is located. At the receiver, the reverse
      translation is done so that the application sees the original,
      untranslated address. Presumably an external control plane will
      provide identifier->locator mappings.
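
      The split described above can be sketched in a few lines. This is
      only an illustration of the address arithmetic, not code from the
      patch set; the helper names split_address and with_locator are
      made up for the example, and the addresses are the ones used in
      the configuration example of this cover letter.

```python
import ipaddress

MASK64 = (1 << 64) - 1


def split_address(addr):
    """Return (locator, identifier): the upper and lower 64 bits of addr."""
    value = int(ipaddress.IPv6Address(addr))
    return value >> 64, value & MASK64


def with_locator(addr, locator):
    """Overwrite the upper 64 bits of addr with a locator such as '2001:0:0:2'.

    Hypothetical helper: the locator is written as four 16-bit groups,
    mirroring the 'encap ila <locator>' syntax.
    """
    _, identifier = split_address(addr)
    new_locator, _ = split_address(locator + "::")
    return ipaddress.IPv6Address((new_locator << 64) | identifier)


on_wire = with_locator("3333:0:0:1:5555:0:2:0", "2001:0:0:2")
print(on_wire.exploded)
```

      The identifier (lower 64 bits) is left untouched, so the receiver
      can restore the original address by writing the original locator
      back in the same way.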
      
      v2:
        - Fix compilation errors when LWT is not configured
        - Consolidate ILA into a single ila.c
      
      v3:
        - Change pseudohdr argument of inet_proto_csum_replace functions to
          be a bool
      
      v4:
        - In ila_build_state check locator being in netlink params before
          allocating tunnel state
      
      The data path for ILA is a simple NAT translation that only operates
      on the upper 64 bits of a destination address in IPv6 packets. The
      basic process is:
      
         1) Lookup 64 bit identifier (lower 64 bits of destination)
         2) If a match is found
            a) Overwrite locator (upper 64 bits of destination) with
               the new locator
            b) Adjust any checksum that has destination address included in
               pseudo header
         3) Send or receive packet
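
      The steps above can be modelled roughly as follows. This is a
      sketch under stated assumptions: ILA_TABLE is a hypothetical
      in-memory mapping (in this patch set the locator comes from the
      route's encap configuration), and the checksum is recomputed in
      full for brevity where a real data path would adjust it
      incrementally.

```python
MASK64 = (1 << 64) - 1

# Hypothetical mapping table: 64-bit identifier -> 64-bit locator.
ILA_TABLE = {0x5555000000020000: 0x2001000000000002}


def checksum16(data):
    """Internet checksum: inverted 16-bit one's-complement sum of data."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(int.from_bytes(data[i:i + 2], "big")
                for i in range(0, len(data), 2))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ila_translate(dst, covered):
    """Return (new_dst, new_checksum), or None when no mapping matches.

    dst is the 128-bit destination address as an integer; covered stands
    in for the remaining bytes that the checksum covers.
    """
    identifier = dst & MASK64                  # step 1: look up identifier
    locator = ILA_TABLE.get(identifier)
    if locator is None:                        # step 2: no match, no change
        return None
    new_dst = (locator << 64) | identifier     # step 2a: overwrite locator
    # Step 2b: full recomputation here, as a stand-in for an adjustment.
    new_checksum = checksum16(new_dst.to_bytes(16, "big") + covered)
    return new_dst, new_checksum
```

      Step 3 (sending or receiving the packet) is outside the scope of
      the sketch.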
      
      ILA is a means to implement tunnels or network virtualization without
      encapsulation. Since there is no encapsulation involved, we assume that
      stateless support in the network for IPv6 (e.g. RSS, ECMP, TSO, etc.)
      just works. Also, since we're minimally changing the packet many of
      the worries about encapsulation (MTU, checksum, fragmentation) are
      not relevant. The downside is that ILA is not extensible like other
      encapsulations (GUE for instance) so it might not be appropriate for
      all use cases. Also, this only makes sense to do in IPv6!
      
      A key aspect of ILA is performance. The intent is that ILA would be
      used in data centers in virtualizing tasks or jobs. In the fullest
      incarnation all intra data center communications might be targeted to
      virtual ILA addresses. This is basically adding a new virtualization
      capability to the existing services in a datacenter, so there is a
      strong expectation that this does not degrade performance for
      existing applications.
      
      Performance seems to depend on how ILA is hooked into the kernel.
      ILA can be implemented under some different models:
      
        - Mechanically it is a form of stateless DNAT
        - It can be thought of as a type of (source) routing
        - As a functional replacement of encapsulation
      
      In this patch set we hook into the data path using Light Weight
      Tunnels (LWT) infrastructure. As part of that, we add support in LWT
      to redirect dst input. iproute will be modified to take a new ila encap
      type. ILA can be configured like:
      
      ip route add 3333:0:0:1:5555:0:2:0/128 \
         encap ila 2001:0:0:2 via 2401:db00:20:911a:face:0:27:0
      
      ip -6 addr add 3333:0:0:1:5555:0:1:0/128 dev eth0
      
      ip route add table local local 2001:0:0:1:5555:0:1:0/128 \
         encap ila 3333:0:0:1 dev lo
      
      So sending to destination 3333:0:0:1:5555:0:2:0 will have destination
      of 2001:0:0:2:5555:0:2:0 on the wire.
      
      Performance results are below. With ILA we see about a 10% drop in
      pps compared to non-ILA. Much of this drop can be attributed to the
      loss of early demux on input (translation occurs after it is attempted).
      We will address this in the next patch set. Also, IPvlan input path
      does not work with ILA since the routing is bypassed; this will
      be addressed in a future patch.
      
      Performance testing:
      
      Performing netperf TCP_RR with 200 clients:
      
      Non-ILA baseline
        84.92% CPU utilization
        1861922.9 tps
        93/163/330 50/90/99% latencies
      
      ILA single destination
        83.16% CPU utilization
        1679683.4 tps
        105/180/332 50/90/99% latencies
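
      For reference, the relative change implied by the two runs above
      can be worked out directly from the reported figures. The snippet
      only does the arithmetic; the variable names are illustrative.

```python
# Figures copied from the netperf TCP_RR results above.
baseline_tps = 1861922.9
ila_tps = 1679683.4
baseline_p50 = 93
ila_p50 = 105

# Relative throughput drop and median latency increase, in percent.
tps_drop = (baseline_tps - ila_tps) / baseline_tps * 100
p50_increase = (ila_p50 - baseline_p50) / baseline_p50 * 100

print(f"throughput drop: {tps_drop:.1f}%")
print(f"median latency increase: {p50_increase:.1f}%")
```

      The throughput figure comes out just under 10%, consistent with
      the "about a 10% drop" stated earlier.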
      
      References:
      
      Slides from netconf:
      http://vger.kernel.org/netconf2015Herbert-ILA.pdf
      
      Slides from presentation at IETF:
      https://www.ietf.org/proceedings/92/slides/slides-92-nvo3-1.pdf
      
      I-D:
      https://tools.ietf.org/html/draft-herbert-nvo3-ila-00
      
      
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Identifier Locator Addressing module · 65d7ab8d
      Tom Herbert authored
      
      
      Add a new module named ila. This implements ILA translation. Light
      weight tunnel redirection is used to perform the translation in
      the data path. This is configured by the "ip -6 route" command
      using the "encap ila <locator>" option, where <locator> is the
      value to set in destination locator of the packet. e.g.
      
      ip -6 route add 3333:0:0:1:5555:0:1:0/128 \
            encap ila 2001:0:0:1 via 2401:db00:20:911a:face:0:25:0
      
      Sets a route where 3333:0:0:1 will be overwritten by
      2001:0:0:1 on output.
      
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Add inet_proto_csum_replace_by_diff utility function · abc5d1ff
      Tom Herbert authored
      
      
      This function updates a checksum field value and skb->csum based on
      a value which is the difference between the old and new checksum.
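
      The idea of updating a checksum from a precomputed difference can
      be illustrated outside the kernel. The sketch below models only
      the one's-complement arithmetic on a 16-bit checksum field; it is
      not the kernel function, it ignores skb->csum and the offload
      cases, and all helper names are invented for the example.

```python
def fold(total):
    """Fold carries back in until the value fits in 16 bits."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def csum(data):
    """16-bit one's-complement sum (not inverted) of even-length data."""
    return fold(sum(int.from_bytes(data[i:i + 2], "big")
                    for i in range(0, len(data), 2)))


def csum_diff(old, new):
    """One's-complement difference between the sums of new and old."""
    return fold(csum(new) + (~csum(old) & 0xFFFF))


def replace_by_diff(check, diff):
    """Apply a precomputed diff to a checksum field: ~(~check + diff)."""
    return ~fold((~check & 0xFFFF) + diff) & 0xFFFF


old_locator = bytes.fromhex("3333000000000001")
new_locator = bytes.fromhex("2001000000000002")
rest = bytes.fromhex("5555000000020000" "12345678")  # identifier + sample data

check = ~csum(old_locator + rest) & 0xFFFF
diff = csum_diff(old_locator, new_locator)   # computed once per mapping
updated = replace_by_diff(check, diff)       # applied per packet
```

      Because the difference depends only on the old and new locator, it
      can be computed once and reused for every packet that takes the
      same translation.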
      
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Change pseudohdr argument of inet_proto_csum_replace* to be a bool · 4b048d6d
      Tom Herbert authored
      
      
      inet_proto_csum_replace4, inet_proto_csum_replace2 and
      inet_proto_csum_replace16 take a pseudohdr argument which indicates
      whether the checksum field carries a pseudo header. This argument
      should be a boolean instead of an int.
      
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • lwt: Add support to redirect dst.input · 25368623
      Tom Herbert authored
      
      
      This patch adds the capability to redirect dst input in the same way
      that dst output is redirected by LWT.
      
      Also, save the original dst.input and dst.output when setting up
      lwtunnel redirection. These can be called by the client as a
      pass-through.
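
      The redirect-and-pass-through pattern can be pictured with a toy
      model. This is a conceptual sketch only, not the kernel data
      structures: the Dst class, redirect() and make_translating_input()
      are hypothetical names, and packets are plain strings.

```python
class Dst:
    """Toy stand-in for a route entry with input and output handlers."""

    def __init__(self, input_fn, output_fn):
        self.input = input_fn
        self.output = output_fn
        self.orig_input = None
        self.orig_output = None


def redirect(dst, new_input=None, new_output=None):
    """Save the original handlers, then install any replacements given."""
    dst.orig_input, dst.orig_output = dst.input, dst.output
    if new_input is not None:
        dst.input = new_input
    if new_output is not None:
        dst.output = new_output


def make_translating_input(dst, translate):
    """Build an input handler that rewrites the packet, then passes it
    through to the saved original handler."""
    def handler(packet):
        return dst.orig_input(translate(packet))
    return handler


dst = Dst(lambda p: ("delivered", p), lambda p: ("sent", p))
redirect(dst, new_input=make_translating_input(dst, str.upper))
print(dst.input("pkt"))
print(dst.output("pkt"))
```

      In the model only the input handler is replaced; the output path
      keeps its original handler, and the saved originals remain
      available to the client.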
      
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • enic: Fix sparse warning in vnic_devcmd_init(). · f376d4ad
      David S. Miller authored
      
      
      >> drivers/net/ethernet/cisco/enic/vnic_dev.c:1095:13: sparse: incorrect type in assignment (different address spaces)
         drivers/net/ethernet/cisco/enic/vnic_dev.c:1095:13:    expected void *res
         drivers/net/ethernet/cisco/enic/vnic_dev.c:1095:13:    got void [noderef] <asn:2>*
      
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlx5e: Fix sparse warnings in mlx5e_handle_csum(). · ecf842f6
      David S. Miller authored
      
      
      >> drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:173:44: sparse: incorrect type in argument 1 (different base types)
         drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:173:44:    expected restricted __sum16 [usertype] n
         drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:173:44:    got restricted __be16 [usertype] check_sum
      
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet: Move VRF table lookup to inlined function · dc028da5
      David Ahern authored
      
      
      Table lookup compiles out when VRF is not enabled.
      
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Fix docbook warning for IFF_VRF_MASTER enum · 808d28c4
      David Ahern authored
      kbuild test robot reported:
      tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
      head:   d52736e2
      commit: 4e3c8992
      
       [751/762] net: Introduce VRF related flags and helpers
      reproduce: make htmldocs
      
      >> Warning(include/linux/netdevice.h:1293): Enum value 'IFF_VRF_MASTER' not described in enum 'netdev_priv_flags'
      
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>