  1. Aug 19, 2015
  2. Aug 18, 2015
    • Merge branch 'Identifier-Locator-Addressing' · 0b233dc7
      David S. Miller authored
      
      
      Tom Herbert says:
      
      ====================
      net: Identifier Locator Addressing - Part I
      
      This patch set provides rudimentary support for Identifier Locator
      Addressing or ILA. The basic concept of ILA is that we split an IPv6
      address into a 64 bit locator and 64 bit identifier. The identifier is
      the identity of an entity in communication ("who"), and the locator
      expresses the location of the entity ("where"). Applications
      use an externally visible address that contains the identifier.
      When a packet is actually sent, a translation is done that
      overwrites the first 64 bits of the address with a locator.
      The packet can then be forwarded over the network to the host where
      the addressed entity is located. At the receiver, the reverse
      translation is done so that the application sees the original,
      untranslated address. Presumably an external control plane will
      provide identifier->locator mappings.
      
      v2:
        - Fix compilation errors when LWT not configured
        - Consolidate ILA into a single ila.c
      
      v3:
        - Change pseudohdr argument of inet_proto_csum_replace functions to
          be a bool
      
      v4:
        - In ila_build_state check locator being in netlink params before
          allocating tunnel state
      
      The data path for ILA is a simple NAT translation that only operates
      on the upper 64 bits of a destination address in IPv6 packets. The
      basic process is:
      
         1) Lookup 64 bit identifier (lower 64 bits of destination)
         2) If a match is found
            a) Overwrite locator (upper 64 bits of destination) with
               the new locator
            b) Adjust any checksum that has destination address included in
               pseudo header
         3) Send or receive packet
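
      The steps above can be sketched in a few lines of Python. This is an
      illustrative model only, not the kernel code: ILA_MAP and ila_translate
      are hypothetical names, and the single table entry reuses the addresses
      from the configuration example in this message. The checksum adjustment
      of step 2b is left out here.

```python
import ipaddress

MASK64 = (1 << 64) - 1

# Hypothetical identifier -> locator table. In a real deployment an
# external control plane would be expected to supply these mappings.
ILA_MAP = {0x5555_0000_0002_0000: 0x2001_0000_0000_0002}


def ila_translate(dst, table=ILA_MAP):
    """Return the destination address with its locator rewritten, if mapped."""
    addr = ipaddress.IPv6Address(dst)
    identifier = int(addr) & MASK64          # step 1: lower 64 bits
    locator = table.get(identifier)
    if locator is None:                      # no match: address is unchanged
        return addr
    # step 2a: overwrite the upper 64 bits with the new locator
    return ipaddress.IPv6Address((locator << 64) | identifier)
```

      With this table, ila_translate("3333:0:0:1:5555:0:2:0") yields the
      address 2001:0:0:2:5555:0:2:0, and an unmapped identifier is returned
      untouched.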
      
      ILA is a means to implement tunnels or network virtualization without
      encapsulation. Since there is no encapsulation involved, we assume that
      stateless support in the network for IPv6 (e.g. RSS, ECMP, TSO, etc.)
      just works. Also, since we're minimally changing the packet many of
      the worries about encapsulation (MTU, checksum, fragmentation) are
      not relevant. The downside is that ILA is not extensible like other
      encapsulations (GUE for instance) so it might not be appropriate for
      all use cases. Also, this only makes sense to do in IPv6!
      
      A key aspect of ILA is performance. The intent is that ILA would be
      used in data centers in virtualizing tasks or jobs. In the fullest
      incarnation all intra data center communications might be targeted to
      virtual ILA addresses. This is basically adding a new virtualization
      capability to the existing services in a datacenter, so there is a
      strong expectation that this does not degrade performance for
      existing applications.
      
      Performance seems to be dependent on how ILA is hooked into the kernel.
      ILA can be implemented under some different models:
      
        - Mechanically it is a form of stateless DNAT
        - It can be thought of as a type of (source) routing
        - As a functional replacement of encapsulation
      
      In this patch set we hook into the data path using Light Weight
      Tunnels (LWT) infrastructure. As part of that, we add support in LWT
      to redirect dst input. iproute will be modified to take a new ila encap
      type. ILA can be configured like:
      
      ip route add 3333:0:0:1:5555:0:2:0/128 \
         encap ila 2001:0:0:2 via 2401:db00:20:911a:face:0:27:0
      
      ip -6 addr add 3333:0:0:1:5555:0:1:0/128 dev eth0
      
      ip route add table local local 2001:0:0:1:5555:0:1:0/128 \
         encap ila 3333:0:0:1 dev lo
      
      So sending to destination 3333:0:0:1:5555:0:2:0 will have destination
      of 2001:0:0:2:5555:0:2:0 on the wire.
      
      Performance results are below. With ILA we see about a 10% drop in
      pps compared to non-ILA. Much of this drop can be attributed to the
      loss of early demux on input (translation occurs after it is attempted).
      We will address this in the next patch set. Also, IPvlan input path
      does not work with ILA since the routing is bypassed-- this will
      be addressed in a future patch.
      
      Performance testing:
      
      Performing netperf TCP_RR with 200 clients:
      
      Non-ILA baseline
        84.92% CPU utilization
        1861922.9 tps
        93/163/330 50/90/99% latencies
      
      ILA single destination
        83.16% CPU utilization
        1679683.4 tps
        105/180/332 50/90/99% latencies
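
      The "about 10%" figure can be checked directly from the numbers above.
      The short calculation below uses only the reported tps and median
      latency values; pct_change is a helper name introduced here.

```python
baseline_tps = 1861922.9
ila_tps = 1679683.4


def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


tps_drop = -pct_change(baseline_tps, ila_tps)
p50_increase = pct_change(93, 105)

print(f"throughput drop: {tps_drop:.1f}%")                  # 9.8%
print(f"median latency increase: {p50_increase:.1f}%")      # 12.9%
```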
      
      References:
      
      Slides from netconf:
      http://vger.kernel.org/netconf2015Herbert-ILA.pdf
      
      Slides from presentation at IETF:
      https://www.ietf.org/proceedings/92/slides/slides-92-nvo3-1.pdf
      
      I-D:
      https://tools.ietf.org/html/draft-herbert-nvo3-ila-00
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Identifier Locator Addressing module · 65d7ab8d
      Tom Herbert authored
      
      
      Add a new module named ila. This implements ILA translation. Light
      weight tunnel redirection is used to perform the translation in
      the data path. This is configured by the "ip -6 route" command
      using the "encap ila <locator>" option, where <locator> is the
      value to set in destination locator of the packet. e.g.
      
      ip -6 route add 3333:0:0:1:5555:0:1:0/128 \
            encap ila 2001:0:0:1 via 2401:db00:20:911a:face:0:25:0
      
      Sets a route where 3333:0:0:1 will be overwritten by
      2001:0:0:1 on output.
      
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Add inet_proto_csum_replace_by_diff utility function · abc5d1ff
      Tom Herbert authored
      
      
      This function updates a checksum field value and skb->csum based on
      a value which is the difference between the old and new checksum.
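
      The idea of updating a checksum from a difference can be modelled with
      16-bit one's complement arithmetic. The sketch below is a Python
      illustration, not the kernel function: the helper names are made up
      here, it only covers the checksum field itself, and the skb->csum
      update mentioned above is not modelled.

```python
def csum_add(a, b):
    """16-bit one's complement addition with end-around carry."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def ones_sum(data):
    """One's complement sum of big-endian 16-bit words (zero padded)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total = csum_add(total, int.from_bytes(data[i:i + 2], "big"))
    return total


def checksum(data):
    """Internet checksum: complement of the one's complement sum."""
    return ~ones_sum(data) & 0xFFFF


def csum_diff(old, new):
    """Difference sum(new) - sum(old), expressed as sum(new) + ~sum(old)."""
    return csum_add(ones_sum(new), ~ones_sum(old) & 0xFFFF)


def replace_by_diff(check, diff):
    """Fold a precomputed difference into an existing checksum field."""
    return ~csum_add(~check & 0xFFFF, diff) & 0xFFFF
```

      For a locator rewrite, the difference is computed once over the eight
      old and eight new locator bytes; replace_by_diff then gives the same
      result as recomputing the checksum over the whole rewritten data.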
      
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Change pseudohdr argument of inet_proto_csum_replace* to be a bool · 4b048d6d
      Tom Herbert authored
      
      
      inet_proto_csum_replace4,2,16 take a pseudohdr argument which indicates
      the checksum field carries a pseudo header. This argument should be a
      boolean instead of an int.
      
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • lwt: Add support to redirect dst.input · 25368623
      Tom Herbert authored
      
      
      This patch adds the capability to redirect dst input in the same way
      that dst output is redirected by LWT.
      
      Also, save the original dst.input and dst.output when setting up
      lwtunnel redirection. These can be called by the client as a
      pass-through.
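
      As a conceptual model only (this is Python, not the kernel data
      structures), redirection with saved originals might look like the
      sketch below. Dst, Redirect and install_redirect are hypothetical
      names chosen for the illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Dst:
    """Toy stand-in for a route entry with input/output handlers."""
    input: Callable
    output: Callable


@dataclass
class Redirect:
    """Holds the handlers that were in place before redirection."""
    orig_input: Callable
    orig_output: Callable


def install_redirect(dst, on_input, on_output):
    """Save the original handlers, then point dst at the client's hooks."""
    state = Redirect(dst.input, dst.output)
    dst.input = lambda pkt: on_input(pkt, state)
    dst.output = lambda pkt: on_output(pkt, state)
    return state


def ila_input(pkt, state):
    """Example client hook: rewrite the packet, then pass through."""
    pkt["dst"] = pkt["dst"].replace("2001:0:0:1", "3333:0:0:1", 1)
    return state.orig_input(pkt)
```

      The client hook does its own work and then calls the saved handler, so
      the rest of the receive path runs as it would have without the
      redirect.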
      
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • enic: Fix sparse warning in vnic_devcmd_init(). · f376d4ad
      David S. Miller authored
      
      
      >> drivers/net/ethernet/cisco/enic/vnic_dev.c:1095:13: sparse: incorrect type in assignment (different address spaces)
         drivers/net/ethernet/cisco/enic/vnic_dev.c:1095:13:    expected void *res
         drivers/net/ethernet/cisco/enic/vnic_dev.c:1095:13:    got void [noderef] <asn:2>*
      
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlx5e: Fix sparse warnings in mlx5e_handle_csum(). · ecf842f6
      David S. Miller authored
      
      
      >> drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:173:44: sparse: incorrect type in argument 1 (different base types)
         drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:173:44:    expected restricted __sum16 [usertype] n
         drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:173:44:    got restricted __be16 [usertype] check_sum
      
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet: Move VRF table lookup to inlined function · dc028da5
      David Ahern authored
      
      
      Table lookup compiles out when VRF is not enabled.
      
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Fix docbook warning for IFF_VRF_MASTER enum · 808d28c4
      David Ahern authored
      kbuild test robot reported:
      tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
      head:   d52736e2
      commit: 4e3c8992 [751/762] net: Introduce VRF related flags and helpers
      reproduce: make htmldocs
      
      >> Warning(include/linux/netdevice.h:1293): Enum value 'IFF_VRF_MASTER' not described in enum 'netdev_priv_flags'
      
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Updates to netif_index_is_vrf · 2f52bdcf
      David Ahern authored
      
      
      As Eric noted netif_index_is_vrf is not called with rcu_read_lock held,
      so wrap the dev_get_by_index_rcu in rcu_read_lock and unlock.
      
      If VRF is not enabled or oif is 0 skip the device lookup. In both cases
      index cannot be the VRF master.
      
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlx5e-next' · 9cd3778c
      David S. Miller authored
      Achiad Shochat says:
      
      ====================
      Driver updates 16-Aug-2015
      
      This patchset contains bug fixes, new RSS and pause parameters ethtool
      options, and support for RX CHECKSUM_COMPLETE.
      
      Patchset was applied and tested over commit adc6310c ("Merge branch
      'mv88e6xxx-switchdev-fdb'").
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Support RX CHECKSUM_COMPLETE · bbceefce
      Achiad Shochat authored
      
      
      Only for packets with first ethertype set to IPv4/6 for now.
      
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Support ethtool get/set_pauseparam · 3c2d18ef
      Achiad Shochat authored
      
      
      Only rx/tx pause settings.
      Autoneg setting is currently not supported.
      
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Ethtool link speed setting fixes · 6fa1bcab
      Achiad Shochat authored
      
      
      - Port speed settings are applied by the device only upon
        port admin status transition from DOWN to UP.
        So we enforce this transition regardless of the port's
        current operation state (which may be occasionally DOWN if
        for example the network cable is disconnected).
      - Fix the PORT_UP/DOWN device interface enum
      - Set the local_port bit in the device PAOS register
      - EXPORT the PAOS (Port Administrative and Operational Status)
        register set/query access functions.
      
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: HW LRO changes/fixes · d9a40271
      Achiad Shochat authored
      
      
      - Change the maximum LRO session size from 16KB to 64KB
      - Reduce the LRO session timeout from 512us to 32us in
        order to reduce the TCP latency of non-LRO'ed flows.
      - Fix skb_shinfo(skb)->gso_size and set skb_shinfo(skb)->gso_type.
      - Fix a bug accessing un-initialized mdev pointer.
      
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Support smaller RX/TX ring sizes · e842b100
      Achiad Shochat authored
      
      
      We unintentionally limited the minimum ring size too much.
      
      TX minimum ring size reduced from 128 to 64.
      RX minimum ring size reduced from 128 to 2.
      
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Add ethtool RSS configuration options · 2d75b2bc
      Achiad Shochat authored
      
      
      - get_rxfh_key_size
      - get_rxfh_indir_size
      - get/set_rxfh indirection table and RSS Toeplitz hash key
      - get_rxnfc
      
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Make RSS indirection table size a constant · 936896e9
      Achiad Shochat authored
      
      
      The indirection table size was defined by a variable that
      was actually assigned a constant value.
      Since we have no foreseeable intention of making it configurable
      we simply made it a constant.
      
      We also limit the number of channels such that the RSS indirection
      table could always populate all RX rings.
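
      To illustrate why the channel count is capped, here is a small Python
      model of an RSS indirection table. The table size of 128 is a
      placeholder (the commit message does not state the actual constant),
      and the function names are hypothetical.

```python
INDIR_TABLE_SIZE = 128  # placeholder value, not taken from the driver


def build_indirection_table(num_channels, size=INDIR_TABLE_SIZE):
    """Spread RX rings round-robin over a fixed-size indirection table."""
    # With more channels than table entries some rings could never be
    # selected, so the channel count is clamped to the table size.
    num_channels = min(num_channels, size)
    return [i % num_channels for i in range(size)]


def select_ring(table, flow_hash):
    """Pick the RX ring for a flow from its RSS hash value."""
    return table[flow_hash % len(table)]
```

      With 8 channels every ring index from 0 to 7 appears in the table;
      asking for more channels than entries still yields a table in which
      each usable ring appears once.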
      
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Have a single RSS Toeplitz hash key · 57afead5
      Achiad Shochat authored
      
      
      No need to generate a unique key per TIR.
      Generating a single key per netdev and copying it to all
      its TIRs.
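
      For reference, a Toeplitz hash can be sketched as below. This is a
      generic Python rendering of the algorithm, not the device's
      implementation; Netdev is a hypothetical class that only shows one key
      being generated and shared by every TIR, and the 40-byte key length is
      an assumption.

```python
import secrets


def toeplitz_hash(key, data):
    """32-bit Toeplitz hash of data under key (both bytes objects)."""
    if len(key) < len(data) + 4:
        raise ValueError("key must be at least 4 bytes longer than data")
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                # XOR in the 32-bit key window starting at this bit offset
                shift = key_bits - 32 - (i * 8 + b)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


class Netdev:
    """Toy device: one RSS key, referenced by all of its TIRs."""

    def __init__(self, num_tirs, key_len=40):
        self.rss_key = secrets.token_bytes(key_len)
        self.tir_keys = [self.rss_key for _ in range(num_tirs)]
```

      Because every TIR uses the same key, a given flow tuple hashes to the
      same value whichever TIR handles it.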
      
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next · 0aa65cc0
      David S. Miller authored
      
      
      Johan Hedberg says:
      
      ====================
      pull request: bluetooth-next 2015-08-16
      
      Here's what's likely the last bluetooth-next pull request for 4.3:
      
       - 6lowpan/802.15.4 refactoring, cleanups & fixes
       - Document 6lowpan netdev usage in Documentation/networking/6lowpan.txt
       - Support for UART based QCA Bluetooth controllers
       - Power management support for Broadcom Bluetooth controllers
       - Change LE connection initiation to always use passive scanning first
       - Support for new Silicon Wave USB ID
      
      Please let me know if there are any issues pulling. Thanks.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'enic-devcmd2' · 863960b4
      David S. Miller authored
      
      
      Govindarajulu Varadarajan says:
      
      ====================
      enic: add devcmd2
      
      This series adds new devcmd2 support. The first two patches are code
      refactoring.
      
      devcmd is an interface for driver to communicate with fw/adaptor. It
      involves writing data to hardware registers and waiting for the result.
      This mechanism does not scale well. The queuing of "no wait" devcmds is
      done in firmware memory rather than on the host. Firmware memory is a
      rather more scarce and valuable resource than host memory. A devcmd storm
      from one vf can disrupt the service on other pf/vf. The lack of flow
      control allows for possible denial of service from one VM to another.
      Devcmd2 uses work queue to post the devcmds, just like tx work queue. This
      allows better flow control.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • enic: add devcmd2 · 373fb087
      Govindarajulu Varadarajan authored
      
      
      devcmd is an interface for driver to communicate with fw/adaptor. It
      involves writing data to hardware registers and waiting for the result.
      This mechanism does not scale well. The queuing of "no wait" devcmds is
      done in firmware memory rather than on the host. Firmware memory is a
      rather more scarce and valuable resource than host memory. A devcmd storm
      from one vf can disrupt the service on other pf/vf. The lack of flow
      control allows for possible denial of service from one VM to another.
      
      Devcmd2 uses work queue to post the devcmds, just like tx work queue. This
      allows better flow control.
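
      The flow-control benefit of a work queue can be pictured with a bounded
      queue that lives on the host. The Python sketch below is purely
      conceptual; DevcmdWorkQueue and its methods are invented for the
      illustration and do not mirror the driver's structures.

```python
from collections import deque


class DevcmdWorkQueue:
    """Fixed-size host-side queue of posted device commands."""

    def __init__(self, size):
        self.size = size
        self.pending = deque()

    def post(self, cmd):
        """Queue a command; refuse when the ring is full."""
        if len(self.pending) >= self.size:
            return False          # caller has to back off and retry later
        self.pending.append(cmd)
        return True

    def complete_one(self):
        """Firmware consumed the oldest command, freeing one slot."""
        return self.pending.popleft() if self.pending else None
```

      A sender that floods the queue is throttled by its own full ring
      instead of consuming shared firmware memory.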
      
      Initialize devcmd2; if that fails, fall back to devcmd1.
      
      Also change the driver version.
      
      Signed-off-by: N V V Satyanarayana Reddy <nalreddy@cisco.com>
      Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • enic: add devcmd2 resources · fda3f52b
      Govindarajulu Varadarajan authored
      
      
      Add devcmd resources to vnic_res_type. Add data types used by devcmd.
      
      Signed-off-by: N V V Satyanarayana Reddy <nalreddy@cisco.com>
      Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • enic: use netdev_<foo> or dev_<foo> instead of pr_<foo> · 6a3c2f83
      Govindarajulu Varadarajan authored
      
      
      pr_info does not give any details about the interface involved. This patch
      uses netdev_info for printing the message. Use dev_info where netdev is not
      ready.
      
      Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • enic: move struct definition from .c to .h file · 8b89f3a1
      Govindarajulu Varadarajan authored
      
      
      Some of the structure definitions are in a .c file to make them private to
      that file. This patch moves the struct definitions to a .h file so that
      they are accessible from other files.
      
      Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David S. Miller's avatar
      2ea273d7
    • ipv6: trivial whitespace fix · ec120da6
      Ian Morris authored
      
      
      Change brace placement to be in line with coding standards
      
      Signed-off-by: Ian Morris <ipm@chirality.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rhashtable-test: extend to test concurrency · f4a3e90b
      Phil Sutter authored
      
      
      After having tested insertion, lookup, table walk and removal, spawn a
      number of threads running operations on the same rhashtable. Each of
      them will:
      
      1) insert its own set of objects,
      2) lookup every successfully inserted object and finally
      3) remove objects in several rounds until all of them have been removed,
         making sure the remaining ones are still found after each round.
      
      This should put a good amount of load onto the system and due to
      synchronising thread startup via two semaphores also extensive
      concurrent table access.
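
      The shape of this test can be mimicked in user space. The Python sketch
      below is an analogy, not the kernel test: a dict guarded by a lock
      stands in for the rhashtable, and a threading.Barrier replaces the two
      semaphores used to line up thread startup. All names and default
      values are illustrative.

```python
import threading


def run_concurrency_test(num_threads=10, per_thread=100, rounds=4):
    table = {}
    lock = threading.Lock()                  # stand-in for table locking
    start = threading.Barrier(num_threads)   # release all workers together
    failures = []

    def worker(tid):
        keys = [(tid, i) for i in range(per_thread)]
        start.wait()
        for k in keys:                       # 1) insert own objects
            with lock:
                table[k] = tid
        for k in keys:                       # 2) look up each inserted object
            with lock:
                if table.get(k) != tid:
                    failures.append(("lookup", k))
        remaining = keys
        step = -(-per_thread // rounds)      # ceiling division
        while remaining:                     # 3) remove in several rounds
            batch, remaining = remaining[:step], remaining[step:]
            with lock:
                for k in batch:
                    del table[k]
                for k in remaining:          # the rest must still be found
                    if k not in table:
                        failures.append(("missing", k))

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return failures, len(table)
```

      A clean run returns an empty failure list and an empty table.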
      
      The default number of ten threads returned within half a second on my
      local VM with two cores. Running 200 threads took about four seconds. If
      slow systems suffer too much from this though, the default could be
      lowered or even set to zero so this extended test does not run at all by
      default.
      
      Signed-off-by: Phil Sutter <phil@nwl.cc>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge · c1f066d4
      David S. Miller authored
      
      
      Antonio Quartulli says:
      
      ====================
      Included changes:
      - avoid integer overflow in GW selection routine
      - prevent race condition by making capability bit changes atomic (use
        clear/set/test_bit)
      - fix synchronization issue in mcast tvlv handler
      - fix crash on double list removal of TT Request objects
      - fix leak by purging packets enqueued for sending upon iface removal
      - ensure network header pointer is set in skb
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mac80211-next-for-davem-2015-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next · 2bd736fa
      David S. Miller authored
      
      
      Johannes Berg says:
      
      ====================
      Another pull request for the next cycle, this time with quite
      a bit of content:
       * mesh fixes/improvements from Alexis, Bob, Chun-Yeow and Jesse
       * TDLS higher bandwidth support (Arik)
       * OCB fixes from Bertold Van den Bergh
       * suspend/resume fixes from Eliad
       * dynamic SMPS support for minstrel-HT (Krishna Chaitanya)
       * VHT bitrate mask support (Lorenzo Bianconi)
       * better regulatory support for 5/10 MHz channels (Matthias May)
       * basic support for MU-MIMO to avoid the multi-vif issue (Sara Sharon)
      along with a number of other cleanups.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bpf_fanout' · 90eb7fa5
      David S. Miller authored
      
      
      Willem de Bruijn says:
      
      ====================
      packet: add cBPF and eBPF fanout modes
      
      Allow programmable fanout modes. Support both classical BPF programs
      passed directly and extended BPF programs passed by file descriptor.
      
      One use case is packet steering by deep packet inspection, for
      instance for packet steering by application layer header fields.
      
      Separate the configuration of the fanout mode and the configuration
      of the program, to allow dynamic updates to the latter at runtime.
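
      Conceptually, a programmable fanout mode lets a user-supplied program
      choose the receiving socket. The Python model below assumes that the
      program's return value is reduced modulo the number of sockets in the
      group; the function names and the one-byte "application header" are
      invented for the illustration.

```python
def fanout_demux(program, packet, num_sockets):
    """Return the index of the group socket that should get this packet."""
    return program(packet) % num_sockets


def steer_by_first_payload_byte(packet):
    """Example steering program: use one application-layer byte as the key."""
    return packet[0] if packet else 0
```

      Packets whose first payload byte is the same always land on the same
      socket, and the steering function can be swapped without changing the
      demux step, which mirrors the separation described above.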
      
      Changes
        v1 -> v2:
          - follow SO_LOCK_FILTER semantics on filter updates
          - only accept eBPF programs of type BPF_PROG_TYPE_SOCKET_FILTER
          - rename PACKET_FANOUT_BPF to PACKET_FANOUT_CBPF to match
            man 2 bpf usage: "classic" vs. "extended" BPF.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests/net: test extended BPF fanout mode · 30da679e
      Willem de Bruijn authored
      
      
      Test PACKET_FANOUT_EBPF by inserting a program into the kernel
      with bpf(), then attaching it to the fanout group. Observe the same
      payload-based distribution as in the PACKET_FANOUT_CBPF test.
      
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>