  1. Aug 18, 2015
    • David S. Miller's avatar
      Merge branch 'Identifier-Locator-Addressing' · 0b233dc7
      David S. Miller authored
      
      
      Tom Herbert says:
      
      ====================
      net: Identifier Locator Addressing - Part I
      
      This patch set provides rudimentary support for Identifier Locator
      Addressing or ILA. The basic concept of ILA is that we split an IPv6
      address into a 64 bit locator and 64 bit identifier. The identifier is
      the identity of an entity in communication ("who"), and the locator
      expresses the location of the entity ("where"). Applications
      use an externally visible address that contains the identifier.
      When a packet is actually sent, a translation is done that
      overwrites the first 64 bits of the address with a locator.
      The packet can then be forwarded over the network to the host where
      the addressed entity is located. At the receiver, the reverse
      translation is done so that the application sees the original,
      untranslated address. Presumably an external control plane will
      provide identifier->locator mappings.
      
      v2:
        - Fix compilation errors when LWT not configured
        - Consolidate ILA into a single ila.c
      
      v3:
        - Change pseudohdr argument of inet_proto_csum_replace functions to
          be a bool
      
      v4:
        - In ila_build_state check locator being in netlink params before
          allocating tunnel state
      
      The data path for ILA is a simple NAT translation that only operates
      on the upper 64 bits of a destination address in IPv6 packets. The
      basic process is:
      
         1) Lookup 64 bit identifier (lower 64 bits of destination)
         2) If a match is found
            a) Overwrite locator (upper 64 bits of destination) with
               the new locator
            b) Adjust any checksum that has destination address included in
               pseudo header
         3) Send or receive packet
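
Steps 1 and 2a above can be sketched in a few lines of Python. This is only an illustration of the address rewrite, not the kernel code: the `ILA_MAP` table and the `ila_translate` helper are hypothetical names, the single mapping is taken from the example addresses used later in this cover letter, and the checksum adjustment of step 2b is left out.

```python
import ipaddress

MASK64 = (1 << 64) - 1

# Hypothetical identifier -> locator table. In the patch set the mapping
# comes from routes configured with "encap ila <locator>".
ILA_MAP = {
    0x5555000000020000: 0x2001000000000002,  # 5555:0:2:0 -> locator 2001:0:0:2
}

def ila_translate(dst, table=ILA_MAP):
    """Return dst with its upper 64 bits replaced by the mapped locator."""
    addr = int(ipaddress.IPv6Address(dst))
    identifier = addr & MASK64              # step 1: lower 64 bits
    locator = table.get(identifier)
    if locator is None:                     # no match: address is left as-is
        return ipaddress.IPv6Address(addr)
    # step 2a: overwrite the upper 64 bits with the new locator
    return ipaddress.IPv6Address((locator << 64) | identifier)

print(ila_translate("3333:0:0:1:5555:0:2:0"))
```

Only the identifier is used as the lookup key, so the same entry matches whatever locator the address currently carries.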
      
      ILA is a means to implement tunnels or network virtualization without
      encapsulation. Since there is no encapsulation involved, we assume that
      stateless support in the network for IPv6 (e.g. RSS, ECMP, TSO, etc.)
      just works. Also, since we're minimally changing the packet many of
      the worries about encapsulation (MTU, checksum, fragmentation) are
      not relevant. The downside is that ILA is not extensible like other
      encapsulations (GUE for instance) so it might not be appropriate for
      all use cases. Also, this only makes sense to do in IPv6!
      
      A key aspect of ILA is performance. The intent is that ILA would be
      used in data centers in virtualizing tasks or jobs. In the fullest
      incarnation all intra data center communications might be targeted to
      virtual ILA addresses. This is basically adding a new virtualization
      capability to the existing services in a datacenter, so there is a
      strong expectation that this does not degrade performance for
      existing applications.
      
      Performance seems to be dependent on how ILA is hooked into the kernel.
      ILA can be implemented under some different models:
      
        - Mechanically it is a form of stateless DNAT
        - It can be thought of as a type of (source) routing
        - As a functional replacement of encapsulation
      
      In this patch set we hook into the data path using Light Weight
      Tunnels (LWT) infrastructure. As part of that, we add support in LWT
      to redirect dst input. iproute will be modified to take a new ila encap
      type. ILA can be configured like:
      
      ip route add 3333:0:0:1:5555:0:2:0/128 \
         encap ila 2001:0:0:2 via 2401:db00:20:911a:face:0:27:0
      
      ip -6 addr add 3333:0:0:1:5555:0:1:0/128 dev eth0
      
      ip route add table local local 2001:0:0:1:5555:0:1:0/128 \
         encap ila 3333:0:0:1 dev lo
      
      So sending to destination 3333:0:0:1:5555:0:2:0 will have destination
      of 2001:0:0:2:5555:0:2:0 on the wire.
      
      Performance results are below. With ILA we see about a 10% drop in
      pps compared to non-ILA. Much of this drop can be attributed to the
      loss of early demux on input (translation occurs after it is attempted).
      We will address this in the next patch set. Also, IPvlan input path
      does not work with ILA since the routing is bypassed-- this will
      be addressed in a future patch.
      
      Performance testing:
      
      Performing netperf TCP_RR with 200 clients:
      
      Non-ILA baseline
        84.92% CPU utilization
        1861922.9 tps
        93/163/330 50/90/99% latencies
      
      ILA single destination
        83.16% CPU utilization
        1679683.4 tps
        105/180/332 50/90/99% latencies
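
As a quick check of the "about 10%" figure, the relative drop can be computed from the two transaction rates reported above. The variable names are illustrative only.

```python
baseline_tps = 1861922.9   # non-ILA baseline, from the results above
ila_tps = 1679683.4        # ILA single destination, from the results above

# Relative throughput loss of the ILA run versus the baseline.
drop_pct = (baseline_tps - ila_tps) / baseline_tps * 100
print(f"throughput drop: {drop_pct:.1f}%")
```

The result comes out just under 10%, consistent with the summary.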
      
      References:
      
      Slides from netconf:
      http://vger.kernel.org/netconf2015Herbert-ILA.pdf
      
      Slides from presentation at IETF:
      https://www.ietf.org/proceedings/92/slides/slides-92-nvo3-1.pdf
      
      I-D:
      https://tools.ietf.org/html/draft-herbert-nvo3-ila-00
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0b233dc7
    • Tom Herbert's avatar
      net: Identifier Locator Addressing module · 65d7ab8d
      Tom Herbert authored
      
      
      Add a new module named ila. This implements ILA translation. Light
      weight tunnel redirection is used to perform the translation in
      the data path. This is configured by the "ip -6 route" command
      using the "encap ila <locator>" option, where <locator> is the
      value to set in destination locator of the packet. e.g.
      
      ip -6 route add 3333:0:0:1:5555:0:1:0/128 \
            encap ila 2001:0:0:1 via 2401:db00:20:911a:face:0:25:0
      
      Sets a route where 3333:0:0:1 will be overwritten by
      2001:0:0:1 on output.
      
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      65d7ab8d
    • Tom Herbert's avatar
      net: Add inet_proto_csum_replace_by_diff utility function · abc5d1ff
      Tom Herbert authored
      
      
      This function updates a checksum field value and skb->csum based on
      a value which is the difference between the old and new checksum.
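
The idea of updating a checksum from a difference, rather than recomputing it over the whole packet, can be illustrated with plain ones' complement arithmetic. The sketch below follows the incremental update described in RFC 1624; it is not the kernel function, all helper names are hypothetical, and the skb->csum handling mentioned above is omitted.

```python
def fold16(x):
    """Fold carries back into 16 bits (ones' complement addition)."""
    while x >> 16:
        x = (x & 0xFFFF) + (x >> 16)
    return x

def checksum(words):
    """Internet checksum over a sequence of 16-bit words."""
    return ~fold16(sum(words)) & 0xFFFF

def csum_diff(old_words, new_words):
    """Ones' complement sum of (~old + new) over the words that change."""
    return fold16(sum((~o & 0xFFFF) + n for o, n in zip(old_words, new_words)))

def replace_by_diff(check, diff):
    """Update a checksum field from a precomputed difference."""
    return ~fold16((~check & 0xFFFF) + diff) & 0xFFFF

# Example: swap a 64-bit locator (four 16-bit words) and leave the rest alone.
old_loc = [0x3333, 0x0000, 0x0000, 0x0001]
new_loc = [0x2001, 0x0000, 0x0000, 0x0002]
rest = [0x5555, 0x0000, 0x0002, 0x0000, 0x1234, 0xABCD]

old_check = checksum(old_loc + rest)
new_check = replace_by_diff(old_check, csum_diff(old_loc, new_loc))
```

Because the difference depends only on the old and new locator, it can be computed once per mapping and reused for every packet that uses it.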
      
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      abc5d1ff
    • Tom Herbert's avatar
      net: Change pseudohdr argument of inet_proto_csum_replace* to be a bool · 4b048d6d
      Tom Herbert authored
      
      
      inet_proto_csum_replace4,2,16 take a pseudohdr argument which indicates
      the checksum field carries a pseudo header. This argument should be a
      boolean instead of an int.
      
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4b048d6d
    • Tom Herbert's avatar
      lwt: Add support to redirect dst.input · 25368623
      Tom Herbert authored
      
      
      This patch adds the capability to redirect dst input in the same way
      that dst output is redirected by LWT.
      
      Also, save the original dst.input and dst.output when setting up
      lwtunnel redirection. These can be called by the client as a
      pass-through.
      
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      25368623
    • David S. Miller's avatar
      enic: Fix sparse warning in vnic_devcmd_init(). · f376d4ad
      David S. Miller authored
      
      
      >> drivers/net/ethernet/cisco/enic/vnic_dev.c:1095:13: sparse: incorrect type in assignment (different address spaces)
         drivers/net/ethernet/cisco/enic/vnic_dev.c:1095:13:    expected void *res
         drivers/net/ethernet/cisco/enic/vnic_dev.c:1095:13:    got void [noderef] <asn:2>*
      
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f376d4ad
    • David S. Miller's avatar
      mlx5e: Fix sparse warnings in mlx5e_handle_csum(). · ecf842f6
      David S. Miller authored
      
      
      >> drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:173:44: sparse: incorrect type in argument 1 (different base types)
         drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:173:44:    expected restricted __sum16 [usertype] n
         drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:173:44:    got restricted __be16 [usertype] check_sum
      
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ecf842f6
    • David Ahern's avatar
      inet: Move VRF table lookup to inlined function · dc028da5
      David Ahern authored
      
      
      Table lookup compiles out when VRF is not enabled.
      
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dc028da5
    • David Ahern's avatar
      net: Fix docbook warning for IFF_VRF_MASTER enum · 808d28c4
      David Ahern authored
      kbuild test robot reported:
      tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
      head:   d52736e2
      commit: 4e3c8992 [751/762] net: Introduce VRF related flags and helpers
      reproduce: make htmldocs
      
      >> Warning(include/linux/netdevice.h:1293): Enum value 'IFF_VRF_MASTER' not described in enum 'netdev_priv_flags'
      
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      808d28c4
    • David Ahern's avatar
      net: Updates to netif_index_is_vrf · 2f52bdcf
      David Ahern authored
      
      
      As Eric noted netif_index_is_vrf is not called with rcu_read_lock held,
      so wrap the dev_get_by_index_rcu in rcu_read_lock and unlock.
      
      If VRF is not enabled or oif is 0 skip the device lookup. In both cases
      index cannot be the VRF master.
      
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2f52bdcf
    • David S. Miller's avatar
      Merge branch 'mlx5e-next' · 9cd3778c
      David S. Miller authored
      Achiad Shochat says:
      
      ====================
      Driver updates 16-Aug-2015
      
      This patchset contains bug fixes, new RSS and pause parameters ethtool
      options, and support for RX CHECKSUM_COMPLETE.
      
      Patchset was applied and tested over commit adc6310c ("Merge branch
      'mv88e6xxx-switchdev-fdb'").
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9cd3778c
    • Achiad Shochat's avatar
      net/mlx5e: Support RX CHECKSUM_COMPLETE · bbceefce
      Achiad Shochat authored
      
      
      Only for packets with first ethertype set to IPv4/6 for now.
      
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bbceefce
    • Achiad Shochat's avatar
      net/mlx5e: Support ethtool get/set_pauseparam · 3c2d18ef
      Achiad Shochat authored
      
      
      Only rx/tx pause settings.
      Autoneg setting is currently not supported.
      
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3c2d18ef
    • Achiad Shochat's avatar
      net/mlx5e: Ethtool link speed setting fixes · 6fa1bcab
      Achiad Shochat authored
      
      
      - Port speed settings are applied by the device only upon
        port admin status transition from DOWN to UP.
        So we enforce this transition regardless of the port's
        current operation state (which may be occasionally DOWN if
        for example the network cable is disconnected).
      - Fix the PORT_UP/DOWN device interface enum
      - Set the local_port bit in the device PAOS register
      - EXPORT the PAOS (Port Administrative and Operational Status)
        register set/query access functions.
      
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6fa1bcab
    • Achiad Shochat's avatar
      net/mlx5e: HW LRO changes/fixes · d9a40271
      Achiad Shochat authored
      
      
      - Change the maximum LRO session size from 16KB to 64KB
      - Reduce the LRO session timeout from 512us to 32us in
        order to reduce the TCP latency of non-LRO'ed flows.
      - Fix skb_shinfo(skb)->gso_size and set skb_shinfo(skb)->gso_type.
      - Fix a bug accessing un-initialized mdev pointer.
      
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d9a40271
    • Achiad Shochat's avatar
      net/mlx5e: Support smaller RX/TX ring sizes · e842b100
      Achiad Shochat authored
      
      
      We unintentionally limited the minimum ring size too much.
      
      TX minimum ring size reduced from 128 to 64.
      RX minimum ring size reduced from 128 to 2.
      
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e842b100
    • Achiad Shochat's avatar
      net/mlx5e: Add ethtool RSS configuration options · 2d75b2bc
      Achiad Shochat authored
      
      
      - get_rxfh_key_size
      - get_rxfh_indir_size
      - get/set_rxfh indirection table and RSS Toeplitz hash key
      - get_rxnfc
      
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2d75b2bc
    • Achiad Shochat's avatar
      net/mlx5e: Make RSS indirection table size a constant · 936896e9
      Achiad Shochat authored
      
      
      The indirection table size was defined by a variable that
      was actually assigned a constant value.
      Since we do not have any foreseen intention to make it configurable
      we simply made it a constant.
      
      We also limit the number of channels such that the RSS indirection
      table could always populate all RX rings.
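
The relationship between a fixed-size indirection table and the channel count can be sketched as follows. The table size of 128 and the round-robin fill are assumptions made for illustration; the commit message states neither, and the helper names are hypothetical.

```python
INDIR_TABLE_SIZE = 128  # hypothetical constant; the actual value is not given here

def clamp_channels(requested, table_size=INDIR_TABLE_SIZE):
    """Never allow more channels than there are indirection-table entries."""
    return min(requested, table_size)

def build_indir_table(num_channels, table_size=INDIR_TABLE_SIZE):
    """Spread RX rings round-robin over the table (an assumed fill policy)."""
    return [i % num_channels for i in range(table_size)]

channels = clamp_channels(200)
table = build_indir_table(channels)
```

With the clamp in place every RX ring index appears in the table at least once, which is the property the commit describes.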
      
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      936896e9
    • Achiad Shochat's avatar
      net/mlx5e: Have a single RSS Toeplitz hash key · 57afead5
      Achiad Shochat authored
      
      
      No need to generate a unique key per TIR.
      Generating a single key per netdev and copying it to all
      its TIRs.
      
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      57afead5
    • David S. Miller's avatar
      Merge branch 'for-upstream' of... · 0aa65cc0
      David S. Miller authored
      
      Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
      
      Johan Hedberg says:
      
      ====================
      pull request: bluetooth-next 2015-08-16
      
      Here's what's likely the last bluetooth-next pull request for 4.3:
      
       - 6lowpan/802.15.4 refactoring, cleanups & fixes
       - Document 6lowpan netdev usage in Documentation/networking/6lowpan.txt
       - Support for UART based QCA Bluetooth controllers
       - Power management support for Broadcom Bluetooth controllers
       - Change LE connection initiation to always use passive scanning first
       - Support for new Silicon Wave USB ID
      
      Please let me know if there are any issues pulling. Thanks.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0aa65cc0
    • David S. Miller's avatar
      Merge branch 'enic-devcmd2' · 863960b4
      David S. Miller authored
      
      
      Govindarajulu Varadarajan says:
      
      ====================
      enic: add devcmd2
      
      This series adds new devcmd2 support. The first two patches are code
      refactoring.
      
      devcmd is an interface for driver to communicate with fw/adaptor. It
      involves writing data to hardware registers and waiting for the result.
      This mechanism does not scale well. The queuing of "no wait" devcmds is
      done in firmware memory rather than on the host. Firmware memory is a
      rather more scarce and valuable resource than host memory. A devcmd storm
      from one vf can disrupt the service on other pf/vf. The lack of flow
      control allows for possible denial of service from one VM to another.
      Devcmd2 uses work queue to post the devcmds, just like tx work queue. This
      allows better flow control.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      863960b4
    • Govindarajulu Varadarajan's avatar
      enic: add devcmd2 · 373fb087
      Govindarajulu Varadarajan authored
      
      
      devcmd is an interface for driver to communicate with fw/adaptor. It
      involves writing data to hardware registers and waiting for the result.
      This mechanism does not scale well. The queuing of "no wait" devcmds is
      done in firmware memory rather than on the host. Firmware memory is a
      rather more scarce and valuable resource than host memory. A devcmd storm
      from one vf can disrupt the service on other pf/vf. The lack of flow
      control allows for possible denial of service from one VM to another.
      
      Devcmd2 uses work queue to post the devcmds, just like tx work queue. This
      allows better flow control.
      
      Initialize devcmd2; if that fails, we fall back to devcmd1.
      
      Also change the driver version.
      
      Signed-off-by: N V V Satyanarayana Reddy <nalreddy@cisco.com>
      Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      373fb087
    • Govindarajulu Varadarajan's avatar
      enic: add devcmd2 resources · fda3f52b
      Govindarajulu Varadarajan authored
      
      
      Add devcmd resources to vnic_res_type. Add data types used by devcmd.
      
      Signed-off-by: N V V Satyanarayana Reddy <nalreddy@cisco.com>
      Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fda3f52b
    • Govindarajulu Varadarajan's avatar
      enic: use netdev_<foo> or dev_<foo> instead of pr_<foo> · 6a3c2f83
      Govindarajulu Varadarajan authored
      
      
      pr_info does not give any details about the interface involved. This patch
      uses netdev_info for printing the message. Use dev_info where netdev is not
      ready.
      
      Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6a3c2f83
    • Govindarajulu Varadarajan's avatar
      enic: move struct definition from .c to .h file · 8b89f3a1
      Govindarajulu Varadarajan authored
      
      
      Some of the structure definitions are in .c file to make them private to
      that file. This patch moves the struct definitions to the .h file so that their
      definitions are accessible from other files.
      
      Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8b89f3a1
    • David S. Miller's avatar
      2ea273d7
    • Ian Morris's avatar
      ipv6: trivial whitespace fix · ec120da6
      Ian Morris authored
      
      
      Change brace placement to be in line with coding standards
      
      Signed-off-by: Ian Morris <ipm@chirality.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ec120da6
    • Phil Sutter's avatar
      rhashtable-test: extend to test concurrency · f4a3e90b
      Phil Sutter authored
      
      
      After having tested insertion, lookup, table walk and removal, spawn a
      number of threads running operations on the same rhashtable. Each of
      them will:
      
      1) insert its own set of objects,
      2) lookup every successfully inserted object and finally
      3) remove objects in several rounds until all of them have been removed,
         making sure the remaining ones are still found after each round.
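
The three per-thread phases can be mimicked in user space as a rough analogy. This sketch uses a Python dict guarded by a lock in place of the rhashtable, and a `threading.Barrier` in place of the semaphore-based startup synchronisation; both substitutions, and all names, are assumptions for illustration and say nothing about how the kernel test is written.

```python
import threading

def run_concurrent_test(num_threads=10, entries_per_thread=100, rounds=4):
    table = {}
    lock = threading.Lock()                  # stand-in for the table's own locking
    start = threading.Barrier(num_threads)   # stand-in for the startup semaphores
    failures = []

    def worker(tid):
        keys = [(tid, i) for i in range(entries_per_thread)]
        start.wait()                         # release all threads at the same time
        for k in keys:                       # 1) insert this thread's own objects
            with lock:
                table[k] = tid
        for k in keys:                       # 2) look up every inserted object
            with lock:
                if table.get(k) != tid:
                    failures.append(("lookup", k))
        for r in range(rounds):              # 3) remove objects in several rounds
            for k in keys[r::rounds]:
                with lock:
                    del table[k]
            for later in range(r + 1, rounds):
                for k in keys[later::rounds]:    # the rest must still be found
                    with lock:
                        if k not in table:
                            failures.append(("missing", k))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return failures, len(table)

failures, remaining = run_concurrent_test()
```

A clean run leaves no failures and an empty table.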
      
      This should put a good amount of load onto the system and due to
      synchronising thread startup via two semaphores also extensive
      concurrent table access.
      
      The default number of ten threads returned within half a second on my
      local VM with two cores. Running 200 threads took about four seconds. If
      slow systems suffer too much from this though, the default could be
      lowered or even set to zero so this extended test does not run at all by
      default.
      
      Signed-off-by: Phil Sutter <phil@nwl.cc>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f4a3e90b
    • David S. Miller's avatar
      Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge · c1f066d4
      David S. Miller authored
      
      
      Antonio Quartulli says:
      
      ====================
      Included changes:
      - avoid integer overflow in GW selection routine
      - prevent race condition by making capability bit changes atomic (use
        clear/set/test_bit)
      - fix synchronization issue in mcast tvlv handler
      - fix crash on double list removal of TT Request objects
      - fix leak by purging packets enqueued for sending upon iface removal
      - ensure network header pointer is set in skb
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c1f066d4
    • David S. Miller's avatar
      Merge tag 'mac80211-next-for-davem-2015-08-14' of... · 2bd736fa
      David S. Miller authored
      
      Merge tag 'mac80211-next-for-davem-2015-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
      
      Johannes Berg says:
      
      ====================
      Another pull request for the next cycle, this time with quite
      a bit of content:
       * mesh fixes/improvements from Alexis, Bob, Chun-Yeow and Jesse
       * TDLS higher bandwidth support (Arik)
       * OCB fixes from Bertold Van den Bergh
       * suspend/resume fixes from Eliad
       * dynamic SMPS support for minstrel-HT (Krishna Chaitanya)
       * VHT bitrate mask support (Lorenzo Bianconi)
       * better regulatory support for 5/10 MHz channels (Matthias May)
       * basic support for MU-MIMO to avoid the multi-vif issue (Sara Sharon)
      along with a number of other cleanups.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2bd736fa
    • David S. Miller's avatar
      Merge branch 'bpf_fanout' · 90eb7fa5
      David S. Miller authored
      
      
      Willem de Bruijn says:
      
      ====================
      packet: add cBPF and eBPF fanout modes
      
      Allow programmable fanout modes. Support both classical BPF programs
      passed directly and extended BPF programs passed by file descriptor.
      
      One use case is packet steering by deep packet inspection, for
      instance for packet steering by application layer header fields.
      
      Separate the configuration of the fanout mode and the configuration
      of the program, to allow dynamic updates to the latter at runtime.
      
      Changes
        v1 -> v2:
          - follow SO_LOCK_FILTER semantics on filter updates
          - only accept eBPF programs of type BPF_PROG_TYPE_SOCKET_FILTER
          - rename PACKET_FANOUT_BPF to PACKET_FANOUT_CBPF to match
            man 2 bpf usage: "classic" vs. "extended" BPF.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      90eb7fa5
    • Willem de Bruijn's avatar
      selftests/net: test extended BPF fanout mode · 30da679e
      Willem de Bruijn authored
      
      
      Test PACKET_FANOUT_EBPF by inserting a program into the kernel
      with bpf(), then attaching it to the fanout group. Observe the same
      payload-based distribution as in the PACKET_FANOUT_CBPF test.
      
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      30da679e
    • Willem de Bruijn's avatar
      selftests/net: test classic bpf fanout mode · 95e22792
      Willem de Bruijn authored
      
      
      Test PACKET_FANOUT_CBPF by inserting a cBPF program that selects a
      socket by payload. Requires modifying the test program to send
      packets with multiple payloads.
      
      Also fix a bug in testing the return value of mmap()
      
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      95e22792
    • Willem de Bruijn's avatar
      packet: add extended BPF fanout mode · f2e52095
      Willem de Bruijn authored
      
      
      Add fanout mode PACKET_FANOUT_EBPF that accepts an extended BPF
      program to select a socket.
      
      Update the internal eBPF program by passing to socket option
      SOL_PACKET/PACKET_FANOUT_DATA a file descriptor returned by bpf().
      
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f2e52095
    • Willem de Bruijn's avatar
      packet: add classic BPF fanout mode · 47dceb8e
      Willem de Bruijn authored
      
      
      Add fanout mode PACKET_FANOUT_CBPF that accepts a classic BPF program
      to select a socket.
      
      This avoids having to keep adding special case fanout modes. One
      example use case is application layer load balancing. The QUIC
      protocol, for instance, encodes a connection ID in UDP payload.
      
      Also add socket option SOL_PACKET/PACKET_FANOUT_DATA that updates data
      associated with the socket group. Fanout mode PACKET_FANOUT_CBPF is the
      only user so far.
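
Conceptually, a program-driven fanout mode lets a user-supplied function pick the receiving socket in the group. The sketch below models that in Python. Reducing the program's return value modulo the group size is an assumption made for illustration, and `select_socket` and `first_byte` are hypothetical names, not kernel or libc APIs.

```python
def select_socket(payload, num_sockets, prog):
    """Pick a fanout group member from the program's return value."""
    # Assumption: the return value is reduced modulo the group size.
    return prog(payload) % num_sockets

def first_byte(payload):
    """Toy 'program': steer by the first payload byte, e.g. a connection ID."""
    return payload[0] if payload else 0

index = select_socket(b"\x05hello", 4, first_byte)
```

Packets whose payload starts with the same byte always land on the same socket, which is the kind of application-layer steering the commit message has in mind.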
      
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      47dceb8e
    • Jiri Benc's avatar
      lwtunnel: rename ip lwtunnel attributes · a1c234f9
      Jiri Benc authored
      We already have IFLA_IPTUN_ netlink attributes. The IP_TUN_ attributes look
      very similar, yet they serve very different purpose. This is confusing for
      anyone trying to implement a user space tool supporting lwt.
      
      As the IP_TUN_ attributes are used only for the lightweight tunnels, prefix
      them with LWTUNNEL_IP_ instead to make their purpose clear. Also, it's more
      logical to have them in lwtunnel.h together with the encap enum.
      
      Fixes: 3093fbe7 ("route: Per route IP tunnel metadata via lightweight tunnel")
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a1c234f9
    • Guenter Roeck's avatar
      smsc911x: Fix crash seen if neither ACPI nor OF is configured or used · 62ee783b
      Guenter Roeck authored
      Commit 0b50dc4f ("Convert smsc911x to use ACPI as well as DT") makes
      the call to smsc911x_probe_config() unconditional, and no longer fails if
      there is no device node. device_get_phy_mode() is called unconditionally,
      and if there is no phy node configured returns an error code. This error
      code is assigned to phy_interface, and interpreted elsewhere in the code
      as valid phy mode. This in turn causes qemu to crash when running a
      variant of realview_pb_defconfig.
      
      	qemu: hardware error: lan9118_read: Bad reg 0x86
      
      Fixes: 0b50dc4f ("Convert smsc911x to use ACPI as well as DT")
      Cc: Jeremy Linton <jeremy.linton@arm.com>
      Cc: Graeme Gregory <graeme.gregory@linaro.org>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      62ee783b
    • David S. Miller's avatar
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next · c87acb25
      David S. Miller authored
      
      
      Steffen Klassert says:
      
      ====================
      pull request (net-next): ipsec-next 2015-08-17
      
      1) Fix IPv6 ECN decapsulation for IPsec interfamily tunnels.
         From Thomas Egerer.
      
      2) Use kmemdup instead of duplicating it in xfrm_dump_sa().
         From Andrzej Hajda.
      
      3) Pass oif to the xfrm lookups so that it gets set on the flow
         and the resolver routines can match based on oif.
         From David Ahern.
      
      4) Add documentation for the new xfrm garbage collector threshold.
         From Alexander Duyck.
      
      Please pull or let me know if there are problems.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c87acb25
    • Jesse Brandeburg's avatar
      net: fix endian check warning in etherdevice.h · fbaff3ef
      Jesse Brandeburg authored
      Sparse builds have been warning for a really long time now
      that etherdevice.h has a conversion that is unsafe.
      
        include/linux/etherdevice.h:79:32: warning: restricted __be16 degrades to integer
      
      This code change fixes the issue and generates the exact
      same assembly before/after (checked on x86_64)
      
      Fixes: 2c722fe1 (etherdevice: Optimize a few is_<foo>_ether_addr functions)
      Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      CC: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fbaff3ef
    • David S. Miller's avatar
      Merge branch 'iff_no_queue' · f3ae683f
      David S. Miller authored
      
      
      Phil Sutter says:
      
      ====================
      net: introduce IFF_NO_QUEUE as successor of zero tx_queue_len
      
      This series adds a new private net_device flag indicating that a device may
      (and probably should) be used without a queueing discipline attached to it.
      This is already common practice for many virtual device types like e.g.
      loopback, VLAN (802.1Q) or bridges (802.1D). The reason for this is that these
      devices lack an underlying layer which could impose back pressure and would
      therefore make a TX queue necessary to not slow down senders.
      
      Up to now, drivers being aware of the above applying to them set
      dev->tx_queue_len to zero to indicate no qdisc should be attached to the
      interface they drive and the kernel reacts upon this by assigning the noop
      qdisc instead of the default pfifo_fast. This implicit agreement though leads
      to an inconvenient situation once a user tries to attach a real qdisc to these
      devices, as the formerly special tx_queue_len value becomes a regular one,
      limiting the queue to zero packets and thus prevents any TX from happening. To
      overcome this, practically all qdisc implementations intercept and sanitize the
      malicious value.
      
      With this series applied, drivers may signal the lack of need for a qdisc
      without having to tamper with tx_queue_len, making fallbacks in qdiscs and
      caveats in userspace unnecessary.
      
      Upon upstream acceptance, this series will be followed up by a set of patches
      converting device drivers, adding a warning so out-of-tree driver authors become
      aware of this change and dropping all special handling of tx_queue_len in
      net/sched/.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f3ae683f