  1. Nov 01, 2012
    • sk-filter: Add ability to get socket filter program (v2) · a8fc9277
      Pavel Emelyanov authored
      
      
      The SO_ATTACH_FILTER option is currently set-only. I propose adding the
      ability to get the filter back by using SO_ATTACH_FILTER in getsockopt.
      To be easier on the eyes, an SO_GET_FILTER alias for it is declared.
      This ability is required by the checkpoint-restore project to be able
      to save the full state of a socket.
      
      There are two issues with getting the filter back.

      First, the kernel modifies sock_filter->code on filter load, so in
      order to return a filter element to the user we have to decode it
      back into user-visible constants. Fortunately, the modification in
      question is reversible.
      
      Second, the BPF_S_ALU_DIV_K code modifies the command argument k to
      speed up the run-time division by doing kernel_k = reciprocal(user_k).
      The bad news is that different user_k values may result in the same
      kernel_k, so we can't get the original user_k back. The good news is
      that we don't have to. What we need to do is calculate a user2_k such that
      
        reciprocal(user2_k) == reciprocal(user_k) == kernel_k
      
      i.e. if it is loaded back, the recompiled value will be exactly the
      same as it was. That said, user2_k can be calculated like this
      
        user2_k = reciprocal(kernel_k)
      
      with an exception, that if kernel_k == 0, then user2_k == 1.
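
      A minimal user-space sketch of the round trip above. It assumes that
      reciprocal() behaves like the kernel's reciprocal_value() helper, i.e.
      ceil(2^32 / k) truncated to 32 bits; decode_div_k() is a hypothetical
      name for the decode step, not a function taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed model of the kernel helper: ceil(2^32 / k), truncated to 32 bits. */
uint32_t reciprocal(uint32_t k)
{
    assert(k != 0); /* division by zero is rejected before this point */
    uint64_t val = (1ULL << 32) + (k - 1);
    return (uint32_t)(val / k);
}

/*
 * Compute a user-visible divisor that compiles back to the same kernel_k.
 * kernel_k == 0 is what user_k == 1 compiles to (2^32 does not fit in
 * 32 bits), so it is special-cased.
 */
uint32_t decode_div_k(uint32_t kernel_k)
{
    if (kernel_k == 0)
        return 1;
    return reciprocal(kernel_k);
}
```

      With this model, user_k == 1 compiles to kernel_k == 0, which is why
      that case needs the explicit exception.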
      
      The optlen argument is treated like this: when zero, the kernel returns
      the number of sock_fprog elements in the filter; otherwise it should be
      large enough for the sock_fprog array.
      
      changes since v1:
      * Declared SO_GET_FILTER in all arch headers
      * Added decode of vlan-tag codes
      
      Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a8fc9277
    • tuntap: choose the txq based on rxq · 96442e42
      Jason Wang authored
      
      
      This patch implements a simple multiqueue flow steering policy for tun/tap:
      tx follows rx. The idea is simple: choose the txq based on the rxq a flow
      arrived on. Flows are identified through the rxhash of an skb, and the
      hash-to-queue mapping is recorded in an hlist with an ageing timer that
      retires stale mappings. A mapping is created when tun receives a packet from
      userspace, and is queried in .ndo_select_queue().
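
      The policy can be illustrated with a small user-space model. This is a
      sketch only: it uses a direct-mapped table instead of the hlist
      described above, and every name in it (flow_table, FLOW_SLOTS,
      FLOW_MAX_AGE, the modulo fallback) is hypothetical rather than taken
      from the driver.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FLOW_SLOTS   64 /* hypothetical table size */
#define FLOW_MAX_AGE 30 /* hypothetical ageing period, in ticks */

struct flow_entry {
    uint32_t rxhash;  /* flow identifier */
    uint16_t queue;   /* queue the flow was last seen on */
    uint64_t updated; /* tick of the last update */
    int used;
};

struct flow_table {
    struct flow_entry slot[FLOW_SLOTS];
};

void flow_init(struct flow_table *t)
{
    memset(t, 0, sizeof(*t));
}

/* Record the queue on which a packet of this flow arrived from userspace. */
void flow_update(struct flow_table *t, uint32_t rxhash, uint16_t queue,
                 uint64_t now)
{
    struct flow_entry *e = &t->slot[rxhash % FLOW_SLOTS];

    e->rxhash = rxhash;
    e->queue = queue;
    e->updated = now;
    e->used = 1;
}

/* Pick the tx queue: the recorded queue if fresh, else spread by hash.
 * numqueues must be non-zero. */
uint16_t flow_select_queue(const struct flow_table *t, uint32_t rxhash,
                           uint16_t numqueues, uint64_t now)
{
    const struct flow_entry *e = &t->slot[rxhash % FLOW_SLOTS];

    if (e->used && e->rxhash == rxhash &&
        now - e->updated <= FLOW_MAX_AGE && e->queue < numqueues)
        return e->queue;
    return (uint16_t)(rxhash % numqueues);
}

/* Ageing pass: retire mappings that were not refreshed recently. */
size_t flow_expire(struct flow_table *t, uint64_t now)
{
    size_t removed = 0;

    for (size_t i = 0; i < FLOW_SLOTS; i++) {
        struct flow_entry *e = &t->slot[i];

        if (e->used && now - e->updated > FLOW_MAX_AGE) {
            e->used = 0;
            removed++;
        }
    }
    return removed;
}
```

      In this model flow_update() plays the role of the receive-from-userspace
      path and flow_select_queue() the role of .ndo_select_queue().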
      
      I ran concurrent TCP_CRR tests and didn't see any of the mapping manipulation
      helpers in perf top, so the overhead can be neglected.
      
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      96442e42
    • tuntap: add ioctl to attach or detach a file from tuntap device · cde8b15f
      Jason Wang authored
      
      
      Sometimes userspace may need to activate/deactivate a queue; this can be done
      by detaching and attaching a file from the tuntap device.

      This patch introduces a new ioctl, TUNSETQUEUE, which can be used to do
      this. The flag IFF_ATTACH_QUEUE is introduced to do the attaching, while
      IFF_DETACH_QUEUE is introduced to do the detaching.
      
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cde8b15f
    • tuntap: multiqueue support · c8d68e6b
      Jason Wang authored
      
      
      This patch converts tun/tap to a multiqueue device and exposes the
      queues as multiple file descriptors to userspace. Internally, each tun_file
      is abstracted as a queue, and an array of pointers to tun_file structures is
      stored in the tun_struct device, so multiple tun_files can be attached to the
      device as multiple queues.

      When choosing the txq, we first try to identify a flow through its rxhash; if
      the packet has none, we fall back to the recorded rxq and use that to choose
      the transmit queue. This policy may be changed in the future.
      
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c8d68e6b
    • tuntap: introduce multiqueue flags · bbb00994
      Jason Wang authored
      
      
      Add flags to be used when creating a multiqueue tuntap device.
      
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bbb00994
    • tuntap: RCUify dereferencing between tun_struct and tun_file · 6e914fc7
      Jason Wang authored
      
      
      RCU is introduced in this patch to synchronize the dereferences between
      tun_struct and tun_file. All tun_{get|put} calls are replaced with RCU; the
      dereference from one to the other must be done under the rtnl lock or inside
      an RCU read-side critical section.

      This is needed for the following patches, since one of the goals of multiqueue
      tuntap is to allow adding or removing queues during workload. Without RCU, the
      control path would have to hold tx locks when adding or removing queues (which
      may cause some delay), and it is hard to change the number of queues without
      stopping the net device. With the help of RCU, there is also no need for
      tun_file to hold a refcnt on tun_struct.
      
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6e914fc7
    • tuntap: move socket to tun_file · 54f968d6
      Jason Wang authored
      
      
      Currently, tuntap uses the socket receive queue as its tx queue. To
      implement multiple tx queues for tuntap and make it possible to add and
      remove queues during workload, the first step is to move the socket-related
      structures to tun_file. Then multiple fds/sockets can be attached to the
      tuntap device.

      This patch removes tun_sock and moves the socket-related structures from
      tun_sock or tun_struct to tun_file. Two exceptions are tap_filter and
      sock_fprog: they are still kept in tun_struct since they are used to filter
      packets for the net device rather than per transmit queue (at least I see no
      requirement for them). After these changes, the socket is created and
      destroyed during file open and close (instead of device creation and
      destruction), and the socket structures can be dereferenced from tun_file
      instead of the file of the tun_struct structure itself.

      For a persistent device, since we purge during detaching and don't queue any
      packets when no interface is attached, there are no behavior changes before
      and after this patch, so the changes are transparent to userspace. To keep
      attributes such as sndbuf, socket filter and vnet header, those are
      re-initialized after a new interface is attached to a persistent device.
      
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      54f968d6
    • Jason Wang's avatar
      1e588338
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next · 810b6d76
      David S. Miller authored
      
      
      Jeff Kirsher says:
      
      ====================
      This series contains updates to ixgbe, ixgbevf, igbvf, igb and
      networking core (bridge).  Most notably is the addition of support
      for local link multicast addresses in SR-IOV mode to the networking
      core.
      
      Also note, the ixgbe patches "ixgbe: Add support for pipeline reset" and
      "ixgbe: Fix return value from macvlan filter function" are revised based
      on community feedback.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      810b6d76
    • ethernet: Convert dev_printk(KERN_<LEVEL> to dev_<level>( · f7b4fb22
      Joe Perches authored
      
      
      dev_<level> calls take less code than dev_printk(KERN_<LEVEL>
      and reducing object size is good.
      Coalesce formats for easier grep.
      
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f7b4fb22
    • x86: bpf_jit_comp: add vlan tag support · 855ddb56
      Eric Dumazet authored
      
      
      This patch is a follow-up for patch "net: filter: add vlan tag access"
      to support the new VLAN_TAG/VLAN_TAG_PRESENT accessors in BPF JIT.
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Ani Sinha <ani@aristanetworks.com>
      Cc: Daniel Borkmann <danborkmann@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      855ddb56
    • net: filter: add vlan tag access · f3335031
      Eric Dumazet authored
      
      
      BPF filters lack ability to access skb->vlan_tci
      
      This patch adds two new ancillary accessors :
      
      SKF_AD_VLAN_TAG         (44) mapped to vlan_tx_tag_get(skb)
      
      SKF_AD_VLAN_TAG_PRESENT (48) mapped to vlan_tx_tag_present(skb)
      
      This allows libpcap/tcpdump to use a kernel filter instead of
      having to fall back to accepting all packets and then filtering
      them in user space.
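
      As an illustration, a classic BPF program using one of the new
      accessors might look like the sketch below. It keeps only packets that
      carry a VLAN tag. build_vlan_only_filter() is a hypothetical helper, and
      the #ifndef fallback simply repeats the value listed above for headers
      that predate this patch.

```c
#include <stddef.h>
#include <string.h>
#include <linux/filter.h>

#ifndef SKF_AD_VLAN_TAG_PRESENT
#define SKF_AD_VLAN_TAG_PRESENT 48 /* value listed in this commit */
#endif

/*
 * Write a 4-instruction filter into out[] that accepts only packets
 * with a VLAN tag. Returns the instruction count, or 0 if cap is too
 * small.
 */
size_t build_vlan_only_filter(struct sock_filter *out, size_t cap)
{
    const struct sock_filter prog[] = {
        /* A = vlan_tx_tag_present(skb), via the ancillary data area */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                 SKF_AD_OFF + SKF_AD_VLAN_TAG_PRESENT),
        /* if A == 0 skip one instruction (to the drop), else fall through */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0, 1, 0),
        /* accept: return up to 0xffff bytes of the packet */
        BPF_STMT(BPF_RET | BPF_K, 0xffff),
        /* drop */
        BPF_STMT(BPF_RET | BPF_K, 0),
    };
    size_t n = sizeof(prog) / sizeof(prog[0]);

    if (cap < n)
        return 0;
    memcpy(out, prog, sizeof(prog));
    return n;
}
```

      The resulting array can be wrapped in a struct sock_fprog and attached
      with SO_ATTACH_FILTER like any other socket filter.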
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Suggested-by: Ani Sinha <ani@aristanetworks.com>
      Suggested-by: Daniel Borkmann <danborkmann@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f3335031
    • net/cadence: depend on HAS_IOMEM · 0f6ae8f1
      Joachim Eastwood authored
      
      
      Fixes the following build failure on S390:
        In file included from drivers/net/ethernet/cadence/at91_ether.c:35:0:
         drivers/net/ethernet/cadence/macb.h: In function 'macb_is_gem':
         drivers/net/ethernet/cadence/macb.h:563:2: error: implicit declaration of function '__raw_readl' [-Werror=implicit-function-declaration]
         drivers/net/ethernet/cadence/at91_ether.c: In function 'update_mac_address':
         drivers/net/ethernet/cadence/at91_ether.c:119:2: error: implicit declaration of function '__raw_writel' [-Werror=implicit-function-declaration]
         cc1: some warnings being treated as errors
      
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Joachim Eastwood <manabian@gmail.com>
      Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0f6ae8f1
    • netxen: explicitly handle pause autoneg parameter · 15111025
      Flavio Leitner authored
      
      
      The hardware doesn't support controlling pause frames autoneg, so
      report that back correctly to userspace.
      
      Signed-off-by: Flavio Leitner <fbl@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      15111025
    • tcp: make tcp_clear_md5_list static · e0683e70
      stephen hemminger authored
      
      
      Trivial. Only used in one file.
      
      Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e0683e70
    • net: compute skb->rxhash if nic hash may be 3-tuple · ecd5cf5d
      Willem de Bruijn authored
      
      
      Network device drivers can communicate a Toeplitz hash in skb->rxhash,
      but devices differ in their hashing capabilities. All compute a 5-tuple
      hash for TCP over IPv4, but for other connection-oriented protocols,
      they may compute only a 3-tuple. This breaks RPS load balancing, e.g.,
      for TCP over IPv6 flows. Additionally, for GRE and other tunnels,
      the kernel computes a 5-tuple hash over the inner packet if possible,
      but devices do not.
      
      This patch recomputes the rxhash in software in all cases where it
      cannot be certain that a 5-tuple was computed. Device drivers can avoid
      recomputation by setting the skb->l4_rxhash flag.
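
      The decision can be modelled in user space as follows. This is only a
      sketch under stated assumptions: struct pkt stands in for the skb,
      soft_hash() is a toy FNV-style mix rather than the kernel's flow hash,
      and pkt_get_rxhash() is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical stand-in for the fields of an skb that matter here. */
struct pkt {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
    uint8_t proto;
    uint32_t rxhash;        /* hash as delivered by the NIC */
    unsigned l4_rxhash : 1; /* set only if rxhash is known to cover ports */
};

/* Toy 5-tuple hash (FNV-1a style), never returns 0. */
static uint32_t soft_hash(const struct pkt *p)
{
    const uint32_t words[] = {
        p->saddr, p->daddr,
        ((uint32_t)p->sport << 16) | p->dport,
        p->proto,
    };
    uint32_t h = 2166136261u;

    for (unsigned i = 0; i < sizeof(words) / sizeof(words[0]); i++)
        h = (h ^ words[i]) * 16777619u;
    return h ? h : 1;
}

/* Trust the device hash only when the driver vouches for it. */
uint32_t pkt_get_rxhash(struct pkt *p)
{
    if (!p->l4_rxhash) {
        p->rxhash = soft_hash(p);
        p->l4_rxhash = 1;
    }
    return p->rxhash;
}
```

      A driver that knows its hardware hashed the full 5-tuple would set
      l4_rxhash before handing the packet up, and the software path is
      skipped.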
      
      Recomputing adds cycles to each packet when RPS is enabled or the
      packet arrives over a tunnel. A comparison of 200x TCP_STREAM between
      two servers running unmodified netnext with rxhash computation
      in hardware vs software (using ethtool -K eth0 rxhash [on|off]) shows
      how much time is spent in __skb_get_rxhash in this worst case:
      
           0.03%          swapper  [kernel.kallsyms]     [k] __skb_get_rxhash
           0.03%          swapper  [kernel.kallsyms]     [k] __skb_get_rxhash
           0.05%          swapper  [kernel.kallsyms]     [k] __skb_get_rxhash
      
      With 200x TCP_RR it increases to
      
           0.10%          netperf  [kernel.kallsyms]     [k] __skb_get_rxhash
           0.10%          netperf  [kernel.kallsyms]     [k] __skb_get_rxhash
           0.10%          netperf  [kernel.kallsyms]     [k] __skb_get_rxhash
      
      I considered having the patch explicitly skip recomputation when it knows
      that it will not improve the hash (TCP over IPv4), but that conditional
      complicates the code without saving many cycles in practice, because it has
      to take place after the flow dissector.
      
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ecd5cf5d
    • dlink: dl2k: use the module_pci_driver macro · 9add4d81
      Devendra Naga authored
      
      
      use the module_pci_driver macro to make the code simpler
      by eliminating module_init and module_exit calls.
      
      Signed-off-by: Devendra Naga <devendra.aaru@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9add4d81
    • realtek: r8169: use module_pci_driver macro · 3eeb7da9
      Devendra Naga authored
      
      
      use the module_pci_driver macro to make the code simpler
      by eliminating the module_init and module_exit calls
      
      Signed-off-by: Devendra Naga <devendra.aaru@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3eeb7da9
    • Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge · 15708381
      David S. Miller authored
      included changes:
      - some code cleanups and minor fixes (3 of them were reported by Coverity)
      - 'struct hard_iface' re-shaping to improve multi-protocol support
      - ECTP packets silent drop
      - transfer the WIFI flag on clients in case of roaming
      15708381
    • qla3xxx: remove unused variable in ql_process_mac_tx_intr() · 1627801d
      Wei Yongjun authored
      The variable retval is initialized but never used
      otherwise, so remove the unused variable.
      
      dpatch engine is used to auto generate this patch.
      (https://github.com/weiyj/dpatch)
      
      Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Acked-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1627801d
    • qla3xxx: use module_pci_driver to simplify the code · 680d8669
      Wei Yongjun authored
      Use the module_pci_driver() macro to make the code simpler
      by eliminating module_init and module_exit calls.
      
      dpatch engine is used to auto generate this patch.
      (https://github.com/weiyj/dpatch)
      
      Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Acked-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      680d8669
    • smsc95xx: add wol support for more frame types · bbd9f9ee
      Steve Glendinning authored
      
      
      This patch adds support for wol wakeup on unicast, broadcast,
      multicast and arp frames.
      
      The wakeup filter code isn't pretty, but it works.
      
      Signed-off-by: Steve Glendinning <steve.glendinning@shawell.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bbd9f9ee
    • net/ipv4/ipconfig: add device address to a KERN_INFO message · 9ecd1c3d
      Claudio Fontana authored
      
      
      adds a "hwaddr" to the "IP-Config: Complete" KERN_INFO message
      with the dev_addr of the device selected for auto configuration.
      
      Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9ecd1c3d
    • ixgbe: add setlink, getlink support to ixgbe and ixgbevf · 815cccbf
      John Fastabend authored
      
      
      This adds support for the net device ops to manage the embedded
      hardware bridge on ixgbe devices. With this patch the bridge
      mode can be toggled between VEB and VEPA to support stacking
      macvlan devices or using the embedded switch without any SW
      component in 802.1Qbg/br environments.
      
      Additionally, this adds source address pruning to the ixgbevf
      driver to prune any frames sent back from a reflective relay on
      the switch. This is required because the existing hardware does
      not support this. Without it, frames get pushed into the stack
      with the device's own source MAC, which is invalid per the
      802.1Qbg VEPA definition.
      
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      815cccbf
    • net: set and query VEB/VEPA bridge mode via PF_BRIDGE · 2469ffd7
      John Fastabend authored
      
      
      Hardware switches may support enabling and disabling the
      loopback switch, which puts the device in the VEPA mode defined
      in the IEEE 802.1Qbg specification. In this mode frames are
      not switched in the hardware but sent directly to the switch.
      SR-IOV capable NICs will likely support this mode; I am
      aware of at least two such devices. I am also told (but don't
      have any of this hardware available) that there are devices
      that only support VEPA mode. In these cases it is important
      at a minimum to be able to query these attributes.
      
      This patch adds an additional IFLA_BRIDGE_MODE attribute that can be
      set and dumped via the PF_BRIDGE:{SET|GET}LINK operations. Also
      anticipating bridge attributes that may be common for both embedded
      bridges and software bridges this adds a flags attribute
      IFLA_BRIDGE_FLAGS currently used to determine if the command or event
      is being generated to/from an embedded bridge or software bridge.
      Finally, the event generation is pulled out of the bridge module and
      into rtnetlink proper.
      
      For example using the macvlan driver in VEPA mode on top of
      an embedded switch requires putting the embedded switch into
      a VEPA mode to get the expected results.
      
              --------  --------
              | VEPA |  | VEPA |       <-- macvlan vepa edge relays
              --------  --------
                 |        |
                 |        |
              ------------------
              |      VEPA      |       <-- embedded switch in NIC
              ------------------
                      |
                      |
              -------------------
              | external switch |      <-- shiny new physical
              -------------------          switch with VEPA support
      
      A packet sent from the macvlan VEPA at the top could be
      looped back on the embedded switch and never seen by the
      external switch. So in order for this to work, the embedded
      switch needs to be set to the VEPA state via the commands
      described above.
      
      By making these attributes nested in IFLA_AF_SPEC we allow
      future extensions to be made as needed.
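
      To make the attribute layout concrete, here is a sketch of how a
      PF_BRIDGE RTM_SETLINK request carrying the nested attributes might be
      built. The EX_ constants are local copies of the values assumed for
      IFLA_BRIDGE_FLAGS, IFLA_BRIDGE_MODE, BRIDGE_FLAGS_SELF and
      BRIDGE_MODE_VEB/VEPA, so that the example does not depend on
      <linux/if_bridge.h>; build_bridge_mode_request() and put_attr() are
      hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Assumed values, copied locally for the sake of the example. */
enum { EX_IFLA_BRIDGE_FLAGS = 0, EX_IFLA_BRIDGE_MODE = 1 };
enum { EX_BRIDGE_FLAGS_SELF = 2 }; /* address the embedded bridge */
enum { EX_BRIDGE_MODE_VEB = 0, EX_BRIDGE_MODE_VEPA = 1 };

/* Append one attribute at the current end of the message. */
static struct rtattr *put_attr(struct nlmsghdr *nlh, unsigned short type,
                               const void *data, size_t len)
{
    struct rtattr *rta =
        (struct rtattr *)((char *)nlh + NLMSG_ALIGN(nlh->nlmsg_len));

    rta->rta_type = type;
    rta->rta_len = (unsigned short)RTA_LENGTH(len);
    if (len)
        memcpy(RTA_DATA(rta), data, len);
    nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) + RTA_ALIGN(rta->rta_len);
    return rta;
}

/*
 * Build the request in buf (which must be suitably aligned).
 * Returns the message length, or 0 if cap is too small.
 */
size_t build_bridge_mode_request(void *buf, size_t cap, int ifindex,
                                 uint16_t mode)
{
    size_t need = NLMSG_SPACE(sizeof(struct ifinfomsg)) + RTA_SPACE(0) +
                  2 * RTA_SPACE(sizeof(uint16_t));
    uint16_t flags = EX_BRIDGE_FLAGS_SELF;
    struct nlmsghdr *nlh = buf;
    struct ifinfomsg *ifi;
    struct rtattr *nest;

    if (cap < need)
        return 0;
    memset(buf, 0, need);

    nlh->nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
    nlh->nlmsg_type = RTM_SETLINK;
    nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
    ifi = NLMSG_DATA(nlh);
    ifi->ifi_family = PF_BRIDGE;
    ifi->ifi_index = ifindex;

    /* IFLA_AF_SPEC { IFLA_BRIDGE_FLAGS, IFLA_BRIDGE_MODE } */
    nest = put_attr(nlh, IFLA_AF_SPEC, NULL, 0);
    put_attr(nlh, EX_IFLA_BRIDGE_FLAGS, &flags, sizeof(flags));
    put_attr(nlh, EX_IFLA_BRIDGE_MODE, &mode, sizeof(mode));
    nest->rta_len =
        (unsigned short)(((char *)nlh + nlh->nlmsg_len) - (char *)nest);

    return nlh->nlmsg_len;
}
```

      The buffer would then be sent on a NETLINK_ROUTE socket; that part is
      omitted here.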
      
      CC: Lennert Buytenhek <buytenh@wantstofly.org>
      CC: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2469ffd7
    • net: create generic bridge ops · e5a55a89
      John Fastabend authored
      
      
      The PF_BRIDGE:RTM_{GET|SET}LINK nlmsg family and type are
      currently embedded in the ./net/bridge module. This prohibits
      them from being used by other bridging devices. One example
      is hardware that has embedded bridging components.

      In order to use these nlmsg types more generically, this patch
      adds two net_device_ops hooks: one to set link bridge attributes
      and another to dump the current bridge attributes.
      
      	ndo_bridge_setlink()
      	ndo_bridge_getlink()
      
      CC: Lennert Buytenhek <buytenh@wantstofly.org>
      CC: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e5a55a89
  2. Oct 30, 2012
  3. Oct 29, 2012