  1. Nov 22, 2016
    • geneve: Unify LWT and netdev handling. · 9b4437a5
      pravin shelar authored
      
      
      The current geneve implementation has two separate cases to handle:
      1. netdev xmit
      2. LWT xmit.
      
      In the netdev case, the geneve configuration is stored in various
      struct geneve_dev members, for example geneve_addr, ttl, tos,
      label, flags, dst_cache, etc. In the LWT case, the configuration is
      passed to the device in a struct ip_tunnel_info.
      
      The following patch uses the ip_tunnel_info struct to store almost all
      of the configuration of a geneve netdevice. This allows us to unify
      most of the geneve driver code around ip_tunnel_info. It dramatically
      simplifies the geneve code, since the driver no longer needs to handle
      two different configuration cases: duplicate code is removed, and a
      single code path can handle either type of geneve device.
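      
      For illustration, a minimal sketch of what the unification buys the xmit
      path. The member names (collect_md, info) are illustrative rather than a
      quote of the patch; skb_tunnel_info() is the existing LWT accessor.
      
        /* Illustrative sketch, not the exact patch: with the configuration kept
         * in an ip_tunnel_info, one helper can serve both device flavours.
         */
        static const struct ip_tunnel_info *
        geneve_pick_info(const struct geneve_dev *geneve, struct sk_buff *skb)
        {
                if (geneve->collect_md)         /* LWT: metadata travels with the skb */
                        return skb_tunnel_info(skb);
                return &geneve->info;           /* netdev: per-device ip_tunnel_info */
        }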
      
      Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tcp-cong-undo_cwnd-mandatory' · 9e36ced6
      David S. Miller authored
      Florian Westphal says:
      
      ====================
      tcp: make undo_cwnd mandatory for congestion modules
      
      The highspeed, illinois, scalable, veno and yeah congestion control
      algorithms don't provide a 'cwnd_undo' function.  This makes the stack
      default to a 'reno undo', which doubles cwnd.  However, the ssthresh
      implementations of these algorithms do not halve the slow-start
      threshold, which causes a similar issue to the one fixed for dctcp in
      ce6dd233 ("dctcp: avoid bogus doubling of cwnd after loss").
      
      In light of this it seems better to remove the fallback and make undo_cwnd
      mandatory.
      
      The first patch fixes those spots where the reno undo seems incorrect by
      providing .cwnd_undo functions; the second patch removes the fallback.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: make undo_cwnd mandatory for congestion modules · e9799183
      Florian Westphal authored
      
      
      The undo_cwnd fallback in the stack doubles cwnd based on ssthresh,
      which undoes reno's halving behaviour.
      
      It seems more appropriate to let congctl algorithms pair .ssthresh
      and .undo_cwnd properly. Add a 'tcp_reno_undo_cwnd' function and wire it
      up for all congestion algorithms that used to rely on the fallback.
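      
      Roughly what the wiring looks like for a module that keeps reno-style
      halving (a sketch; the field values are illustrative and do not name a
      specific in-tree module):
      
        static struct tcp_congestion_ops example_cong_ops __read_mostly = {
                .ssthresh       = tcp_reno_ssthresh,    /* halves cwnd on loss */
                .undo_cwnd      = tcp_reno_undo_cwnd,   /* explicit pairing, no stack fallback */
                .cong_avoid     = tcp_reno_cong_avoid,
                .name           = "example",
                .owner          = THIS_MODULE,
        };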
      
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: add cwnd_undo functions to various tcp cc algorithms · 85f7e750
      Florian Westphal authored
      
      
      Congestion control algorithms that do not halve cwnd in their .ssthresh
      should provide a .cwnd_undo rather than rely on the current fallback,
      which assumes reno halving (and thus doubles the cwnd).
      
      All of these do 'something else' in their .ssthresh implementation, so
      store the cwnd on loss and provide .undo_cwnd to restore it again.
      
      A followup patch will remove the fallback and all algorithms will
      need to provide a .cwnd_undo function.
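      
      The store-and-restore pattern described above, in generic form (the
      private state, names and the 3/4 backoff are illustrative, not copied
      from any one of the converted modules):
      
        struct example_ca {
                u32 loss_cwnd;          /* cwnd remembered when .ssthresh ran */
        };
        
        static u32 example_ssthresh(struct sock *sk)
        {
                struct tcp_sock *tp = tcp_sk(sk);
                struct example_ca *ca = inet_csk_ca(sk);
        
                ca->loss_cwnd = tp->snd_cwnd;           /* store for a later undo */
                return max(tp->snd_cwnd * 3 / 4, 2U);   /* "something else" than cwnd/2 */
        }
        
        static u32 example_undo_cwnd(struct sock *sk)
        {
                const struct example_ca *ca = inet_csk_ca(sk);
        
                return max(tcp_sk(sk)->snd_cwnd, ca->loss_cwnd); /* restore, don't double */
        }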
      
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bridge-igmpv3-mldv2-support' · 2fcb58ab
      David S. Miller authored
      
      
      Nikolay Aleksandrov says:
      
      ====================
      bridge: add support for IGMPv3 and MLDv2 querier
      
      This patch-set adds support for IGMPv3 and MLDv2 querier in the bridge.
      Two new options which can be toggled via netlink and sysfs are added that
      control the version per-bridge:
       multicast_igmp_version - default 2, can be set to 3
       multicast_mld_version - default 1, can be set to 2 (this option is
                               disabled if CONFIG_IPV6=n)
      
      Note that the names do not include "querier"; I think these options can
      be re-used later as more IGMPv3 support is added to the bridge, so we
      can avoid adding more options to switch between v2 and v3 behaviour.
      
      The set uses the already existing br_ip{4,6}_multicast_alloc_query
      functions and adds the appropriate header based on the chosen version.
      
      For the initial support I have removed the compatibility implementation
      (RFC3376 sec 7.3.1, 7.3.2; RFC3810 sec 8.3.1, 8.3.2), because there are
      some details that we need to sort out.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: mcast: add MLDv2 querier support · aa2ae3e7
      Nikolay Aleksandrov authored
      
      
      This patch adds basic support for MLDv2 queries; the default is MLDv1,
      as before. A new multicast option, multicast_mld_version, adds the
      ability to change it between 1 and 2 via netlink and sysfs.
      The MLD option is disabled if CONFIG_IPV6 is disabled.
      
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: mcast: add IGMPv3 query support · 5e923585
      Nikolay Aleksandrov authored
      
      
      This patch adds basic support for IGMPv3 queries; the default is IGMPv2,
      as before. A new multicast option, multicast_igmp_version, adds the
      ability to change it between 2 and 3 via netlink and sysfs. The option's
      struct member sits in a 4-byte hole in net_bridge.
      
      There are also a few minor style adjustments in br_multicast_new_group
      and br_multicast_add_group.
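      
      A sketch of the kind of branching the per-bridge version enables when
      building a query (simplified and hypothetical, not the actual
      br_multicast.c code; the struct names are the existing kernel IGMP
      header definitions):
      
        /* An IGMPv3 general query carries QRV, QQIC and a source count on top
         * of the fields an IGMPv2 query has, so the allocation size depends on
         * the configured version.
         */
        static size_t example_igmp_query_size(u8 igmp_version)
        {
                if (igmp_version == 3)
                        return sizeof(struct igmpv3_query);     /* RFC 3376 */
                return sizeof(struct igmphdr);                  /* IGMPv2 */
        }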
      
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • driver: macvlan: Remove duplicated IFF_UP condition check in macvlan_forward_source · fc51f2b7
      Gao Feng authored
      
      
      The function macvlan_forward_source_one already checks the IFF_UP flag,
      so there is no need to check it again in macvlan_forward_source.
      
      Signed-off-by: Gao Feng <gfree.wind@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlx4: avoid unnecessary dirtying of critical fields · dad42c30
      Eric Dumazet authored
      
      
      While stressing a 40Gbit mlx4 NIC with busy polling, I found false
      sharing in the mlx4 driver that can be easily avoided.
      
      This patch brings an additional 7% performance improvement in a UDP_RR
      workload.
      
      1) If we received no frame during one mlx4_en_process_rx_cq()
         invocation, there is no need to call mlx4_cq_set_ci() and/or dirty
         ring->cons.
      
      2) Do not refill rx buffers if we have plenty of them.
         This avoids false sharing and allows some bulk/batch optimizations.
         The page allocator and its locks will thank us.
      
      Finally, mlx4_en_poll_rx_cq() should not return 0 if it determined that
      the CPU handling the NIC IRQ should be changed. We should return
      budget-1 instead, so as not to fool net_rx_action() and its
      netdev_budget.
      
      v2: keep AVG_PERF_COUNTER(... polled) even if polled is 0
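      
      A hedged sketch of the last point (the example_* helpers are hypothetical
      placeholders, not the mlx4 functions): when the poll consumed the whole
      budget but the IRQ affinity says another CPU should take over, stop NAPI
      here but report budget-1 rather than 0.
      
        static int example_poll_rx_cq(struct napi_struct *napi, int budget)
        {
                int done = example_process_rx_cq(napi, budget);
        
                if (done == budget && example_irq_belongs_to_other_cpu(napi)) {
                        napi_complete_done(napi, budget - 1);
                        return budget - 1;      /* not 0: keeps net_rx_action()'s
                                                 * netdev_budget accounting honest */
                }
                if (done < budget)
                        napi_complete_done(napi, done);
                return done;
        }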
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Tariq Toukan <tariqt@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnx2: use READ_ONCE() instead of barrier() · b668534c
      Eric Dumazet authored
      
      
      barrier() is a big hammer compared to READ_ONCE(),
      and requires comments explaining what is protected.
      
      READ_ONCE() is more precise, and the compiler should generate
      better overall code.
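      
      The general shape of the change (the struct and names are illustrative,
      not the exact bnx2 hunk):
      
        struct example_ring {
                u16 hw_cons;    /* updated by the device / IRQ path */
        };
        
        static u16 example_get_hw_cons(struct example_ring *ring)
        {
                /* Was: a plain load paired with barrier(). READ_ONCE() documents
                 * which access needs a fresh, untorn read and constrains the
                 * compiler only there.
                 */
                return READ_ONCE(ring->hw_cons);
        }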
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: avoid one cache line miss in recvmsg() · d21dbdfe
      Eric Dumazet authored
      
      
      UDP_SKB_CB(skb)->partial_cov is located at offset 66 in the skb,
      requiring a cold cache line to be read into the CPU cache.
      
      We can avoid this cache line miss for UDP sockets,
      as partial_cov only has meaning for UDPLite.
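      
      A sketch of the idea (a made-up helper, not the exact udp.c hunk): only a
      UDPLite socket ever needs to look at the partial coverage flag, so plain
      UDP can skip the cold skb->cb access entirely.
      
        static bool example_partial_coverage(struct sock *sk, struct sk_buff *skb)
        {
                if (!IS_UDPLITE(sk))
                        return false;   /* plain UDP: don't touch the cold cache line */
                return UDP_SKB_CB(skb)->partial_cov != 0;  /* UDPLite partial coverage */
        }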
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlx5-bpf-refcnt-fixes' · ee9d5461
      David S. Miller authored
      
      
      Daniel Borkmann says:
      
      ====================
      Couple of BPF refcount fixes for mlx5
      
      Various mlx5 bugs in eBPF refcount handling were found during review.
      The last patch in the series adds __must_check to the BPF helpers to
      make sure we won't run into this again without the compiler complaining
      first.
      
      v2 -> v3:
      
       - Just reworked patch 2/4 so we don't need bpf_prog_sub().
       - Rebased, rest as is.
      
      v1 -> v2:
      
       - After discussion with Alexei, we agreed upon rebasing the
         patches against net-next.
       - Since this is now against net-next, I've also added the __must_check
         to force future users to check for errors.
       - Fixed up commit message #2.
       - Simplified the assignment from patch #1 based on Saeed's feedback
         on the previous set.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: add __must_check attributes to refcount manipulating helpers · 6d67942d
      Daniel Borkmann authored
      
      
      Helpers like bpf_prog_add(), bpf_prog_inc(), bpf_map_inc() can fail
      with an error, so make sure the caller properly checks the return
      value and does not just ignore it, which in the worst case could lead
      to a use after free.
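      
      For illustration, the effect of the attribute (the prototype is
      simplified and the caller below is a made-up example):
      
        struct bpf_prog * __must_check bpf_prog_add(struct bpf_prog *prog, int i);
        
        /* Ignoring the result now triggers a compiler warning, so callers end
         * up with the error-checked pattern:
         */
        static int example_take_refs(struct bpf_prog *prog, int nr_refs)
        {
                prog = bpf_prog_add(prog, nr_refs);     /* may return ERR_PTR() on overflow */
                if (IS_ERR(prog))
                        return PTR_ERR(prog);
                return 0;
        }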
      
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf, mlx5: drop priv->xdp_prog reference on netdev cleanup · a055c19b
      Daniel Borkmann authored
      mlx5e_xdp_set() is currently the only place where we drop the reference
      on the prog sitting in priv->xdp_prog when it is exchanged for a new one.
      We also need to make sure that we eventually release that reference, for
      example when the netdev is dismantled; otherwise we leak the program.
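      
      The shape of the fix, as a minimal sketch (the priv structure and the
      cleanup hook are placeholders, not the exact mlx5e code):
      
        struct example_priv {
                struct bpf_prog *xdp_prog;      /* reference taken when the prog was attached */
        };
        
        static void example_netdev_cleanup(struct example_priv *priv)
        {
                if (priv->xdp_prog)
                        bpf_prog_put(priv->xdp_prog);   /* drop the reference we still hold */
        }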
      
      Fixes: 86994156 ("net/mlx5e: XDP fast RX drop bpf programs support")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf, mlx5: fix various refcount issues in mlx5e_xdp_set · c54c0629
      Daniel Borkmann authored
      There are multiple issues in mlx5e_xdp_set():
      
      1) The batched bpf_prog_add() is currently not checked for errors. When
         doing so, it should be done at an earlier point in time to make sure
         that we cannot fail anymore at the time we want to set the program
         for each channel. The batched refs short-cut can only be taken when
         we don't need to perform a reset for changing the rq type and the
         device was in the opened state. If the device was not in the opened
         state, the next mlx5e_open_locked() will acquire the refs from the
         control prog via mlx5e_create_rq(), same as when we need to perform
         a reset.
      
      2) When swapping priv->xdp_prog, no extra reference count must be
         taken, since we already got that reference from the call path via
         dev_change_xdp_fd(). Otherwise, we'd never be able to release the
         program. Also, bpf_prog_add() could fail, and its return code was
         not checked.
      
      Fixes: 86994156 ("net/mlx5e: XDP fast RX drop bpf programs support")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf, mlx5: fix mlx5e_create_rq taking reference on prog · 97bc402d
      Daniel Borkmann authored
      In mlx5e_create_rq(), when creating a new queue, we call bpf_prog_add()
      but don't check its return value. bpf_prog_add() can fail since 92117d84
      ("bpf: fix refcnt overflow"), so we really must check it. Take the
      reference right when we assign it to the rq from priv->xdp_prog, and
      just drop the reference on the error path. Destruction in
      mlx5e_destroy_rq() looks good, though.
      
      Fixes: 86994156 ("net/mlx5e: XDP fast RX drop bpf programs support")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. Nov 21, 2016
  3. Nov 20, 2016
    • net: fix bogus cast in skb_pagelen() and use unsigned variables · c72d8cda
      Alexey Dobriyan authored
      
      
      1) The cast to "int" is unnecessary:
         u8 will be promoted to int before decrementing, and small positive
         numbers fit into "int", so their values won't be changed during
         promotion.
      
         Once everything is int, including loop counters, signedness doesn't
         matter: 32-bit operations will stay 32-bit operations.
      
         But! Someone tried to make this loop smart by making everything of
         the same type, apparently in an attempt to optimise it.
         Do the optimization, just differently.
         Do the cast where it matters. :^)
      
      2) A frag size is an unsigned entity and the sum of fragment sizes is
         also unsigned.
      
      Make everything unsigned and leave no MOVSX instruction behind.
      
      	add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-4 (-4)
      	function                                     old     new   delta
      	skb_cow_data                                 835     834      -1
      	ip_do_fragment                              2549    2548      -1
      	ip6_fragment                                3130    3128      -2
      	Total: Before=154865032, After=154865028, chg -0.00%
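      
      A standalone illustration of the signedness point (not the kernel
      function itself, just a made-up helper): with an unsigned counter and
      accumulator, summing unsigned sizes needs no sign extension.
      
        static unsigned int example_sum_sizes(const unsigned int *size,
                                              unsigned int n)
        {
                unsigned int i, len = 0;
        
                for (i = 0; i < n; i++)
                        len += size[i]; /* pure 32-bit unsigned arithmetic, no MOVSX */
                return len;
        }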
      
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netlink: smaller nla_attr_minlen table · 32d84cdc
      Alexey Dobriyan authored
      
      
      The length of a netlink attribute may be u16, but the lengths of basic
      attributes are much smaller, so small that we can save 16 bytes of
      .rodata and pocket change inside .text.
      
      16-bit is worse than 8-bit on x86-64 because of the operand-size
      override prefix.
      
      	add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-19 (-19)
      	function                                     old     new   delta
      	validate_nla                                 418     417      -1
      	nla_policy_len                                66      64      -2
      	nla_attr_minlen                               32      16     -16
      	Total: Before=154865051, After=154865032, chg -0.00%
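      
      The idea in sketch form (entries abbreviated, not the full table): every
      minimum length of a basic attribute type fits in a byte, so the element
      type can shrink from u16 to u8.
      
        static const u8 nla_attr_minlen[NLA_TYPE_MAX + 1] = {   /* was: const u16 */
                [NLA_U8]        = sizeof(u8),
                [NLA_U16]       = sizeof(u16),
                [NLA_U32]       = sizeof(u32),
                [NLA_U64]       = sizeof(u64),
                [NLA_NESTED]    = NLA_HDRLEN,
                /* ... */
        };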
      
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netlink: use "unsigned int" in nla_next() · 3b2c75d3
      Alexey Dobriyan authored
      
      
      ->nla_len is an unsigned entity (it's a length, after all) and a u16,
      thus it can't overflow when being aligned into an int/unsigned int.
      
      (nlmsg_next has the same code, but I haven't yet convinced myself that
      the same change is correct there.)
      
      There is pointer arithmetic in this function, and an unsigned offset
      is better:
      
      	add/remove: 0/0 grow/shrink: 1/64 up/down: 5/-309 (-304)
      	function                                     old     new   delta
      	nl80211_set_wiphy                           1444    1449      +5
      	team_nl_cmd_options_set                      997     995      -2
      	tcf_em_tree_validate                         872     870      -2
      	switchdev_port_bridge_setlink                352     350      -2
      	switchdev_port_br_afspec                     312     310      -2
      	rtm_to_fib_config                            428     426      -2
      	qla4xxx_sysfs_ddb_set_param                 2193    2191      -2
      	qla4xxx_iface_set_param                     4470    4468      -2
      	ovs_nla_free_flow_actions                    152     150      -2
      	output_userspace                             518     516      -2
      		...
      	nl80211_set_reg                              654     649      -5
      	validate_scan_freqs                          148     142      -6
      	validate_linkmsg                             288     282      -6
      	nl80211_parse_connkeys                       489     483      -6
      	nlattr_set                                   231     224      -7
      	nf_tables_delsetelem                         267     260      -7
      	do_setlink                                  3416    3408      -8
      	netlbl_cipsov4_add_std                      1672    1659     -13
      	nl80211_parse_sched_scan                    2902    2888     -14
      	nl80211_trigger_scan                        1738    1720     -18
      	do_execute_actions                          2821    2738     -83
      	Total: Before=154865355, After=154865051, chg -0.00%
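      
      For reference, nla_next() is just the aligned-length computation plus the
      pointer step, so the change amounts to the type of that length (sketched
      from the description above, not quoted verbatim from the diff):
      
        static inline struct nlattr *nla_next(const struct nlattr *nla, int *remaining)
        {
                unsigned int totlen = NLA_ALIGN(nla->nla_len);  /* was: int totlen */
        
                *remaining -= totlen;
                return (struct nlattr *) ((char *) nla + totlen);
        }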
      
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: make struct napi_alloc_cache::skb_count unsigned int · e0d7924a
      Alexey Dobriyan authored
      
      
      size_t is way too much for an integer not exceeding 64.
      
      Space savings: 10 bytes!
      
      	add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-10 (-10)
      	function                                     old     new   delta
      	napi_consume_skb                             165     163      -2
      	__kfree_skb_flush                             56      53      -3
      	__kfree_skb_defer                             97      92      -5
      	Total: Before=154865639, After=154865629, chg -0.00%
      
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'batadv-next-for-davem-20161119' of git://git.open-mesh.org/linux-merge · f463c99b
      David S. Miller authored
      
      
      Simon Wunderlich says:
      
      ====================
      This feature patchset includes the following changes:
      
       - 6 patches adding functionality to detect a WiFi interface under
         other virtual interfaces, like VLANs. They introduce a cache for
         the detected WiFi configuration to avoid RTNL locking in
         critical sections. Patches have been prepared by Marek Lindner
         and Sven Eckelmann
      
       - Enable automatic module loading for genl requests, by Sven Eckelmann
      
       - Fix a potential race condition on interface removal. It does not
         happen very often in practice, but fixing it requires bigger
         changes, so we are sending this to net-next. By Linus Luessing
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. Nov 19, 2016