  1. Sep 21, 2017
  2. Sep 20, 2017
    • Merge branch 'net-speedup-netns-create-delete-time' · 8ca712c3
      David S. Miller authored
      
      
      Eric Dumazet says:
      
      ====================
      net: speedup netns create/delete time
      
      When the rate of netns creation/deletion is high enough,
      we observe softlockups in cleanup_net() caused by a huge list
      of netns and far too many rcu_barrier() calls.
      
      This patch series optimizes kobject and adds batching to
      tunnels so that netns dismantles are less costly.

      IPv6 addrlabels also get a per-netns list, and tcp_metrics
      also benefits from batch flushing.
      
      This gives me one order of magnitude gain.
      (~50 ms -> ~5 ms for one netns create/delete pair)
      
      Tested:
      
      for i in `seq 1 40`
      do
       (for j in `seq 1 100` ; do  unshare -n /bin/true >/dev/null ; done) &
      done
      wait ; grep net_namespace /proc/slabinfo
      
      Before patch series :
      
      $ time ./add_del_unshare.sh
      net_namespace        116    258   5504    1    2 : tunables    8    4    0 : slabdata    116    258      0
      
      real	3m24.910s
      user	0m0.747s
      sys	0m43.162s
      
      After :
      $ time ./add_del_unshare.sh
      net_namespace        135    291   5504    1    2 : tunables    8    4    0 : slabdata    135    291      0
      
      real	0m22.117s
      user	0m0.728s
      sys	0m35.328s
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
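The "~50 ms -> ~5 ms" figure in the cover letter can be rederived from the `time` output. This is a rough check, assuming the 40 x 100 loop in the test script amounts to 4000 create/delete pairs and that wall-clock time is simply divided across them. The 40 subshells run in parallel, so the result is an amortized cost rather than the latency of a single `unshare -n`.

```python
# Amortized wall-clock cost per netns create/delete pair, from the quoted timings.
PROCS = 40               # outer loop: `seq 1 40`
ITERS = 100              # inner loop: `seq 1 100`
PAIRS = PROCS * ITERS    # 4000 create/delete pairs in total


def seconds(minutes, secs):
    """Convert the m/s notation printed by `time` into seconds."""
    return minutes * 60 + secs


before = seconds(3, 24.910)   # real 3m24.910s (before the series)
after = seconds(0, 22.117)    # real 0m22.117s (after the series)

per_pair_before_ms = before / PAIRS * 1000
per_pair_after_ms = after / PAIRS * 1000
speedup = before / after

print(f"{per_pair_before_ms:.1f} ms -> {per_pair_after_ms:.1f} ms, {speedup:.1f}x")
```

Under these assumptions the numbers land close to the order-of-magnitude gain the cover letter reports.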
    • ipv4: speedup ipv6 tunnels dismantle · 64bc1781
      Eric Dumazet authored
      
      
      Implement exit_batch() method to dismantle more devices
      per round.
      
      (rtnl_lock() ...
       unregister_netdevice_many() ...
       rtnl_unlock())
      
      Tested:
      $ cat add_del_unshare.sh
      for i in `seq 1 40`
      do
       (for j in `seq 1 100` ; do unshare -n /bin/true >/dev/null ; done) &
      done
      wait ; grep net_namespace /proc/slabinfo
      
      Before patch :
      $ time ./add_del_unshare.sh
      net_namespace        126    282   5504    1    2 : tunables    8    4    0 : slabdata    126    282      0
      
      real    1m38.965s
      user    0m0.688s
      sys     0m37.017s
      
      After patch:
      $ time ./add_del_unshare.sh
      net_namespace        135    291   5504    1    2 : tunables    8    4    0 : slabdata    135    291      0
      
      real	0m22.117s
      user	0m0.728s
      sys	0m35.328s
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
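The batching idea in this commit (take the lock once, collect devices from every dying netns, unregister them in one call) can be illustrated with a toy model. This is a Python sketch, not kernel code: `CountingLock`, `exit_per_netns` and the device names are hypothetical stand-ins, and the only thing measured is how many lock round trips and unregister calls each strategy makes.

```python
class CountingLock:
    """Stand-in for rtnl_lock()/rtnl_unlock(); it only counts acquisitions."""

    def __init__(self):
        self.acquisitions = 0

    def __enter__(self):
        self.acquisitions += 1
        return self

    def __exit__(self, exc_type, exc, tb):
        return False


def exit_per_netns(netns_list, lock):
    """One lock round trip and one unregister call for every netns."""
    unregister_calls = 0
    for devices in netns_list:
        with lock:
            kill_list = list(devices)
            unregister_calls += 1  # models unregister_netdevice_many(kill_list)
    return unregister_calls


def exit_batch(netns_list, lock):
    """One lock round trip and one unregister call for the whole list."""
    with lock:
        kill_list = [dev for devices in netns_list for dev in devices]
        unregister_calls = 1  # models unregister_netdevice_many(kill_list)
    return unregister_calls, kill_list


# 100 dying netns, each with two hypothetical tunnel devices.
dying = [[f"tunl{i}", f"gre{i}"] for i in range(100)]

per_netns_lock = CountingLock()
batch_lock = CountingLock()
calls_per_netns = exit_per_netns(dying, per_netns_lock)
calls_batch, killed = exit_batch(dying, batch_lock)

print(per_netns_lock.acquisitions, calls_per_netns)
print(batch_lock.acquisitions, calls_batch, len(killed))
```

Whatever fixed cost each lock/unregister round carries in the kernel is paid once per batch in the second variant instead of once per netns.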
    • ipv6: speedup ipv6 tunnels dismantle · bb401cae
      Eric Dumazet authored
      
      
      Implement exit_batch() method to dismantle more devices
      per round.
      
      (rtnl_lock() ...
       unregister_netdevice_many() ...
       rtnl_unlock())
      
      Tested:
      $ cat add_del_unshare.sh
      for i in `seq 1 40`
      do
       (for j in `seq 1 100` ; do unshare -n /bin/true >/dev/null ; done) &
      done
      wait ; grep net_namespace /proc/slabinfo
      
      Before patch :
      $ time ./add_del_unshare.sh
      net_namespace        110    267   5504    1    2 : tunables    8    4    0 : slabdata    110    267      0
      
      real    3m25.292s
      user    0m0.644s
      sys     0m40.153s
      
      After patch:
      
      $ time ./add_del_unshare.sh
      net_namespace        126    282   5504    1    2 : tunables    8    4    0 : slabdata    126    282      0
      
      real	1m38.965s
      user	0m0.688s
      sys	0m37.017s
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: batch tcp_net_metrics_exit · 789e6ddb
      Eric Dumazet authored
      
      
      When dealing with a list of dismantling netns, we can scan
      tcp_metrics once, saving cpu cycles.
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
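Scanning once for a whole list of dying netns, instead of once per netns, can be modelled in a few lines of Python. This is a simplified sketch, not the kernel implementation: the table here is a flat list of hypothetical `(netns, address)` pairs, whereas the kernel walks hash buckets.

```python
def flush_per_netns(table, dying):
    """Scan the remaining table once for each dying netns."""
    visited = 0
    for ns in dying:
        kept = []
        for net, addr in table:
            visited += 1
            if net != ns:
                kept.append((net, addr))
        table = kept
    return table, visited


def flush_batched(table, dying):
    """Scan the table a single time, dropping entries of any dying netns."""
    dying = set(dying)
    visited = 0
    kept = []
    for net, addr in table:
        visited += 1
        if net not in dying:
            kept.append((net, addr))
    return kept, visited


# Four netns with three cached entries each; netns 0, 1 and 2 are going away.
table = [(ns, f"10.0.{ns}.{host}") for ns in range(4) for host in range(3)]
dying = [0, 1, 2]

left_a, visits_a = flush_per_netns(table, dying)
left_b, visits_b = flush_batched(table, dying)
print(visits_a, visits_b)
```

Both variants leave the same entries behind; only the number of entries visited differs.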
    • ipv6: addrlabel: per netns list · a90c9347
      Eric Dumazet authored
      
      
      Having a global list of labels does not scale to thousands of
      netns in the cloud era. This causes quadratic behavior on
      netns creation and deletion.

      It is time to have a per-netns list of ~10 labels.
      
      Tested:
      
      $ time perf record (for f in `seq 1 3000` ; do ip netns add tast$f; done)
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 3.637 MB perf.data (~158898 samples) ]
      
      real    0m20.837s # instead of 0m24.227s
      user    0m0.328s
      sys     0m20.338s # instead of 0m23.753s
      
          16.17%       ip  [kernel.kallsyms]  [k] netlink_broadcast_filtered
          12.30%       ip  [kernel.kallsyms]  [k] netlink_has_listeners
           6.76%       ip  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
           5.78%       ip  [kernel.kallsyms]  [k] memset_erms
           5.77%       ip  [kernel.kallsyms]  [k] kobject_uevent_env
           5.18%       ip  [kernel.kallsyms]  [k] refcount_sub_and_test
           4.96%       ip  [kernel.kallsyms]  [k] _raw_read_lock
           3.82%       ip  [kernel.kallsyms]  [k] refcount_inc_not_zero
           3.33%       ip  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
           2.11%       ip  [kernel.kallsyms]  [k] unmap_page_range
           1.77%       ip  [kernel.kallsyms]  [k] __wake_up
           1.69%       ip  [kernel.kallsyms]  [k] strlen
           1.17%       ip  [kernel.kallsyms]  [k] __wake_up_common
           1.09%       ip  [kernel.kallsyms]  [k] insert_header
           1.04%       ip  [kernel.kallsyms]  [k] page_remove_rmap
           1.01%       ip  [kernel.kallsyms]  [k] consume_skb
           0.98%       ip  [kernel.kallsyms]  [k] netlink_trim
           0.51%       ip  [kernel.kallsyms]  [k] kernfs_link_sibling
           0.51%       ip  [kernel.kallsyms]  [k] filemap_map_pages
           0.46%       ip  [kernel.kallsyms]  [k] memcpy_erms
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
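The quadratic behavior mentioned in this commit can be seen in a small counting model. This is a Python sketch under a loud assumption: every label insertion is modelled as walking all entries already on the list. The kernel's actual walk may stop earlier; the model only shows how the work grows when the list is shared by all netns.

```python
LABELS_PER_NETNS = 10   # the "~10 labels" from the commit message


def create_with_global_list(num_netns):
    """All netns share one list, so each insert walks everyone's labels."""
    global_list = []
    steps = 0
    for ns in range(num_netns):
        for label in range(LABELS_PER_NETNS):
            steps += len(global_list)      # assumed: walk every existing entry
            global_list.append((ns, label))
    return steps


def create_with_per_netns_lists(num_netns):
    """Each netns has its own list, so an insert walks at most ~10 entries."""
    steps = 0
    for ns in range(num_netns):
        own_list = []
        for label in range(LABELS_PER_NETNS):
            steps += len(own_list)
            own_list.append(label)
    return steps


for n in (100, 200):
    print(n, create_with_global_list(n), create_with_per_netns_lists(n))
```

Doubling the number of netns roughly quadruples the work with the shared list, while the per-netns variant grows linearly.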
    • kobject: factorize skb setup in kobject_uevent_net_broadcast() · d464e84e
      Eric Dumazet authored
      
      
      We can build one skb and let it be cloned in netlink.
      
      This is much faster, and uses less memory (all clones will
      share the same skb->head).
      
      Tested:
      
      time perf record (for f in `seq 1 3000` ; do ip netns add tast$f; done)
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 4.110 MB perf.data (~179584 samples) ]
      
      real    0m24.227s # instead of 0m52.554s
      user    0m0.329s
      sys     0m23.753s # instead of 0m51.375s
      
          14.77%       ip  [kernel.kallsyms]  [k] __ip6addrlbl_add
          14.56%       ip  [kernel.kallsyms]  [k] netlink_broadcast_filtered
          11.65%       ip  [kernel.kallsyms]  [k] netlink_has_listeners
           6.19%       ip  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
           5.66%       ip  [kernel.kallsyms]  [k] kobject_uevent_env
           4.97%       ip  [kernel.kallsyms]  [k] memset_erms
           4.67%       ip  [kernel.kallsyms]  [k] refcount_sub_and_test
           4.41%       ip  [kernel.kallsyms]  [k] _raw_read_lock
           3.59%       ip  [kernel.kallsyms]  [k] refcount_inc_not_zero
           3.13%       ip  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
           1.55%       ip  [kernel.kallsyms]  [k] __wake_up
           1.20%       ip  [kernel.kallsyms]  [k] strlen
           1.03%       ip  [kernel.kallsyms]  [k] __wake_up_common
           0.93%       ip  [kernel.kallsyms]  [k] consume_skb
           0.92%       ip  [kernel.kallsyms]  [k] netlink_trim
           0.87%       ip  [kernel.kallsyms]  [k] insert_header
           0.63%       ip  [kernel.kallsyms]  [k] unmap_page_range
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
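The "build once, share the buffer" idea from this commit can be sketched in Python. This is an analogy, not kernel code: `broadcast_rebuild`, `broadcast_build_once` and the listener names are hypothetical, and a shared `bytes` object stands in for clones that share one skb->head.

```python
def build_payload(env):
    """Serialize the environment strings as NUL-terminated bytes."""
    return b"".join(s.encode() + b"\0" for s in env)


def broadcast_rebuild(env, listeners):
    """Build a fresh message for every listener."""
    builds = 0
    delivered = []
    for _ in listeners:
        delivered.append(build_payload(env))
        builds += 1
    return builds, delivered


def broadcast_build_once(env, listeners):
    """Build the message once and hand every listener the same buffer."""
    payload = build_payload(env)
    delivered = [payload for _ in listeners]
    return 1, delivered


env = ["ACTION=add", "SUBSYSTEM=net"]            # illustrative values
listeners = [f"netns{i}" for i in range(3000)]   # hypothetical listeners

builds_old, msgs_old = broadcast_rebuild(env, listeners)
builds_new, msgs_new = broadcast_build_once(env, listeners)
print(builds_old, builds_new)
```

Every listener receives identical content either way; the second variant does the serialization work once and keeps a single copy of the data.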
    • kobject: copy env blob in one go · 4a336a23
      Eric Dumazet authored
      
      
      No need to iterate over strings, just copy in one efficient memcpy() call.
      
      Tested:
      time perf record "(for f in `seq 1 3000` ; do ip netns add tast$f; done)"
      [ perf record: Woken up 10 times to write data ]
      [ perf record: Captured and wrote 8.224 MB perf.data (~359301 samples) ]
      
      real    0m52.554s  # instead of 1m7.492s
      user    0m0.309s
      sys     0m51.375s # instead of 1m6.875s
      
           9.88%       ip  [kernel.kallsyms]  [k] netlink_broadcast_filtered
           8.86%       ip  [kernel.kallsyms]  [k] string
           7.37%       ip  [kernel.kallsyms]  [k] __ip6addrlbl_add
           5.68%       ip  [kernel.kallsyms]  [k] netlink_has_listeners
           5.52%       ip  [kernel.kallsyms]  [k] memcpy_erms
           4.76%       ip  [kernel.kallsyms]  [k] __alloc_skb
           4.54%       ip  [kernel.kallsyms]  [k] vsnprintf
           3.94%       ip  [kernel.kallsyms]  [k] format_decode
           3.80%       ip  [kernel.kallsyms]  [k] kmem_cache_alloc_node_trace
           3.71%       ip  [kernel.kallsyms]  [k] kmem_cache_alloc_node
           3.66%       ip  [kernel.kallsyms]  [k] kobject_uevent_env
           3.38%       ip  [kernel.kallsyms]  [k] strlen
           2.65%       ip  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
           2.20%       ip  [kernel.kallsyms]  [k] kfree
           2.09%       ip  [kernel.kallsyms]  [k] memset_erms
           2.07%       ip  [kernel.kallsyms]  [k] ___cache_free
           1.95%       ip  [kernel.kallsyms]  [k] kmem_cache_free
           1.91%       ip  [kernel.kallsyms]  [k] _raw_read_lock
           1.45%       ip  [kernel.kallsyms]  [k] ksize
           1.25%       ip  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
           1.00%       ip  [kernel.kallsyms]  [k] widen_string
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
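Why a single copy is enough can be shown with a small model. This is a Python sketch assuming the environment strings sit back to back in one buffer, each terminated by a NUL byte. The helper names and the sample key/value pairs are illustrative, not taken from the patch.

```python
def build_env(pairs):
    """Pack KEY=value strings back to back, recording where each one starts."""
    buf = bytearray()
    offsets = []
    for key, value in pairs:
        offsets.append(len(buf))
        buf += f"{key}={value}".encode() + b"\0"
    return bytes(buf), offsets


def copy_per_string(buf, offsets):
    """Old style: find each string's length, then copy it separately."""
    out = bytearray()
    for start in offsets:
        end = buf.index(b"\0", start) + 1   # strlen() + 1 for the terminator
        out += buf[start:end]
    return bytes(out)


def copy_in_one_go(buf):
    """New style: the strings are already contiguous, so copy the blob once."""
    return bytes(buf)


buf, offsets = build_env([("ACTION", "add"), ("SUBSYSTEM", "net")])
print(copy_per_string(buf, offsets) == copy_in_one_go(buf))
```

Because the strings are contiguous, both approaches produce the same bytes, and the per-string length scans are pure overhead.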
    • kobject: add kobject_uevent_net_broadcast() · 16dff336
      Eric Dumazet authored
      
      
      This removes some #ifdef pollution and will ease follow-up patches.
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: no need to free qdisc in RCU callback · 752fbcc3
      Cong Wang authored
      
      
      The gen estimator was rewritten in commit 1c0d32fd
      ("net_sched: gen_estimator: complete rewrite of rate estimators"),
      so the caller no longer needs to wait for a grace period. This
      patch gets rid of that wait.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • team: fall back to hash if table entry is empty · bd7d2106
      Jim Hanko authored
      
      
      If the hash to port mapping table does not have a valid port (i.e. when
      a port goes down), fall back to the simple hashing mechanism to avoid
      dropping packets.
      
      Signed-off-by: Jim Hanko <hanko@drivescale.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
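The fallback described in this commit can be sketched as a lookup with a default. This is a Python illustration, not the team driver's code: `select_tx_port`, the dictionary-based table and the port names are hypothetical, and "simple hashing" is assumed here to mean picking an enabled port by hash modulo the number of enabled ports.

```python
def select_tx_port(hash_value, hash_to_port, enabled_ports):
    """Pick a transmit port, falling back to plain hashing on an empty entry."""
    port = hash_to_port.get(hash_value)
    if port is not None:
        return port                      # mapping table has a valid port
    if not enabled_ports:
        return None                      # nothing to send on; packet is dropped
    # Assumed fallback: spread traffic over the ports that are still enabled.
    return enabled_ports[hash_value % len(enabled_ports)]


# Hash 1 used to map to a port that went down, so its entry is now empty.
hash_to_port = {0: "eth0", 1: None}
enabled_ports = ["eth0", "eth2"]

print(select_tx_port(0, hash_to_port, enabled_ports))
print(select_tx_port(1, hash_to_port, enabled_ports))
```

Without the fallback branch, the second lookup would return nothing and the packet would be dropped, which is the case the commit addresses.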