  1. Apr 12, 2019
    • sctp: Remove superfluous test in sctp_ulpq_reasm_drain(). · 0eff1052
      David Miller authored
      
      
      Inside the loop, we always start with event non-NULL.
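
      In sketch form, the shape of the loop is what guarantees this (a
      fragment only, not the full function):

      	while ((event = sctp_ulpq_retrieve_reassembled(ulpq)) != NULL) {
      		/* event cannot be NULL here, so a further "if (event)"
      		 * check inside the body is dead code
      		 */
      	}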
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0eff1052
    • net: sched: flower: fix filter net reference counting · 9994677c
      Vlad Buslov authored
      Fix net reference counting in fl_change() and remove the redundant call
      to tcf_exts_get_net() from __fl_delete(). __fl_put() already tries to
      get the net before releasing exts and deallocating a filter, so this
      code caused the flower classifier to obtain the net twice per filter
      being deleted.

      The implementation of __fl_delete() called tcf_exts_get_net() to pass
      its result as the 'async' flag to fl_mask_put(). However, the 'async'
      flag is redundant and only complicates the fl_mask_put() implementation.
      This functionality seems to have been copied from the filter cleanup
      code, where it was added by Cong with the following explanation:
      
          This patchset tries to fix the race between call_rcu() and
          cleanup_net() again. Without holding the netns refcnt the
          tc_action_net_exit() in netns workqueue could be called before
          filter destroy works in tc filter workqueue. This patchset
          moves the netns refcnt from tc actions to tcf_exts, without
          breaking per-netns tc actions.
      
      This doesn't apply to the flower mask, which doesn't call any tc action
      code during cleanup. Simplify fl_mask_put() by removing the flag
      parameter and always using tcf_queue_work() to free mask objects.
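
      In sketch form, the simplified helper ends up along these lines (a
      sketch of the change described above, not the exact diff; the list
      unlink is elided):

      	static void fl_mask_put(struct cls_fl_head *head,
      				struct fl_flow_mask *mask)
      	{
      		if (!refcount_dec_and_test(&mask->refcnt))
      			return;

      		/* no 'async' flag and no tcf_exts_get_net(): unlink the
      		 * mask from head here, then always defer the free to the
      		 * tc filter workqueue
      		 */
      		tcf_queue_work(&mask->rwork, fl_mask_free_work);
      	}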
      
      Fixes: 06177558 ("net: sched: flower: introduce reference counting for filters")
      Fixes: 1f17f774 ("net: sched: flower: insert filter to ht before offloading it to hw")
      Fixes: 05cd271f ("cls_flower: Support multiple masks per priority")
      Reported-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9994677c
    • selftests: Add debugging options to pmtu.sh · 56490b62
      David Ahern authored
      
      
      The pmtu.sh script runs a number of tests and dumps a summary of
      pass/fail. If a test fails, it is nearly impossible to debug why. For
      example:
      
          TEST: ipv6: PMTU exceptions                       [FAIL]
      
      There are a lot of commands run behind the scenes for this test. Which
      one is failing?
      
      Add a VERBOSE option to show the commands that are run and any output
      from those commands. Add a PAUSE_ON_FAIL option to halt the script if a
      test fails, allowing users to poke around with the setup in the failed
      state.

      In the process, rename tracing to TRACING and move its declaration to
      the top with the new variables.
      
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      56490b62
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · bb23581b
      David S. Miller authored
      
      
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf-next 2019-04-12
      
      The following pull-request contains BPF updates for your *net-next* tree.
      
      The main changes are:
      
      1) Improve BPF verifier scalability for large programs through two
         optimizations: i) remove verifier states that are not useful in pruning,
         ii) stop walking the parentage chain once the first LIVE_READ is seen.
         Combined, these give an approx 20x speedup. Increase limits for
         accepting large programs under root, and add various stress tests,
         from Alexei.
      
      2) Implement global data support in BPF. This enables static global variables
         for .data, .rodata and .bss sections to be properly handled which allows
         for more natural program development. This also opens up the possibility
         to optimize program workflow by compiling ELFs only once and later only
         rewriting section data before reload, from Daniel and with test cases and
         libbpf refactoring from Joe.
      
      3) Add a config option to generate BTF type info for vmlinux as part of
         the kernel build process. DWARF debug info is converted via pahole to
         BTF. The latter relies on libbpf and makes use of the BTF deduplication
         algorithm, which results in 100x savings compared to DWARF data. The
         resulting .BTF section is typically about 2MB in size, from Andrii.
      
      4) Add BPF verifier support for stack access with variable offset from
         helpers and add various test cases along with it, from Andrey.
      
      5) Extend bpf_skb_adjust_room() growth BPF helper to mark inner MAC header
         so that L2 encapsulation can be used for tc tunnels, from Alan.
      
      6) Add support for input __sk_buff context in BPF_PROG_TEST_RUN so that
         users can define a subset of allowed __sk_buff fields that get fed into
         the test program, from Stanislav.
      
      7) Add bpf fs multi-dimensional array tests for BTF test suite and fix up
         various UBSAN warnings in bpftool, from Yonghong.
      
      8) Generate a pkg-config file for libbpf, from Luca.
      
      9) Dump program's BTF id in bpftool, from Prashant.
      
      10) libbpf fix to use smaller BPF log buffer size for AF_XDP's XDP
          program, from Magnus.
      
      11) kallsyms-related fixes for the case when symbols are not present in
          BPF selftests and samples, from Daniel.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bb23581b
    • bpf: explicitly prohibit ctx_{in, out} in non-skb BPF_PROG_TEST_RUN · 947e8b59
      Stanislav Fomichev authored
      This should allow us later to extend BPF_PROG_TEST_RUN for the non-skb
      case and be sure that nobody is erroneously setting ctx_{in,out}.
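
      The guard itself is small; in sketch form it is an early check in the
      non-skb test-run handlers (e.g. the XDP one):

      	if (kattr->test.ctx_in || kattr->test.ctx_out)
      		return -EINVAL;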
      
      Fixes: b0b9395d ("bpf: support input __sk_buff context in BPF_PROG_TEST_RUN")
      Reported-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      947e8b59
    • tools: add smp_* barrier variants to include infrastructure · 6b7a2114
      Daniel Borkmann authored
      Add definitions for smp_rmb(), smp_wmb(), and smp_mb() to the tools
      include infrastructure: this patch adds implementations for x86-64 and
      arm64, and keeps the existing fallback for other archs which do not
      have them implemented at this point. The x86-64 one uses a lock + add
      combination for smp_mb(), with an address below the red zone.

      This is on top of 09d62154 ("tools, perf: add and use optimized
      ring_buffer_{read_head, write_tail} helpers"), which didn't touch the
      smp_* barrier implementations. Magnus recently rightfully reported that
      on x86-64 the latter still wrongly fall back to sfence, lfence and
      mfence respectively, so fix that for applications under tools that make
      use of these, to avoid such ugly surprises. The main header under tools
      (include/asm/barrier.h) will in that case not select the fallback
      implementation.
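
      For reference, the x86-64 definitions end up roughly as follows (a
      sketch; tools/arch/x86/include/asm/barrier.h is the authoritative
      version, and barrier() is the usual compiler barrier from the tools
      compiler.h):

      	/* x86-64 is strongly ordered, so read/write barriers only need a
      	 * compiler barrier; the full barrier uses a locked add to an
      	 * address below the red zone instead of the slower mfence
      	 */
      	#define smp_rmb()	barrier()
      	#define smp_wmb()	barrier()
      	#define smp_mb()	asm volatile("lock; addl $0,-132(%%rsp)" ::: "memory", "cc")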
      
      Reported-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      6b7a2114
    • Merge branch 'ipv6-Refactor-nexthop-selection-helpers-during-a-fib-lookup' · 78f07ada
      David S. Miller authored
      
      
      David Ahern says:
      
      ====================
      ipv6: Refactor nexthop selection helpers during a fib lookup
      
      IPv6 has a fib6_nh embedded within each fib6_info and a separate
      fib6_info for each path in a multipath route. A side effect is that
      a fib6_info is passed all the way down the stack when selecting a path
      on a fib lookup. Refactor the fib lookup functions and associated
      helper functions to take a fib6_nh when appropriate to enable IPv6
      to work with nexthop objects where the fib6_nh is not directly part
      of a fib entry.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      78f07ada
    • ipv6: Refactor __ip6_route_redirect · 0b34eb00
      David Ahern authored
      
      
      Move the nexthop evaluation of a fib entry to a helper that can be
      leveraged for each fib6_nh in a multipath nexthop object.
      
      In the move, a 'continue' statement maps to the helper returning false
      (the loop should continue) and a 'break' maps to returning true (the
      entry of interest has been found).
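
      Purely as an illustration of that mapping (the helper and predicate
      names here are made up, not the kernel's), the extracted helper has
      this shape:

      	/* one nexthop at a time: false == "continue", true == "break" */
      	static bool redirect_nh_match(const struct fib6_nh *nh,
      				      const struct flowi6 *fl6)
      	{
      		if (!nh_is_usable(nh, fl6))	/* was: continue */
      			return false;
      		if (!nh_gw_matches(nh, fl6))	/* was: continue */
      			return false;
      		return true;			/* was: break */
      	}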
      
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0b34eb00
    • ipv6: Refactor rt6_device_match · 0c59d006
      David Ahern authored
      
      
      Move the device and gateway checks in the fib6_next loop to a helper
      that can be called per fib6_nh entry.
      
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0c59d006
    • ipv6: Move fib6_multipath_select down in ip6_pol_route · d83009d4
      David Ahern authored
      
      
      Move the siblings check and the fib6_multipath_select call after the
      null entry check, since a null entry cannot have siblings.
      
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d83009d4
    • ipv6: Be smarter with null_entry handling in ip6_pol_route_lookup · af52a52c
      David Ahern authored
      
      
      Clean up the fib6_null_entry handling in ip6_pol_route_lookup.
      rt6_device_match can return fib6_null_entry, but fib6_multipath_select
      can not. Consolidate the fib6_null_entry handling and on the final
      null_entry check set rt and goto out - no need to defer to a second
      check after rt6_find_cached_rt.
      
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      af52a52c
    • ipv6: Refactor find_rr_leaf · 30c15f03
      David Ahern authored
      
      
      find_rr_leaf has 3 loops over fib_entries calling find_match. The loops
      are very similar with differences in start point and whether the metric
      is evaluated:
          1. start at rr_head, no extra loop compare, check fib metric
          2. start at leaf, compare rt against rr_head, check metric
          3. start at cont (potential saved point from earlier loops), no
             extra loop compare, no metric check
      
      Create 1 loop that is called 3 different times. This will make a
      later change with multipath nexthop objects much simpler.
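
      A rough sketch of such a single parameterized loop (illustrative; the
      names follow the description above but are not guaranteed to match the
      final code exactly):

      	static void __find_rr_leaf(struct fib6_info *start,
      				   struct fib6_info *nomatch, u32 metric,
      				   struct fib6_info **match,
      				   struct fib6_info **cont,
      				   int oif, int strict, bool *do_rr, int *mpri)
      	{
      		struct fib6_info *f6i;

      		for (f6i = start; f6i && f6i != nomatch;
      		     f6i = rcu_dereference(f6i->fib6_next)) {
      			/* callers 1 and 2 stop at the first metric change
      			 * and remember where to resume; caller 3 passes
      			 * cont == NULL and so skips the metric check
      			 */
      			if (cont && f6i->fib6_metric != metric) {
      				*cont = f6i;
      				return;
      			}
      			if (fib6_check_expired(f6i))
      				continue;
      			if (find_match(&f6i->fib6_nh, f6i->fib6_flags,
      				       oif, strict, mpri, do_rr))
      				*match = f6i;
      		}
      	}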
      
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      30c15f03
    • ipv6: Refactor find_match · 28679ed1
      David Ahern authored
      
      
      find_match primarily needs a fib6_nh (and fib6_flags which it passes
      through to rt6_score_route). Move fib6_check_expired up to the call
      sites so find_match is only called for relevant entries. Remove the
      match argument which is mostly a pass through and use the return
      boolean to decide if match gets set in the call sites.
      
      The end result is a helper that can be called per fib6_nh struct
      which is needed once fib entries reference nexthop objects that
      have more than one fib6_nh.
      
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      28679ed1
    • ipv6: Pass fib6_nh and flags to rt6_score_route · 702cea56
      David Ahern authored
      
      
      rt6_score_route only needs the fib6_flags and nexthop data. Change
      it accordingly. This allows re-use later for nexthop-based fib6_nh.
      
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      702cea56
    • ipv6: Change rt6_probe to take a fib6_nh · cc3a86c8
      David Ahern authored
      
      
      rt6_probe sends probes for gateways in a nexthop. As such it really
      depends on a fib6_nh, not a fib entry. Move last_probe to fib6_nh and
      update rt6_probe to take a fib6_nh struct.
      
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cc3a86c8
    • ipv6: Remove rt6_check_dev · 6e1809a5
      David Ahern authored
      
      
      rt6_check_dev is a simple helper with only one caller. Fold the code
      into rt6_score_route.
      
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6e1809a5
    • ipv6: Only call rt6_check_neigh for nexthop with gateway · 1ba9a895
      David Ahern authored
      
      
      Change rt6_check_neigh to take a fib6_nh instead of a fib entry.
      Move the check on fib_flags and whether the nexthop has a gateway
      up to the one caller.
      
      Remove the inline from the definition as well. Not necessary.
      
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1ba9a895
    • dns: remove redundant zero length namelen check · 62720b12
      Colin Ian King authored
      
      
      The zero namelen check is redundant as it has already been checked
      for zero at the start of the function.  Remove the redundant check.
      
      Addresses-Coverity: ("Logically Dead Code")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      62720b12
    • Merge branch 'bpf-l2-encap' · 94c59aab
      Daniel Borkmann authored
      
      
      Alan Maguire says:
      
      ====================
      Extend bpf_skb_adjust_room growth to mark inner MAC header so
      that L2 encapsulation can be used for tc tunnels.
      
      Patch #1 extends the existing test_tc_tunnel to support UDP
      encapsulation; later we want to be able to test MPLS over UDP
      and MPLS over GRE encapsulation.
      
      Patch #2 adds the BPF_F_ADJ_ROOM_ENCAP_L2(len) macro, which
      allows specification of inner mac length.  Other approaches were
      explored prior to taking this approach.  Specifically, I tried
      automatically computing the inner mac length on the basis of the
      specified flags (so inner maclen for GRE/IPv4 encap is the len_diff
      specified to bpf_skb_adjust_room minus GRE + IPv4 header length
      for example).  The problem with this is that we don't know for sure
      what form of GRE/UDP header we have; is it a full GRE header,
      or is it a FOU UDP header or a generic UDP encap header? My fear
      here was that we'd end up with an explosion of flags.  The other
      approach tried was to support inner L2 header marking as a separate
      room adjustment, i.e. adjust for L3/L4 encap, then call
      bpf_skb_adjust_room for L2 encap.  This can be made to work, but
      because it imposed an order on operations it felt a bit clunky.
      
      Patch #3 syncs tools/ bpf.h.
      
      Patch #4 extends the tests again to support MPLSoverGRE,
      MPLSoverUDP, and transparent ethernet bridging (TEB) where
      the inner L2 header is an ethernet header.  Testing of BPF
      encap against tunnels is done for cases where configuration
      of such tunnels is possible (MPLSoverGRE[6], MPLSoverUDP,
      gre[6]tap), and skipped otherwise.  Testing of BPF encap/decap
      is always carried out.
      
      Changes since v2:
       - updated tools/testing/selftest/bpf/config with FOU/MPLS CONFIG
         variables (patches 1, 4)
       - reduced noise in patch 1 by avoiding unnecessary movement of code
       - eliminated inner_mac variable in bpf_skb_net_grow (patch 2)
      
      Changes since v1:
       - fixed formatting of commit references.
       - BPF_F_ADJ_ROOM_FIXED_GSO flag enabled on all variants (patch 1)
       - fixed fou6 options for UDP encap; checksum errors observed were
         due to the fact fou6 tunnel was not set up with correct ipproto
         options (41 -6).  0 checksums work fine (patch 1)
       - added definitions for mask and shift used in setting L2 length
         (patch 2)
       - allow udp encap with fixed GSO (patch 2)
       - changed "elen" to "l2_len" to be more descriptive (patch 4)
      ====================
      
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      94c59aab
    • selftests_bpf: add L2 encap to test_tc_tunnel · 3ec61df8
      Alan Maguire authored
      
      
      Update test_tc_tunnel to verify adding inner L2 header
      encapsulation (an MPLS label or ethernet header) works.
      
      Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      3ec61df8
    • bpf: sync bpf.h to tools/ for BPF_F_ADJ_ROOM_ENCAP_L2 · 1db04c30
      Alan Maguire authored
      
      
      Sync include/uapi/linux/bpf.h with tools/ equivalent to add
      BPF_F_ADJ_ROOM_ENCAP_L2(len) macro.
      
      Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      1db04c30
    • bpf: add layer 2 encap support to bpf_skb_adjust_room · 58dfc900
      Alan Maguire authored
      commit 868d5235 ("bpf: add bpf_skb_adjust_room encap flags")
      introduced support to bpf_skb_adjust_room for GSO-friendly GRE
      and UDP encapsulation.
      
      For GSO to work for skbs, the inner headers (mac and network) need to
      be marked.  For L3 encapsulation using bpf_skb_adjust_room, the mac
      and network headers are identical.  Here we provide a way of specifying
      the inner mac header length for cases where L2 encap is desired.  Such
      an approach can support encapsulated ethernet headers, MPLS headers etc.
      For example to convert from a packet of form [eth][ip][tcp] to
      [eth][ip][udp][inner mac][ip][tcp], something like the following could
      be done:
      
      	headroom = sizeof(iph) + sizeof(struct udphdr) + inner_maclen;
      
      	ret = bpf_skb_adjust_room(skb, headroom, BPF_ADJ_ROOM_MAC,
      				  BPF_F_ADJ_ROOM_ENCAP_L4_UDP |
      				  BPF_F_ADJ_ROOM_ENCAP_L3_IPV4 |
      				  BPF_F_ADJ_ROOM_ENCAP_L2(inner_maclen));
      
      Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      58dfc900
    • selftests_bpf: extend test_tc_tunnel for UDP encap · 166b5a7f
      Alan Maguire authored
      commit 868d5235 ("bpf: add bpf_skb_adjust_room encap flags")
      introduced support to bpf_skb_adjust_room for GSO-friendly GRE
      and UDP encapsulation and later introduced associated test_tc_tunnel
      tests.  Here those tests are extended to cover UDP encapsulation also.
      
      Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      166b5a7f
    • tipc: use standard write_lock & unlock functions when creating node · 909620ff
      Jon Maloy authored
      In the function tipc_node_create() we protect the peer capability field
      by using the node rw_lock. However, we access the lock directly instead
      of using the dedicated functions for this, as we do everywhere else in
      node.c. This cosmetic inconsistency is fixed here.
      
      Fixes: 40999f11 ("tipc: make link capability update thread safe")
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      909620ff
    • bpf: fix missing bpf_check_uarg_tail_zero in BPF_PROG_TEST_RUN · c695865c
      Stanislav Fomichev authored
      Commit b0b9395d ("bpf: support input __sk_buff context in
      BPF_PROG_TEST_RUN") started using bpf_check_uarg_tail_zero in
      BPF_PROG_TEST_RUN. However, bpf_check_uarg_tail_zero is not defined
      for !CONFIG_BPF_SYSCALL:
      
      net/bpf/test_run.c: In function ‘bpf_ctx_init’:
      net/bpf/test_run.c:142:9: error: implicit declaration of function ‘bpf_check_uarg_tail_zero’ [-Werror=implicit-function-declaration]
         err = bpf_check_uarg_tail_zero(data_in, max_size, size);
               ^~~~~~~~~~~~~~~~~~~~~~~~
      
      Let's not build net/bpf/test_run.c when CONFIG_BPF_SYSCALL is not set.
      
      Reported-by: kbuild test robot <lkp@intel.com>
      Fixes: b0b9395d ("bpf: support input __sk_buff context in BPF_PROG_TEST_RUN")
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      c695865c
    • net: sched: flower: use correct ht function to prevent duplicates · 9e35552a
      Vlad Buslov authored
      The implementation of rhashtable_insert_fast() checks whether its
      internal helper __rhashtable_insert_fast() returns a non-NULL pointer
      and seemingly returns -EEXIST in that case. However, since
      __rhashtable_insert_fast() is called with a NULL key pointer, it never
      actually checks for duplicates, which means that -EEXIST is never
      returned to the user. Use the rhashtable_lookup_insert_fast() hash table
      API instead. In order to verify that it works as expected and prevent
      the problem from happening in the future, extend tc-tests with a new
      test that verifies that no new filter with an existing key can be
      inserted into the flower classifier.
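
      In sketch form, the insertion path changes along these lines
      (illustrative, not the exact diff):

      	/* rhashtable_lookup_insert_fast() really looks the key up first,
      	 * so inserting a filter whose key already exists now fails with
      	 * -EEXIST instead of silently succeeding
      	 */
      	err = rhashtable_lookup_insert_fast(&fnew->mask->ht, &fnew->ht_node,
      					    fnew->mask->filter_ht_params);
      	if (err)
      		return err;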
      
      Fixes: 1f17f774 ("net: sched: flower: insert filter to ht before offloading it to hw")
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9e35552a
    • netns: read NETNSA_NSID as s32 attribute in rtnl_net_getid() · ecce39ec
      Guillaume Nault authored
      
      
      NETNSA_NSID is signed. Use nla_get_s32() to avoid confusion.
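
      In sketch form:

      	/* NETNSA_NSID carries a signed 32-bit id, so read it with the
      	 * signed accessor rather than nla_get_u32()
      	 */
      	s32 nsid = nla_get_s32(tb[NETNSA_NSID]);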
      
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ecce39ec
  2. Apr 11, 2019