  1. Jun 03, 2019
  2. Jun 02, 2019
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · c1e9e01d
      David S. Miller authored
      
      
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter/IPVS updates for net-next
      
      The following patchset contains Netfilter/IPVS updates for net-next:
      
      1) Add UDP tunnel support for ICMP errors in IPVS.
      
      Julian Anastasov says:
      
      This patchset is a followup to the commit that adds UDP/GUE tunnel:
      "ipvs: allow tunneling with gue encapsulation".
      
      What we do is to put tunnel real servers in hash table (patch 1),
      add function to lookup tunnels (patch 2) and use it to strip the
      embedded tunnel headers from ICMP errors (patch 3).
      
      2) Extend xt_owner to match for supplementary groups, from
         Lukasz Pawelczyk.
      
      3) Remove unused oif field in flow_offload_tuple object, from
         Taehee Yoo.
      
      4) Release basechain counters from workqueue to skip synchronize_rcu()
         call. From Florian Westphal.
      
      5) Replace skb_make_writable() by skb_ensure_writable(). Patchset
         from Florian Westphal.
      
      6) Checksum support for gue encapsulation in IPVS, from Jacky Hu.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c1e9e01d
  3. Jun 01, 2019
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 0462eaac
      David S. Miller authored
      
      
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf-next 2019-05-31
      
      The following pull-request contains BPF updates for your *net-next* tree.
      
      Lots of exciting new features in the first PR of this development cycle!
      The main changes are:
      
      1) misc verifier improvements, from Alexei.
      
      2) bpftool can now convert btf to valid C, from Andrii.
      
      3) verifier can insert explicit ZEXT insn when requested by 32-bit JITs.
         This feature greatly improves BPF speed on 32-bit architectures. From Jiong.
      
      4) cgroups will now auto-detach bpf programs. This fixes the issue of thousands
         of bpf programs getting stuck in dying cgroups. From Roman.
      
      5) new bpf_send_signal() helper, from Yonghong.
      
      6) cgroup inet skb programs can signal CN to the stack, from Lawrence.
      
      7) miscellaneous cleanups, from many developers.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0462eaac
    • selftests/bpf: measure RTT from xdp using xdping · cd538502
      Alan Maguire authored
      
      
      xdping allows us to get latency estimates from XDP.  Output looks
      like this:
      
      ./xdping -I eth4 192.168.55.8
      Setting up XDP for eth4, please wait...
      XDP setup disrupts network connectivity, hit Ctrl+C to quit
      
      Normal ping RTT data
      [Ignore final RTT; it is distorted by XDP using the reply]
      PING 192.168.55.8 (192.168.55.8) from 192.168.55.7 eth4: 56(84) bytes of data.
      64 bytes from 192.168.55.8: icmp_seq=1 ttl=64 time=0.302 ms
      64 bytes from 192.168.55.8: icmp_seq=2 ttl=64 time=0.208 ms
      64 bytes from 192.168.55.8: icmp_seq=3 ttl=64 time=0.163 ms
      64 bytes from 192.168.55.8: icmp_seq=8 ttl=64 time=0.275 ms
      
      4 packets transmitted, 4 received, 0% packet loss, time 3079ms
      rtt min/avg/max/mdev = 0.163/0.237/0.302/0.054 ms
      
      XDP RTT data:
      64 bytes from 192.168.55.8: icmp_seq=5 ttl=64 time=0.02808 ms
      64 bytes from 192.168.55.8: icmp_seq=6 ttl=64 time=0.02804 ms
      64 bytes from 192.168.55.8: icmp_seq=7 ttl=64 time=0.02815 ms
      64 bytes from 192.168.55.8: icmp_seq=8 ttl=64 time=0.02805 ms
      
      The xdping program loads the associated xdping_kern.o BPF program
      and attaches it to the specified interface.  If run in client
      mode (the default), it will add a map entry keyed by the
      target IP address; this map will store RTT measurements, current
      sequence number etc.  Finally in client mode the ping command
      is executed, and the xdping BPF program will use the last ICMP
      reply, reformulate it as an ICMP request with the next sequence
      number and XDP_TX it.  After the reply to that request is received
      we can measure RTT and repeat until the desired number of
      measurements is made.  This is why the sequence numbers in the
      normal ping are 1, 2, 3 and 8.  We XDP_TX a modified version
      of ICMP reply 4 and keep doing this until we get the 4 replies
      we need; the networking stack therefore only sees reply 8, which
      we XDP_PASS upstream since we are done.
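
      The core of the client-side XDP logic can be sketched as follows.  This is
      only an illustration of the mechanism described above, not the actual
      xdping_kern.c: the ping_map layout (struct pinginfo) and the record_rtt(),
      swap_*() and icmp_csum() helpers are assumptions made for brevity.

      SEC("xdp")
      int xdping_client(struct xdp_md *ctx)
      {
              void *data_end = (void *)(long)ctx->data_end;
              void *data = (void *)(long)ctx->data;
              struct ethhdr *eth = data;
              struct iphdr *iph = data + sizeof(*eth);
              struct icmphdr *icmph = data + sizeof(*eth) + sizeof(*iph);
              struct pinginfo *pi;

              if ((void *)(icmph + 1) > data_end)
                      return XDP_PASS;
              if (eth->h_proto != bpf_htons(ETH_P_IP) ||
                  iph->protocol != IPPROTO_ICMP ||
                  icmph->type != ICMP_ECHOREPLY)
                      return XDP_PASS;

              pi = bpf_map_lookup_elem(&ping_map, &iph->saddr);
              if (!pi)
                      return XDP_PASS;

              record_rtt(pi, icmph);          /* store this RTT measurement */
              if (pi->count == 0)             /* collected all the replies we need */
                      return XDP_PASS;        /* let the stack see the last reply */

              /* Rewrite the reply into the next request and bounce it back. */
              swap_eth_addrs(eth);
              swap_ip_addrs(iph);
              icmph->type = ICMP_ECHO;
              icmph->un.echo.sequence =
                      bpf_htons(bpf_ntohs(icmph->un.echo.sequence) + 1);
              icmph->checksum = 0;
              icmph->checksum = icmp_csum(icmph, data_end);
              return XDP_TX;
      }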
      
      In server mode (-s), xdping simply takes ICMP requests and replies
      to them in XDP rather than passing the request up to the networking
      stack.  No map entry is required.
      
      xdping can be run in native XDP mode (the default, or specified
      via -N) or in skb mode (-S).
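
      For example, following the options described above (the exact flag
      combinations here are illustrative):

        # server side: answer pings directly from XDP
        ./xdping -s -I eth4
        # client side, generic/skb mode instead of the native default
        ./xdping -S -I eth4 192.168.55.8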
      
      A test program test_xdping.sh exercises some of these options.
      
      Note that native XDP does not seem to XDP_TX for veths, hence -N
      is not tested.  Looking at the code, it looks like XDP_TX is
      supported so I'm not sure if that's expected.  Running xdping in
      native mode for ixgbe as both client and server works fine.
      
      Changes since v4
      
      - close fds on cleanup (Song Liu)
      
      Changes since v3
      
      - fixed seq to be __be16 (Song Liu)
      - fixed fd checks in xdping.c (Song Liu)
      
      Changes since v2
      
      - updated commit message to explain why seq number of last
        ICMP reply is 8 not 4 (Song Liu)
      - updated types of seq number, raddr and eliminated csum variable
        in xdpclient/xdpserver functions as it was not needed (Song Liu)
      - added XDPING_DEFAULT_COUNT definition and usage specification of
        default/max counts (Song Liu)
      
      Changes since v1
       - moved from RFC to PATCH
       - removed unused variable in ipv4_csum() (Song Liu)
       - refactored ICMP checks into icmp_check() function called by client
         and server programs and reworked client and server programs due
         to lack of shared code (Song Liu)
       - added checks to ensure that SKB and native mode are not requested
         together (Song Liu)
      
      Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      cd538502
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 33aae282
      David S. Miller authored
      
      
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2019-05-31
      
      This series contains updates to the iavf driver.
      
      Nathan Chancellor converts the use of gnu_printf to printf.
      
      Aleksandr modifies the driver to limit the number of RSS queues to the
      number of online CPUs in order to avoid creating misconfigured RSS
      queues.
      
      Gustavo A. R. Silva converts a couple of instances where sizeof() can be
      replaced with struct_size().
      
      Alice makes the remaining changes to the iavf driver to clean up all the
      old "i40evf" references in the driver, renaming them to iavf, including
      the file names that still contained the old driver name.  There were no
      functional changes, just cosmetic ones, to reduce any confusion going
      forward now that the iavf driver is the virtual function driver for both
      the i40e and ice drivers.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      33aae282
    • bpf: doc: update answer for 32-bit subregister question · c231c22a
      Jiong Wang authored
      
      
      There has been quite a bit of progress on the two steps mentioned in the
      answer to the following question:
      
        Q: BPF 32-bit subregister requirements
      
      This patch updates the answer to reflect what has been done.
      
      v2:
       - Add missing full stop. (Song Liu)
       - Minor tweak on one sentence. (Song Liu)
      
      v1:
       - Integrated rephrase from Quentin and Jakub
      
      Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      c231c22a
    • Merge branch 'map-charge-cleanup' · d168286d
      Alexei Starovoitov authored
      
      
      Roman Gushchin says:
      
      ====================
      During my work on memcg-based memory accounting for bpf maps
      I've done some cleanups and refactorings of the existing
      memlock rlimit-based code. This makes the code more robust, unifies the
      size-to-pages conversion, size checks and corresponding error codes, and
      adds coverage for cgroup local storage and socket local storage maps.
      
      It looks like some preliminary work on the mm side might be
      required to start working on the memcg-based accounting,
      so I'm sending these patches as a separate patchset.
      ====================
      
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      d168286d
    • bpf: move memory size checks to bpf_map_charge_init() · c85d6913
      Roman Gushchin authored
      
      
      Most bpf map types do similar checks and bytes-to-pages conversions
      during memory allocation and charging.
      
      Let's unify these checks by moving them into bpf_map_charge_init().
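
      A sketch of what the unified helper ends up doing (simplified; the
      internal memlock-charging helper name and the exact overflow check are
      assumptions, not a copy of the kernel source):

      int bpf_map_charge_init(struct bpf_map_memory *mem, u64 size)
      {
              u32 pages = round_up(size, PAGE_SIZE) >> PAGE_SHIFT;
              struct user_struct *user;
              int ret;

              if (size >= U32_MAX - PAGE_SIZE)        /* unified size check */
                      return -E2BIG;

              user = get_current_user();
              ret = charge_memlock(user, pages);      /* check RLIMIT_MEMLOCK */
              if (ret) {
                      free_uid(user);
                      return ret;
              }
              mem->pages = pages;
              mem->user = user;
              return 0;
      }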
      
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      c85d6913
    • bpf: rework memlock-based memory accounting for maps · b936ca64
      Roman Gushchin authored
      
      
      In order to unify the existing memlock charging code with the
      memcg-based memory accounting, which will be added later, let's
      rework the current scheme.
      
      Currently the following design is used:
        1) .alloc() callback optionally checks if the allocation will likely
           succeed using bpf_map_precharge_memlock()
        2) .alloc() performs actual allocations
        3) .alloc() callback calculates map cost and sets map.memory.pages
        4) map_create() calls bpf_map_init_memlock() which sets map.memory.user
           and performs actual charging; in case of failure the map is
           destroyed
        <map is in use>
        1) bpf_map_free_deferred() calls bpf_map_release_memlock(), which
           performs uncharge and releases the user
        2) .map_free() callback releases the memory
      
      The scheme can be simplified and made more robust:
        1) .alloc() calculates map cost and calls bpf_map_charge_init()
        2) bpf_map_charge_init() sets map.memory.user and performs actual
          charge
        3) .alloc() performs actual allocations
        <map is in use>
        1) .map_free() callback releases the memory
        2) bpf_map_charge_finish() performs uncharge and releases the user
      
      The new scheme also allows reusing the bpf_map_charge_init()/finish()
      functions for memcg-based accounting. Because charges are performed
      before actual allocations and uncharges after freeing the memory,
      no bogus memory pressure can be created.
      
      In cases when the map structure is not available (e.g. it's not
      created yet, or is already destroyed), an on-stack bpf_map_memory
      structure is used. The charge can be transferred with the
      bpf_map_charge_move() function.
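
      In code, the new flow in an .alloc() callback looks roughly like this
      (a sketch for an imaginary map type; struct example_map, its element type
      and the cost formula are illustrative):

      static struct bpf_map *example_map_alloc(union bpf_attr *attr)
      {
              struct bpf_map_memory mem;
              struct example_map *m;
              u64 cost;
              int err;

              /* 1) calculate the map cost and charge it up front */
              cost = sizeof(*m) + (u64)attr->max_entries * sizeof(struct example_elem);
              err = bpf_map_charge_init(&mem, cost);
              if (err)
                      return ERR_PTR(err);

              /* 2) only then perform the actual allocations */
              m = kzalloc(sizeof(*m), GFP_USER);
              if (!m) {
                      bpf_map_charge_finish(&mem);    /* uncharge on failure */
                      return ERR_PTR(-ENOMEM);
              }

              /* 3) hand the charge over to the map that now owns it */
              bpf_map_charge_move(&m->map.memory, &mem);
              return &m->map;
      }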
      
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      b936ca64
    • bpf: group memory related fields in struct bpf_map_memory · 3539b96e
      Roman Gushchin authored
      
      
      Group "user" and "pages" fields of bpf_map into the bpf_map_memory
      structure. Later it can be extended with "memcg" and other related
      information.
      
      The main reason for such a change (besides cosmetics) is to be able to
      pass a bpf_map_memory structure to the charging functions before the
      actual allocation of the bpf_map.
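
      Conceptually the new structure just bundles the two existing fields
      (a sketch; the memcg pointer is only the possible future extension
      mentioned above):

      struct bpf_map_memory {
              u32 pages;                  /* charged size, in pages */
              struct user_struct *user;   /* user the memlock charge is billed to */
              /* later: struct mem_cgroup *memcg; and other related information */
      };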
      
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      3539b96e
    • bpf: add memlock precharge for socket local storage · d50836cd
      Roman Gushchin authored
      
      
      Socket local storage maps lack the memlock precharge check,
      which is performed before the memory allocation for
      most other bpf map types.
      
      Let's add it in order to unify all map types.
      
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      d50836cd
    • bpf: add memlock precharge check for cgroup_local_storage · ffc8b144
      Roman Gushchin authored
      
      
      Cgroup local storage maps lack the memlock precharge check,
      which is performed before the memory allocation for
      most other bpf map types.
      
      Let's add it in order to unify all map types.
      
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      ffc8b144
    • Merge branch 'propagate-cn-to-tcp' · 576240cf
      Alexei Starovoitov authored
      
      
      Lawrence Brakmo says:
      
      ====================
      This patchset adds support for propagating congestion notifications (cn)
      to TCP from cgroup inet skb egress BPF programs.
      
      Current cgroup skb BPF programs cannot trigger TCP congestion window
      reductions, even when they drop a packet. This patch-set adds support
      for cgroup skb BPF programs to send congestion notifications in the
      return value when the packets are TCP packets. Rather than the
      current 1 for keeping the packet and 0 for dropping it, they can
      now return:
          NET_XMIT_SUCCESS    (0)    - continue with packet output
          NET_XMIT_DROP       (1)    - drop packet and do cn
          NET_XMIT_CN         (2)    - continue with packet output and do cn
          -EPERM                     - drop packet
      
      Finally, HBM programs are modified to collect and return more
      statistics.
      
      There has been some discussion regarding the best place to manage
      bandwidths. Some believe this should be done in the qdisc where it can
      also be managed with a BPF program. We believe there are advantages
      for doing it with a BPF program in the cgroup/skb callback. For example,
      it reduces overheads in cases where there is one primary workload and
      one or more secondary workloads, each running in its own cgroupv2. In
      this scenario, we only need to throttle the secondary
      workloads and there is no overhead for the primary workload since there
      will be no BPF program attached to its cgroup.
      
      Regardless, we agree that this mechanism should not penalize those that
      are not using it. We tested this by doing 1 byte req/reply RPCs over
      loopback. Each test consists of 30 sec of back-to-back 1 byte RPCs.
      Each test was repeated 50 times with a 1 minute delay between each set
      of 10. We then calculated the average RPCs/sec over the 50 tests. We
      compare upstream with upstream + patchset and no BPF program as well
      as upstream + patchset and a BPF program that just returns ALLOW_PKT.
      Here are the results:
      
      upstream                           80937 RPCs/sec
      upstream + patches, no BPF program 80894 RPCs/sec
      upstream + patches, BPF program    80634 RPCs/sec
      
      These numbers indicate that there is no penalty for these patches.
      
      The use of congestion notifications improves the performance of HBM when
      using Cubic. Without congestion notifications, Cubic will not decrease its
      cwnd and HBM will need to drop a large percentage of the packets.
      
      The following results are obtained for rate limits of 1Gbps,
      between two servers using netperf, and only one flow. We also show how
      reducing the max delayed ACK timer can improve the performance when
      using Cubic.
      
      Command used was:
        ./do_hbm_test.sh -l -D --stats -N -r=<rate> [--no_cn] [dctcp] \
                         -s=<server running netserver>
        where:
           <rate>   is 1000
           --no_cn  specifies no cwr notifications
           dctcp    uses dctcp
      
                             Cubic                    DCTCP
      Lim, DA      Mbps cwnd cred drops  Mbps cwnd cred drops
      --------     ---- ---- ---- -----  ---- ---- ---- -----
        1G, 40       35  462 -320 67%     995    1 -212  0.05%
        1G, 40,cn   736    9  -78  0.07   995    1 -212  0.05
        1G,  5,cn   941    2 -189  0.13   995    1 -212  0.05
      
      Notes:
        --no_cn has no effect with DCTCP
        Lim = rate limit
        DA = maximum delay ack timer
        cred = credit in packets
        drops = % packets dropped
      
      v1->v2: Ensures that only BPF_CGROUP_INET_EGRESS can return values 2 and 3
              New egress values apply to all protocols, not just TCP
              Cleaned up patch 4, Update BPF_CGROUP_RUN_PROG_INET_EGRESS callers
              Removed changes to __tcp_transmit_skb (patch 5), no longer needed
              Removed sample use of EDT
      v2->v3: Removed the probe timer related changes
      v3->v4: Replaced preempt_enable_no_resched() by preempt_enable()
              in BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY() macro
      ====================
      
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      576240cf
    • bpf: Add more stats to HBM · d58c6f72
      brakmo authored
      
      
      Adds more stats to HBM, including average cwnd and rtt of all TCP
      flows, percents of packets that are ecn ce marked and distribution
      of return values.
      
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      d58c6f72
    • bpf: Add cn support to hbm_out_kern.c · ffd81558
      brakmo authored
      
      
      Update hbm_out_kern.c to support returning cn notifications.
      Also updates relevant files to allow disabling cn notifications.
      
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      ffd81558
    • bpf: Update BPF_CGROUP_RUN_PROG_INET_EGRESS calls · 956fe219
      brakmo authored
      
      
      Update BPF_CGROUP_RUN_PROG_INET_EGRESS() callers to support returning
      congestion notifications from the BPF programs.
      
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      956fe219
    • bpf: Update __cgroup_bpf_run_filter_skb with cn · e7a3160d
      brakmo authored
      
      
      For egress packets, __cgroup_bpf_run_filter_skb() will now call
      BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY() instead of PROG_CGROUP_RUN_ARRAY()
      in order to propagate congestion notifications (cn) requests to TCP
      callers.
      
      For egress packets, this function can return:
         NET_XMIT_SUCCESS    (0)    - continue with packet output
         NET_XMIT_DROP       (1)    - drop packet and notify TCP to call cwr
         NET_XMIT_CN         (2)    - continue with packet output and notify TCP
                                      to call cwr
         -EPERM                     - drop packet
      
      For ingress packets, this function will return -EPERM if any attached
      program was found and if it returned != 1 during execution. Otherwise 0
      is returned.
      
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      e7a3160d
    • bpf: cgroup inet skb programs can return 0 to 3 · 5cf1e914
      brakmo authored
      
      
      Allows cgroup inet skb programs to return values in the range [0, 3].
      The second bit is used to determine if congestion occurred and the higher
      level protocol should decrease its rate, e.g. TCP would call tcp_enter_cwr().
      
      The bpf_prog must set expected_attach_type to BPF_CGROUP_INET_EGRESS
      at load time if it uses the new return values (i.e. 2 or 3).
      
      The expected_attach_type is currently not enforced for
      BPF_PROG_TYPE_CGROUP_SKB, meaning that a bpf_prog with
      expected_attach_type set to BPF_CGROUP_INET_EGRESS can currently attach
      to BPF_CGROUP_INET_INGRESS.  Blindly enforcing expected_attach_type
      would break backward compatibility.
      
      This patch adds an enforce_expected_attach_type bit so that the
      expected_attach_type is only enforced when the program uses the new
      return values.
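
      The idea can be sketched as an attach-time check (illustrative only, not
      the exact kernel code or field layout):

      /* Load time: a program that wants to return 2 or 3 declares
       * expected_attach_type = BPF_CGROUP_INET_EGRESS, and the kernel sets
       * prog->enforce_expected_attach_type.
       *
       * Attach time:
       */
      if (prog->enforce_expected_attach_type &&
          prog->expected_attach_type != attach_type)
              return -EINVAL;
      /* Programs that never opted in (bit not set) attach as before,
       * so existing users keep working.
       */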
      
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      5cf1e914
    • bpf: Create BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY · 1f52f6c0
      brakmo authored
      
      
      Create new macro BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY() to be used by
      __cgroup_bpf_run_filter_skb for EGRESS BPF progs so BPF programs can
      request cwr for TCP packets.
      
      Current cgroup skb programs can only return 0 or 1 (0 to drop the
      packet, 1 to keep it). This macro changes the behavior so that the
      low-order bit indicates whether the packet should be dropped (0) or
      kept (1), and the next bit is used for congestion notification (cn).
      
      Hence, new allowed return values of CGROUP EGRESS BPF programs are:
        0: drop packet
        1: keep packet
        2: drop packet and call cwr
        3: keep packet and call cwr
      
      This macro then converts it to one of the NET_XMIT values, or to -EPERM,
      which has the effect of dropping the packet with no cn:
        0: NET_XMIT_SUCCESS  skb should be transmitted (no cn)
        1: NET_XMIT_DROP     skb should be dropped and cwr called
        2: NET_XMIT_CN       skb should be transmitted and cwr called
        3: -EPERM            skb should be dropped (no cn)
      
      Note that when more than one BPF program is called, the packet is
      dropped if at least one of the programs requests it be dropped, and
      there is cn if at least one program returns cn.
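
      Putting the two tables together, the conversion the macro performs can be
      sketched like this (illustrative C derived from the description above,
      not a copy of the kernel macro):

      /* ret aggregates the programs' return values: bit 0 ("keep") is ANDed
       * across programs, bit 1 ("cn") is ORed across programs.
       */
      static int egress_ret_to_net_xmit(u32 ret)
      {
              bool keep = ret & 1;
              bool cn = ret & 2;

              if (keep)
                      return cn ? NET_XMIT_CN : NET_XMIT_SUCCESS;
              return cn ? NET_XMIT_DROP : -EPERM;     /* dropped, with/without cn */
      }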
      
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      1f52f6c0
    • xen-netback: remove redundant assignment to err · 587a7126
      Colin Ian King authored
      
      
      The variable err is assigned the value -ENOMEM, which is never read;
      it is re-assigned a new value later on.  The assignment is redundant
      and can be removed.
      
      Addresses-Coverity: ("Unused value")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Acked-by: Wei Liu <wei.liu2@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      587a7126
    • nexthop: remove redundant assignment to err · 6f43e525
      Colin Ian King authored
      
      
      The variable err is initialized with a value that is never read
      and err is reassigned a few statements later. This initialization
      is redundant and can be removed.
      
      Addresses-Coverity: ("Unused value")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6f43e525
    • Merge branch 'phylink-sfp-updates' · 6912378d
      David S. Miller authored
      
      
      Russell King says:
      
      ====================
      phylink/sfp updates
      
      This is a series of updates to phylink and sfp:
      
      - Remove an unused net device argument from the phylink MII ioctl
        emulation code.
      
      - add support for using interrupts when using a GPIO for link status
        tracking, rather than polling it at one second intervals.  This
        reduces the need to wake up the CPU every second.
      
      - add support to the MII ioctl API to read and write Clause 45 PHY
        registers.  I don't know how desirable this is for mainline, but I
        have used this facility extensively to investigate the Marvell
        88x3310 PHY.  A recent illustration of use for this was debugging
        the PHY-without-firmware problem recently reported.
      
      - add mandatory attach/detach methods for the upstream side of sfp
        bus code, which will allow us to remove the "netdev" structure from
        the SFP layers.
      
      - remove the "netdev" structure from the SFP upstream registration
        calls, which simplifies PHY to SFP links.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6912378d
    • net: sfp: remove sfp-bus use of netdevs · 54f70b3b
      Russell King authored
      
      
      The sfp-bus code no longer has any use for the network device
      structure, so remove its use.
      
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      54f70b3b
    • net: sfp: add mandatory attach/detach methods for sfp buses · 320587e6
      Russell King authored
      
      
      Add attach and detach methods for SFP buses, which will allow us to get
      rid of the netdev storage in sfp-bus.
      
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      320587e6
    • net: phy: allow Clause 45 access via mii ioctl · cdea04c2
      Russell King authored
      
      
      Allow userspace to generate Clause 45 MII access cycles via phylib.
      This is useful for tools such as mii-diag to be able to inspect Clause
      45 PHYs.
      
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cdea04c2
    • net: phylink: support for link gpio interrupt · 7b3b0e89
      Russell King authored
      
      
      Add support for using GPIO interrupts with a fixed-link GPIO rather than
      polling the GPIO every second and invoking the phylink resolution.  This
      avoids unnecessary calls to mac_config().
      
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7b3b0e89
    • net: phylink: remove netdev from phylink mii ioctl emulation · 7fdc455e
      Russell King authored
      
      
      The netdev used in the phylink ioctl emulation is never used, so let's
      remove it.
      
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7fdc455e
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · b4b12b0d
      David S. Miller authored
      
      
      The phylink conflict was between a bug fix by Russell King
      to make sure we have a consistent PHY interface mode, and
      a change in net-next to pull some code in phylink_resolve()
      into the helper functions phylink_mac_link_{up,down}().
      
      On the dp83867 side it's mostly overlapping changes, with
      the 'net' side removing a condition that was supposed to
      trigger for RGMII but because of how it was coded never
      actually could trigger.
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b4b12b0d
    • netfilter: nf_conntrack_bridge: fix CONFIG_IPV6=y · c9bb6165
      Pablo Neira Ayuso authored
      This patch fixes a few problems with CONFIG_IPV6=y and
      CONFIG_NF_CONNTRACK_BRIDGE=m:
      
      In file included from net/netfilter/utils.c:5:
      include/linux/netfilter_ipv6.h: In function 'nf_ipv6_br_defrag':
      include/linux/netfilter_ipv6.h:110:9: error: implicit declaration of function 'nf_ct_frag6_gather'; did you mean 'nf_ct_attach'? [-Werror=implicit-function-declaration]
      
      And these too:
      
      net/ipv6/netfilter.c:242:2: error: unknown field 'br_defrag' specified in initializer
      net/ipv6/netfilter.c:243:2: error: unknown field 'br_fragment' specified in initializer
      
      This patch includes an original chunk from wenxu.
      
      Fixes: 764dd163 ("netfilter: nf_conntrack_bridge: add support for IPv6")
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Reported-by: Yuehaibing <yuehaibing@huawei.com>
      Reported-by: kbuild test robot <lkp@intel.com>
      Reported-by: wenxu <wenxu@ucloud.cn>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: wenxu <wenxu@ucloud.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c9bb6165
    • ipvs: add checksum support for gue encapsulation · 29930e31
      Jacky Hu authored
      
      
      Add checksum support for gue encapsulation with the tun_flags parameter,
      which could be one of the values below:
      IP_VS_TUNNEL_ENCAP_FLAG_NOCSUM
      IP_VS_TUNNEL_ENCAP_FLAG_CSUM
      IP_VS_TUNNEL_ENCAP_FLAG_REMCSUM
      
      Signed-off-by: Jacky Hu <hengqing.hu@gmail.com>
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      29930e31
    • netfilter: replace skb_make_writable with skb_ensure_writable · 2cf6bffc
      Florian Westphal authored
      
      
      This converts all remaining users and then removes skb_make_writable.
      
      Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      2cf6bffc
    • netfilter: tcpmss, optstrip: prefer skb_ensure_writable · fb2eb1c1
      Florian Westphal authored
      
      
      This also changes optstrip to only make the tcp header writeable
      rather than the entire packet.
      
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      fb2eb1c1
    • netfilter: xt_HL: prefer skb_ensure_writable · 8e03707f
      Florian Westphal authored
      
      
      Also, make the argument only the needed size of the header we're
      altering; there is no need to pull the full packet into the linear area.
      
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      8e03707f
    • netfilter: nf_tables: prefer skb_ensure_writable · 7418ee4c
      Florian Westphal authored
      
      
      .. so skb_make_writable can be removed.
      
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      7418ee4c
    • netfilter: ipv4: prefer skb_ensure_writable · 3862c6a9
      Florian Westphal authored
      
      
      .. so skb_make_writable can be removed soon.
      
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      3862c6a9
    • netfilter: conntrack, nat: prefer skb_ensure_writable · 86f04538
      Florian Westphal authored
      
      
      Like previous patches, convert conntrack to use the core helper.
      
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      86f04538
    • netfilter: ipvs: prefer skb_ensure_writable · ec0974df
      Florian Westphal authored
      
      
      It does the same thing, use it instead so we can remove skb_make_writable.
      
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      ec0974df
    • netfilter: bridge: convert skb_make_writable to skb_ensure_writable · c1a83116
      Florian Westphal authored
      
      
      Back in the day, skb_ensure_writable did not exist.  By now, both functions
      have the same precondition:
      
      I. skb_make_writable will test in this order:
        1. wlen > skb->len -> error
        2. if not cloned and wlen <= headlen -> OK
        3. If cloned and wlen bytes of clone writeable -> OK
      
      After those checks, skb is either not cloned but needs to pull from
      nonlinear area, or writing to head would also alter data of another clone.
      
      In both cases skb_make_writable will then call __pskb_pull_tail, which will
      kmalloc a new memory area to use for skb->head.
      
      IOW, after successful skb_make_writable call, the requested length is in
      linear area and can be modified, even if skb was cloned.
      
      II. skb_ensure_writable will do this instead:
         1. call pskb_may_pull.  This handles case 1 above.
            After this, wlen is in linear area, but skb might be cloned.
         2. return if skb is not cloned
         3. return if wlen bytes of the clone are writeable.
         4. fully copy the skb.
      
      So the post-conditions are the same: wlen bytes are writeable in the
      linear area without altering any payload data of a clone; all header
      pointers might have been changed.
      
      The only differences are that skb_ensure_writable lives in the network
      core whereas skb_make_writable lives in the netfilter core, and that the
      return values are inverted: skb_make_writable returns 0 on error, whereas
      skb_ensure_writable returns a negative value.
      
      For the normal cases performance is similar:
      A. skb is not cloned and in linear area:
         pskb_may_pull is an inline helper, so neither function copies.
      B. skb is cloned, write is in linear area and clone is writeable:
         both functions return with step 3.
      
      This series removes skb_make_writable from the kernel.
      
      While at it, pass the needed value instead; it's less confusing that way:
      there is no special handling of a "0-length" argument in either
      skb_make_writable or skb_ensure_writable.
      
      The bridge code already makes sure the ethernet header is in the linear
      area; the only purpose of the make_writable() call is to copy skb->head
      in case of cloned skbs.
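
      In practice the conversion pattern in a caller looks like this (a sketch;
      note the inverted return value and the exact header length being passed):

      /* before: skb_make_writable() returns 0 (false) on failure */
      if (!skb_make_writable(skb, sizeof(struct tcphdr)))
              return NF_DROP;

      /* after: skb_ensure_writable() returns a negative errno on failure */
      if (skb_ensure_writable(skb, sizeof(struct tcphdr)))
              return NF_DROP;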
      
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      c1a83116
    • netfilter: nf_tables: free base chain counters from worker · 53315ac6
      Florian Westphal authored
      
      
      No need to use synchronize_rcu() here, just swap the two pointers
      and have the release occur from work queue after commit has completed.
      
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      53315ac6