Skip to content
  1. Aug 05, 2017
    • David Daney's avatar
      MIPS: Add missing file for eBPF JIT. · b6bd53f9
      David Daney authored
      Inexplicably, commit f381bf6d ("MIPS: Add support for eBPF JIT.")
      lost a file somewhere on its path to Linus' tree.  Add back the
      missing ebpf_jit.c so that we can build with CONFIG_BPF_JIT selected.
      
      This version of ebpf_jit.c is identical to the original except for two
      minor change need to resolve conflicts with changes merged from the
      BPF branch:
      
      A) Set prog->jited_len = image_size;
      B) Use BPF_TAIL_CALL instead of BPF_CALL | BPF_X
      
      Fixes: f381bf6d
      
       ("MIPS: Add support for eBPF JIT.")
      Signed-off-by: default avatarDavid Daney <david.daney@cavium.com>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b6bd53f9
    • David S. Miller's avatar
      Merge branch 's390-bpf-jit-fixes' · 7a973251
      David S. Miller authored
      
      
      Daniel Borkmann says:
      
      ====================
      Two BPF fixes for s390
      
      Found while testing some other work touching JITs.
      ====================
      
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7a973251
    • Daniel Borkmann's avatar
      bpf, s390: fix build for libbpf and selftest suite · bad1926d
      Daniel Borkmann authored
      The BPF feature test as well as libbpf is missing the __NR_bpf
      define for s390 and currently refuses to compile (selftest suite
      depends on libbpf as well). Similar issue was fixed some time
      ago via b0c47807
      
       ("bpf: Add sparc support to tools and
      samples."), just do the same and add definitions.
      
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bad1926d
    • Daniel Borkmann's avatar
      bpf, s390: fix jit branch offset related to ldimm64 · b0a0c256
      Daniel Borkmann authored
      While testing some other work that required JIT modifications, I
      run into test_bpf causing a hang when JIT enabled on s390. The
      problematic test case was the one from ddc665a4 (bpf, arm64:
      fix jit branch offset related to ldimm64), and turns out that we
      do have a similar issue on s390 as well. In bpf_jit_prog() we
      update next instruction address after returning from bpf_jit_insn()
      with an insn_count. bpf_jit_insn() returns either -1 in case of
      error (e.g. unsupported insn), 1 or 2. The latter is only the
      case for ldimm64 due to spanning 2 insns, however, next address
      is only set to i + 1 not taking actual insn_count into account,
      thus fix is to use insn_count instead of 1. bpf_jit_enable in
      mode 2 provides also disasm on s390:
      
      Before fix:
      
        000003ff800349b6: a7f40003   brc     15,3ff800349bc                 ; target
        000003ff800349ba: 0000               unknown
        000003ff800349bc: e3b0f0700024       stg     %r11,112(%r15)
        000003ff800349c2: e3e0f0880024       stg     %r14,136(%r15)
        000003ff800349c8: 0db0               basr    %r11,%r0
        000003ff800349ca: c0ef00000000       llilf   %r14,0
        000003ff800349d0: e320b0360004       lg      %r2,54(%r11)
        000003ff800349d6: e330b03e0004       lg      %r3,62(%r11)
        000003ff800349dc: ec23ffeda065       clgrj   %r2,%r3,10,3ff800349b6 ; jmp
        000003ff800349e2: e3e0b0460004       lg      %r14,70(%r11)
        000003ff800349e8: e3e0b04e0004       lg      %r14,78(%r11)
        000003ff800349ee: b904002e   lgr     %r2,%r14
        000003ff800349f2: e3b0f0700004       lg      %r11,112(%r15)
        000003ff800349f8: e3e0f0880004       lg      %r14,136(%r15)
        000003ff800349fe: 07fe               bcr     15,%r14
      
      After fix:
      
        000003ff80ef3db4: a7f40003   brc     15,3ff80ef3dba
        000003ff80ef3db8: 0000               unknown
        000003ff80ef3dba: e3b0f0700024       stg     %r11,112(%r15)
        000003ff80ef3dc0: e3e0f0880024       stg     %r14,136(%r15)
        000003ff80ef3dc6: 0db0               basr    %r11,%r0
        000003ff80ef3dc8: c0ef00000000       llilf   %r14,0
        000003ff80ef3dce: e320b0360004       lg      %r2,54(%r11)
        000003ff80ef3dd4: e330b03e0004       lg      %r3,62(%r11)
        000003ff80ef3dda: ec230006a065       clgrj   %r2,%r3,10,3ff80ef3de6 ; jmp
        000003ff80ef3de0: e3e0b0460004       lg      %r14,70(%r11)
        000003ff80ef3de6: e3e0b04e0004       lg      %r14,78(%r11)          ; target
        000003ff80ef3dec: b904002e   lgr     %r2,%r14
        000003ff80ef3df0: e3b0f0700004       lg      %r11,112(%r15)
        000003ff80ef3df6: e3e0f0880004       lg      %r14,136(%r15)
        000003ff80ef3dfc: 07fe               bcr     15,%r14
      
      test_bpf.ko suite runs fine after the fix.
      
      Fixes: 05462310
      
       ("s390/bpf: Add s390x eBPF JIT compiler backend")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Tested-by: default avatarMichael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b0a0c256
    • David S. Miller's avatar
      Merge branch 'mlxsw-Couple-of-fixes' · 1aff0c34
      David S. Miller authored
      
      
      Jiri Pirko says:
      
      ====================
      mlxsw: Couple of fixes
      
      Ido says:
      
      The first patch prevents us from warning about valid situations that can
      happen due to the fact that some operations in switchdev are deferred.
      
      Second patch fixes a long standing problem in which we didn't correctly
      free resources upon module removal, resulting in a memory leak.
      ====================
      
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1aff0c34
    • Ido Schimmel's avatar
      mlxsw: spectrum_switchdev: Release multicast groups during fini · 852cfeed
      Ido Schimmel authored
      Each multicast group (MID) stores a bitmap of ports to which a packet
      should be forwarded to in case an MDB entry associated with the MID is
      hit.
      
      Since the initial introduction of IGMP snooping in commit 3a49b4fd
      ("mlxsw: Adding layer 2 multicast support") the driver didn't correctly
      free these multicast groups upon ungraceful situations such as the
      removal of the upper bridge device or module removal.
      
      The correct way to fix this is to associate each MID with the bridge
      ports member in it and then drop the reference in case the bridge port
      is destroyed, but this will result in a lot more code and will be fixed
      in net-next.
      
      For now, upon module removal, traverse the MID list and release each
      one.
      
      Fixes: 3a49b4fd
      
       ("mlxsw: Adding layer 2 multicast support")
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      852cfeed
    • Ido Schimmel's avatar
      mlxsw: spectrum_switchdev: Don't warn about valid situations · 17b334a8
      Ido Schimmel authored
      Some operations in the bridge driver such as MDB deletion are preformed
      in an atomic context and thus deferred to a process context by the
      switchdev infrastructure.
      
      Therefore, by the time the operation is performed by the underlying
      device driver it's possible the bridge port context is already gone.
      This is especially true for removal flows, but theoretically can also be
      invoked during addition.
      
      Remove the warnings in such situations and return normally.
      
      Fixes: c57529e1 ("mlxsw: spectrum: Replace vPorts with Port-VLAN")
      Fixes: 3922285d
      
       ("net: bridge: Add support for offloading port attributes")
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      17b334a8
  2. Aug 04, 2017
    • David S. Miller's avatar
      Merge branch 'tcp-xmit-timer-rearming' · 337f1b07
      David S. Miller authored
      
      
      Neal Cardwell says:
      
      ====================
      tcp: fix xmit timer rearming to avoid stalls
      
      This patch series is a bug fix for a TCP loss recovery performance bug
      reported independently in recent netdev threads:
      
       (i)  July 26, 2017: netdev thread "TCP fast retransmit issues"
       (ii) July 26, 2017: netdev thread:
             "[PATCH V2 net-next] TLP: Don't reschedule PTO when there's one
             outstanding TLP retransmission"
      
      Many thanks to Klavs Klavsen and Mao Wenan for the detailed reports,
      traces, and packetdrill test cases, which enabled us to root-cause
      this issue and verify the fix.
      
      - v1 -> v2:
       - In patch 2/3, changed an unclear comment in the pre-existing code
         in tcp_schedule_loss_probe() to be more clear (thanks to Eric Dumazet
         for suggesting we improve this).
      ====================
      
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      337f1b07
    • Neal Cardwell's avatar
      tcp: fix xmit timer to only be reset if data ACKed/SACKed · df92c839
      Neal Cardwell authored
      Fix a TCP loss recovery performance bug raised recently on the netdev
      list, in two threads:
      
      (i)  July 26, 2017: netdev thread "TCP fast retransmit issues"
      (ii) July 26, 2017: netdev thread:
           "[PATCH V2 net-next] TLP: Don't reschedule PTO when there's one
           outstanding TLP retransmission"
      
      The basic problem is that incoming TCP packets that did not indicate
      forward progress could cause the xmit timer (TLP or RTO) to be rearmed
      and pushed back in time. In certain corner cases this could result in
      the following problems noted in these threads:
      
       - Repeated ACKs coming in with bogus SACKs corrupted by middleboxes
         could cause TCP to repeatedly schedule TLPs forever. We kept
         sending TLPs after every ~200ms, which elicited bogus SACKs, which
         caused more TLPs, ad infinitum; we never fired an RTO to fill in
         the holes.
      
       - Incoming data segments could, in some cases, cause us to reschedule
         our RTO or TLP timer further out in time, for no good reason. This
         could cause repeated inbound data to result in stalls in outbound
         data, in the presence of packet loss.
      
      This commit fixes these bugs by changing the TLP and RTO ACK
      processing to:
      
       (a) Only reschedule the xmit timer once per ACK.
      
       (b) Only reschedule the xmit timer if tcp_clean_rtx_queue() deems the
           ACK indicates sufficient forward progress (a packet was
           cumulatively ACKed, or we got a SACK for a packet that was sent
           before the most recent retransmit of the write queue head).
      
      This brings us back into closer compliance with the RFCs, since, as
      the comment for tcp_rearm_rto() notes, we should only restart the RTO
      timer after forward progress on the connection. Previously we were
      restarting the xmit timer even in these cases where there was no
      forward progress.
      
      As a side benefit, this commit simplifies and speeds up the TCP timer
      arming logic. We had been calling inet_csk_reset_xmit_timer() three
      times on normal ACKs that cumulatively acknowledged some data:
      
      1) Once near the top of tcp_ack() to switch from TLP timer to RTO:
              if (icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)
                     tcp_rearm_rto(sk);
      
      2) Once in tcp_clean_rtx_queue(), to update the RTO:
              if (flag & FLAG_ACKED) {
                     tcp_rearm_rto(sk);
      
      3) Once in tcp_ack() after tcp_fastretrans_alert() to switch from RTO
         to TLP:
              if (icsk->icsk_pending == ICSK_TIME_RETRANS)
                     tcp_schedule_loss_probe(sk);
      
      This commit, by only rescheduling the xmit timer once per ACK,
      simplifies the code and reduces CPU overhead.
      
      This commit was tested in an A/B test with Google web server
      traffic. SNMP stats and request latency metrics were within noise
      levels, substantiating that for normal web traffic patterns this is a
      rare issue. This commit was also tested with packetdrill tests to
      verify that it fixes the timer behavior in the corner cases discussed
      in the netdev threads mentioned above.
      
      This patch is a bug fix patch intended to be queued for -stable
      relases.
      
      Fixes: 6ba8a3b1
      
       ("tcp: Tail loss probe (TLP)")
      Reported-by: default avatarKlavs Klavsen <kl@vsen.dk>
      Reported-by: default avatarMao Wenan <maowenan@huawei.com>
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarNandita Dukkipati <nanditad@google.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      df92c839
    • Neal Cardwell's avatar
      tcp: enable xmit timer fix by having TLP use time when RTO should fire · a2815817
      Neal Cardwell authored
      Have tcp_schedule_loss_probe() base the TLP scheduling decision based
      on when the RTO *should* fire. This is to enable the upcoming xmit
      timer fix in this series, where tcp_schedule_loss_probe() cannot
      assume that the last timer installed was an RTO timer (because we are
      no longer doing the "rearm RTO, rearm RTO, rearm TLP" dance on every
      ACK). So tcp_schedule_loss_probe() must independently figure out when
      an RTO would want to fire.
      
      In the new TLP implementation following in this series, we cannot
      assume that icsk_timeout was set based on an RTO; after processing a
      cumulative ACK the icsk_timeout we see can be from a previous TLP or
      RTO. So we need to independently recalculate the RTO time (instead of
      reading it out of icsk_timeout). Removing this dependency on the
      nature of icsk_timeout makes things a little easier to reason about
      anyway.
      
      Note that the old and new code should be equivalent, since they are
      both saying: "if the RTO is in the future, but at an earlier time than
      the normal TLP time, then set the TLP timer to fire when the RTO would
      have fired".
      
      Fixes: 6ba8a3b1
      
       ("tcp: Tail loss probe (TLP)")
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarNandita Dukkipati <nanditad@google.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a2815817
    • Neal Cardwell's avatar
      tcp: introduce tcp_rto_delta_us() helper for xmit timer fix · e1a10ef7
      Neal Cardwell authored
      Pure refactor. This helper will be required in the xmit timer fix
      later in the patch series. (Because the TLP logic will want to make
      this calculation.)
      
      Fixes: 6ba8a3b1
      
       ("tcp: Tail loss probe (TLP)")
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarNandita Dukkipati <nanditad@google.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e1a10ef7
    • Xin Long's avatar
      ipv6: set rt6i_protocol properly in the route when it is installed · b91d5329
      Xin Long authored
      After commit c2ed1880 ("net: ipv6: check route protocol when
      deleting routes"), ipv6 route checks rt protocol when trying to
      remove a rt entry.
      
      It introduced a side effect causing 'ip -6 route flush cache' not
      to work well. When flushing caches with iproute, all route caches
      get dumped from kernel then removed one by one by sending DELROUTE
      requests to kernel for each cache.
      
      The thing is iproute sends the request with the cache whose proto
      is set with RTPROT_REDIRECT by rt6_fill_node() when kernel dumps
      it. But in kernel the rt_cache protocol is still 0, which causes
      the cache not to be matched and removed.
      
      So the real reason is rt6i_protocol in the route is not set when
      it is allocated. As David Ahern's suggestion, this patch is to
      set rt6i_protocol properly in the route when it is installed and
      remove the codes setting rtm_protocol according to rt6i_flags in
      rt6_fill_node.
      
      This is also an improvement to keep rt6i_protocol consistent with
      rtm_protocol.
      
      Fixes: c2ed1880
      
       ("net: ipv6: check route protocol when deleting routes")
      Reported-by: default avatarJianlin Shi <jishi@redhat.com>
      Suggested-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b91d5329
    • Eric Dumazet's avatar
      net: fix keepalive code vs TCP_FASTOPEN_CONNECT · 2dda6400
      Eric Dumazet authored
      syzkaller was able to trigger a divide by 0 in TCP stack [1]
      
      Issue here is that keepalive timer needs to be updated to not attempt
      to send a probe if the connection setup was deferred using
      TCP_FASTOPEN_CONNECT socket option added in linux-4.11
      
      [1]
       divide error: 0000 [#1] SMP
       CPU: 18 PID: 0 Comm: swapper/18 Not tainted
       task: ffff986f62f4b040 ti: ffff986f62fa2000 task.ti: ffff986f62fa2000
       RIP: 0010:[<ffffffff8409cc0d>]  [<ffffffff8409cc0d>] __tcp_select_window+0x8d/0x160
       Call Trace:
        <IRQ>
        [<ffffffff8409d951>] tcp_transmit_skb+0x11/0x20
        [<ffffffff8409da21>] tcp_xmit_probe_skb+0xc1/0xe0
        [<ffffffff840a0ee8>] tcp_write_wakeup+0x68/0x160
        [<ffffffff840a151b>] tcp_keepalive_timer+0x17b/0x230
        [<ffffffff83b3f799>] call_timer_fn+0x39/0xf0
        [<ffffffff83b40797>] run_timer_softirq+0x1d7/0x280
        [<ffffffff83a04ddb>] __do_softirq+0xcb/0x257
        [<ffffffff83ae03ac>] irq_exit+0x9c/0xb0
        [<ffffffff83a04c1a>] smp_apic_timer_interrupt+0x6a/0x80
        [<ffffffff83a03eaf>] apic_timer_interrupt+0x7f/0x90
        <EOI>
        [<ffffffff83fed2ea>] ? cpuidle_enter_state+0x13a/0x3b0
        [<ffffffff83fed2cd>] ? cpuidle_enter_state+0x11d/0x3b0
      
      Tested:
      
      Following packetdrill no longer crashes the kernel
      
      `echo 0 >/proc/sys/net/ipv4/tcp_timestamps`
      
      // Cache warmup: send a Fast Open cookie request
          0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
         +0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
         +0 setsockopt(3, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0
         +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation is now in progress)
         +0 > S 0:0(0) <mss 1460,nop,nop,sackOK,nop,wscale 8,FO,nop,nop>
       +.01 < S. 123:123(0) ack 1 win 14600 <mss 1460,nop,nop,sackOK,nop,wscale 6,FO abcd1234,nop,nop>
         +0 > . 1:1(0) ack 1
         +0 close(3) = 0
         +0 > F. 1:1(0) ack 1
         +0 < F. 1:1(0) ack 2 win 92
         +0 > .  2:2(0) ack 2
      
         +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 4
         +0 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
         +0 setsockopt(4, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0
         +0 setsockopt(4, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
       +.01 connect(4, ..., ...) = 0
         +0 setsockopt(4, SOL_TCP, TCP_KEEPIDLE, [5], 4) = 0
         +10 close(4) = 0
      
      `echo 1 >/proc/sys/net/ipv4/tcp_timestamps`
      
      Fixes: 19f6d3f3
      
       ("net/tcp-fastopen: Add new API support")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Wei Wang <weiwan@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2dda6400
    • David S. Miller's avatar
      Merge tag 'batadv-net-for-davem-20170802' of git://git.open-mesh.org/linux-merge · 4d2bbb0e
      David S. Miller authored
      
      
      Simon Wunderlich says:
      
      ====================
      Here is a batman-adv bugfix:
      
       - fix TT sync flag inconsistency problems, which can lead to excess packets,
         by Linus Luessing
      ====================
      
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4d2bbb0e
  3. Aug 03, 2017
  4. Aug 02, 2017