Skip to content
  1. Jul 05, 2019
  2. Jul 04, 2019
  3. Jul 03, 2019
    • Andrei Gherzan's avatar
      configs: arm64/bcm2711: Add MMC_SDHCI_IPROC · 395a6a8b
      Andrei Gherzan authored
      
      
      This driver is used in the device tree for the emmc2 node.
      
      See #3032
      
      Signed-off-by: default avatarAndrei Gherzan <andrei@gherzan.ro>
      395a6a8b
    • Greg Kroah-Hartman's avatar
      Linux 4.19.57 · 1a059243
      Greg Kroah-Hartman authored
      1a059243
    • Jean-Philippe Brucker's avatar
      arm64: insn: Fix ldadd instruction encoding · 3919d91f
      Jean-Philippe Brucker authored
      commit c5e2edeb upstream.
      
      GCC 8.1.0 reports that the ldadd instruction encoding, recently added to
      insn.c, doesn't match the mask and couldn't possibly be identified:
      
       linux/arch/arm64/include/asm/insn.h: In function 'aarch64_insn_is_ldadd':
       linux/arch/arm64/include/asm/insn.h:280:257: warning: bitwise comparison always evaluates to false [-Wtautological-compare]
      
      Bits [31:30] normally encode the size of the instruction (1 to 8 bytes)
      and the current instruction value only encodes the 4- and 8-byte
      variants. At the moment only the BPF JIT needs this instruction, and
      doesn't require the 1- and 2-byte variants, but to be consistent with
      our other ldr and str instruction encodings, clear the size field in the
      insn value.
      
      Fixes: 34b8ab09
      
       ("bpf, arm64: use more scalable stadd over ldxr / stxr loop in xadd")
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Reported-by: default avatarKuninori Morimoto <kuninori.morimoto.gx@renesas.com>
      Signed-off-by: default avatarYoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
      Signed-off-by: default avatarJean-Philippe Brucker <jean-philippe.brucker@arm.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3919d91f
    • Thinh Nguyen's avatar
      usb: dwc3: Reset num_trbs after skipping · 9c423fd8
      Thinh Nguyen authored
      commit c7152763 upstream.
      
      Currently req->num_trbs is not reset after the TRBs are skipped and
      processed from the cancelled list. The gadget driver may reuse the
      request with an invalid req->num_trbs, and DWC3 will incorrectly skip
      trbs. To fix this, simply reset req->num_trbs to 0 after skipping
      through all of them.
      
      Fixes: c3acd590
      
       ("usb: dwc3: gadget: use num_trbs when skipping TRBs on ->dequeue()")
      Signed-off-by: default avatarThinh Nguyen <thinhn@synopsys.com>
      Signed-off-by: default avatarFelipe Balbi <felipe.balbi@linux.intel.com>
      Cc: Sasha Levin <sashal@kernel.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9c423fd8
    • Xin Long's avatar
      tipc: pass tunnel dev as NULL to udp_tunnel(6)_xmit_skb · 2bbb6b54
      Xin Long authored
      commit c3bcde02 upstream.
      
      udp_tunnel(6)_xmit_skb() called by tipc_udp_xmit() expects a tunnel device
      to count packets on dev->tstats, a perpcu variable. However, TIPC is using
      udp tunnel with no tunnel device, and pass the lower dev, like veth device
      that only initializes dev->lstats(a perpcu variable) when creating it.
      
      Later iptunnel_xmit_stats() called by ip(6)tunnel_xmit() thinks the dev as
      a tunnel device, and uses dev->tstats instead of dev->lstats. tstats' each
      pointer points to a bigger struct than lstats, so when tstats->tx_bytes is
      increased, other percpu variable's members could be overwritten.
      
      syzbot has reported quite a few crashes due to fib_nh_common percpu member
      'nhc_pcpu_rth_output' overwritten, call traces are like:
      
        BUG: KASAN: slab-out-of-bounds in rt_cache_valid+0x158/0x190
        net/ipv4/route.c:1556
          rt_cache_valid+0x158/0x190 net/ipv4/route.c:1556
          __mkroute_output net/ipv4/route.c:2332 [inline]
          ip_route_output_key_hash_rcu+0x819/0x2d50 net/ipv4/route.c:2564
          ip_route_output_key_hash+0x1ef/0x360 net/ipv4/route.c:2393
          __ip_route_output_key include/net/route.h:125 [inline]
          ip_route_output_flow+0x28/0xc0 net/ipv4/route.c:2651
          ip_route_output_key include/net/route.h:135 [inline]
        ...
      
      or:
      
        kasan: GPF could be caused by NULL-ptr deref or user memory access
        RIP: 0010:dst_dev_put+0x24/0x290 net/core/dst.c:168
          <IRQ>
          rt_fibinfo_free_cpus net/ipv4/fib_semantics.c:200 [inline]
          free_fib_info_rcu+0x2e1/0x490 net/ipv4/fib_semantics.c:217
          __rcu_reclaim kernel/rcu/rcu.h:240 [inline]
          rcu_do_batch kernel/rcu/tree.c:2437 [inline]
          invoke_rcu_callbacks kernel/rcu/tree.c:2716 [inline]
          rcu_process_callbacks+0x100a/0x1ac0 kernel/rcu/tree.c:2697
        ...
      
      The issue exists since tunnel stats update is moved to iptunnel_xmit by
      Commit 039f5062
      
       ("ip_tunnel: Move stats update to iptunnel_xmit()"),
      and here to fix it by passing a NULL tunnel dev to udp_tunnel(6)_xmit_skb
      so that the packets counting won't happen on dev->tstats.
      
      Reported-by: default avatar <syzbot+9d4c12bfd45a58738d0a@syzkaller.appspotmail.com>
      Reported-by: default avatar <syzbot+a9e23ea2aa21044c2798@syzkaller.appspotmail.com>
      Reported-by: default avatar <syzbot+c4c4b2bb358bb936ad7e@syzkaller.appspotmail.com>
      Reported-by: default avatar <syzbot+0290d2290a607e035ba1@syzkaller.appspotmail.com>
      Reported-by: default avatar <syzbot+a43d8d4e7e8a7a9e149e@syzkaller.appspotmail.com>
      Reported-by: default avatar <syzbot+a47c5f4c6c00fc1ed16e@syzkaller.appspotmail.com>
      Fixes: 039f5062
      
       ("ip_tunnel: Move stats update to iptunnel_xmit()")
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2bbb6b54
    • Jason Gunthorpe's avatar
      RDMA: Directly cast the sockaddr union to sockaddr · 89c49e7b
      Jason Gunthorpe authored
      commit 641114d2
      
       upstream.
      
      gcc 9 now does allocation size tracking and thinks that passing the member
      of a union and then accessing beyond that member's bounds is an overflow.
      
      Instead of using the union member, use the entire union with a cast to
      get to the sockaddr. gcc will now know that the memory extends the full
      size of the union.
      
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      89c49e7b
    • Will Deacon's avatar
      futex: Update comments and docs about return values of arch futex code · a319c8ff
      Will Deacon authored
      commit 42750351
      
       upstream.
      
      The architecture implementations of 'arch_futex_atomic_op_inuser()' and
      'futex_atomic_cmpxchg_inatomic()' are permitted to return only -EFAULT,
      -EAGAIN or -ENOSYS in the case of failure.
      
      Update the comments in the asm-generic/ implementation and also a stray
      reference in the robust futex documentation.
      
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a319c8ff
    • Daniel Borkmann's avatar
      bpf, arm64: use more scalable stadd over ldxr / stxr loop in xadd · 4423a82c
      Daniel Borkmann authored
      commit 34b8ab09
      
       upstream.
      
      Since ARMv8.1 supplement introduced LSE atomic instructions back in 2016,
      lets add support for STADD and use that in favor of LDXR / STXR loop for
      the XADD mapping if available. STADD is encoded as an alias for LDADD with
      XZR as the destination register, therefore add LDADD to the instruction
      encoder along with STADD as special case and use it in the JIT for CPUs
      that advertise LSE atomics in CPUID register. If immediate offset in the
      BPF XADD insn is 0, then use dst register directly instead of temporary
      one.
      
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarJean-Philippe Brucker <jean-philippe.brucker@arm.com>
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4423a82c
    • Will Deacon's avatar
      arm64: futex: Avoid copying out uninitialised stack in failed cmpxchg() · 436869e0
      Will Deacon authored
      commit 8e4e0ac0
      
       upstream.
      
      Returning an error code from futex_atomic_cmpxchg_inatomic() indicates
      that the caller should not make any use of *uval, and should instead act
      upon on the value of the error code. Although this is implemented
      correctly in our futex code, we needlessly copy uninitialised stack to
      *uval in the error case, which can easily be avoided.
      
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      436869e0
    • Martin KaFai Lau's avatar
      bpf: udp: ipv6: Avoid running reuseport's bpf_prog from __udp6_lib_err · ba6340a7
      Martin KaFai Lau authored
      commit 4ac30c4b upstream.
      
      __udp6_lib_err() may be called when handling icmpv6 message. For example,
      the icmpv6 toobig(type=2).  __udp6_lib_lookup() is then called
      which may call reuseport_select_sock().  reuseport_select_sock() will
      call into a bpf_prog (if there is one).
      
      reuseport_select_sock() is expecting the skb->data pointing to the
      transport header (udphdr in this case).  For example, run_bpf_filter()
      is pulling the transport header.
      
      However, in the __udp6_lib_err() path, the skb->data is pointing to the
      ipv6hdr instead of the udphdr.
      
      One option is to pull and push the ipv6hdr in __udp6_lib_err().
      Instead of doing this, this patch follows how the original
      commit 538950a1 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF")
      was done in IPv4, which has passed a NULL skb pointer to
      reuseport_select_sock().
      
      Fixes: 538950a1
      
       ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF")
      Cc: Craig Gallek <kraig@google.com>
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Acked-by: default avatarCraig Gallek <kraig@google.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ba6340a7
    • Martin KaFai Lau's avatar
      bpf: udp: Avoid calling reuseport's bpf_prog from udp_gro · 79c6a8c0
      Martin KaFai Lau authored
      commit 257a525f upstream.
      
      When the commit a6024562 ("udp: Add GRO functions to UDP socket")
      added udp[46]_lib_lookup_skb to the udp_gro code path, it broke
      the reuseport_select_sock() assumption that skb->data is pointing
      to the transport header.
      
      This patch follows an earlier __udp6_lib_err() fix by
      passing a NULL skb to avoid calling the reuseport's bpf_prog.
      
      Fixes: a6024562
      
       ("udp: Add GRO functions to UDP socket")
      Cc: Tom Herbert <tom@herbertland.com>
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      79c6a8c0
    • Daniel Borkmann's avatar
      bpf: fix unconnected udp hooks · 613bc37f
      Daniel Borkmann authored
      commit 983695fa upstream.
      
      Intention of cgroup bind/connect/sendmsg BPF hooks is to act transparently
      to applications as also stated in original motivation in 7828f20e ("Merge
      branch 'bpf-cgroup-bind-connect'"). When recently integrating the latter
      two hooks into Cilium to enable host based load-balancing with Kubernetes,
      I ran into the issue that pods couldn't start up as DNS got broken. Kubernetes
      typically sets up DNS as a service and is thus subject to load-balancing.
      
      Upon further debugging, it turns out that the cgroupv2 sendmsg BPF hooks API
      is currently insufficient and thus not usable as-is for standard applications
      shipped with most distros. To break down the issue we ran into with a simple
      example:
      
        # cat /etc/resolv.conf
        nameserver 147.75.207.207
        nameserver 147.75.207.208
      
      For the purpose of a simple test, we set up above IPs as service IPs and
      transparently redirect traffic to a different DNS backend server for that
      node:
      
        # cilium service list
        ID   Frontend            Backend
        1    147.75.207.207:53   1 => 8.8.8.8:53
        2    147.75.207.208:53   1 => 8.8.8.8:53
      
      The attached BPF program is basically selecting one of the backends if the
      service IP/port matches on the cgroup hook. DNS breaks here, because the
      hooks are not transparent enough to applications which have built-in msg_name
      address checks:
      
        # nslookup 1.1.1.1
        ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53
        ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.208#53
        ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53
        [...]
        ;; connection timed out; no servers could be reached
      
        # dig 1.1.1.1
        ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53
        ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.208#53
        ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53
        [...]
      
        ; <<>> DiG 9.11.3-1ubuntu1.7-Ubuntu <<>> 1.1.1.1
        ;; global options: +cmd
        ;; connection timed out; no servers could be reached
      
      For comparison, if none of the service IPs is used, and we tell nslookup
      to use 8.8.8.8 directly it works just fine, of course:
      
        # nslookup 1.1.1.1 8.8.8.8
        1.1.1.1.in-addr.arpa	name = one.one.one.one.
      
      In order to fix this and thus act more transparent to the application,
      this needs reverse translation on recvmsg() side. A minimal fix for this
      API is to add similar recvmsg() hooks behind the BPF cgroups static key
      such that the program can track state and replace the current sockaddr_in{,6}
      with the original service IP. From BPF side, this basically tracks the
      service tuple plus socket cookie in an LRU map where the reverse NAT can
      then be retrieved via map value as one example. Side-note: the BPF cgroups
      static key should be converted to a per-hook static key in future.
      
      Same example after this fix:
      
        # cilium service list
        ID   Frontend            Backend
        1    147.75.207.207:53   1 => 8.8.8.8:53
        2    147.75.207.208:53   1 => 8.8.8.8:53
      
      Lookups work fine now:
      
        # nslookup 1.1.1.1
        1.1.1.1.in-addr.arpa    name = one.one.one.one.
      
        Authoritative answers can be found from:
      
        # dig 1.1.1.1
      
        ; <<>> DiG 9.11.3-1ubuntu1.7-Ubuntu <<>> 1.1.1.1
        ;; global options: +cmd
        ;; Got answer:
        ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 51550
        ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
      
        ;; OPT PSEUDOSECTION:
        ; EDNS: version: 0, flags:; udp: 512
        ;; QUESTION SECTION:
        ;1.1.1.1.                       IN      A
      
        ;; AUTHORITY SECTION:
        .                       23426   IN      SOA     a.root-servers.net. nstld.verisign-grs.com. 2019052001 1800 900 604800 86400
      
        ;; Query time: 17 msec
        ;; SERVER: 147.75.207.207#53(147.75.207.207)
        ;; WHEN: Tue May 21 12:59:38 UTC 2019
        ;; MSG SIZE  rcvd: 111
      
      And from an actual packet level it shows that we're using the back end
      server when talking via 147.75.207.20{7,8} front end:
      
        # tcpdump -i any udp
        [...]
        12:59:52.698732 IP foo.42011 > google-public-dns-a.google.com.domain: 18803+ PTR? 1.1.1.1.in-addr.arpa. (38)
        12:59:52.698735 IP foo.42011 > google-public-dns-a.google.com.domain: 18803+ PTR? 1.1.1.1.in-addr.arpa. (38)
        12:59:52.701208 IP google-public-dns-a.google.com.domain > foo.42011: 18803 1/0/0 PTR one.one.one.one. (67)
        12:59:52.701208 IP google-public-dns-a.google.com.domain > foo.42011: 18803 1/0/0 PTR one.one.one.one. (67)
        [...]
      
      In order to be flexible and to have same semantics as in sendmsg BPF
      programs, we only allow return codes in [1,1] range. In the sendmsg case
      the program is called if msg->msg_name is present which can be the case
      in both, connected and unconnected UDP.
      
      The former only relies on the sockaddr_in{,6} passed via connect(2) if
      passed msg->msg_name was NULL. Therefore, on recvmsg side, we act in similar
      way to call into the BPF program whenever a non-NULL msg->msg_name was
      passed independent of sk->sk_state being TCP_ESTABLISHED or not. Note
      that for TCP case, the msg->msg_name is ignored in the regular recvmsg
      path and therefore not relevant.
      
      For the case of ip{,v6}_recv_error() paths, picked up via MSG_ERRQUEUE,
      the hook is not called. This is intentional as it aligns with the same
      semantics as in case of TCP cgroup BPF hooks right now. This might be
      better addressed in future through a different bpf_attach_type such
      that this case can be distinguished from the regular recvmsg paths,
      for example.
      
      Fixes: 1cedee13
      
       ("bpf: Hooks for sys_sendmsg")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAndrey Ignatov <rdna@fb.com>
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Acked-by: default avatarMartynas Pumputis <m@lambda.lt>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      613bc37f
    • Matt Mullins's avatar
      bpf: fix nested bpf tracepoints with per-cpu data · a7177b94
      Matt Mullins authored
      commit 9594dc3c upstream.
      
      BPF_PROG_TYPE_RAW_TRACEPOINTs can be executed nested on the same CPU, as
      they do not increment bpf_prog_active while executing.
      
      This enables three levels of nesting, to support
        - a kprobe or raw tp or perf event,
        - another one of the above that irq context happens to call, and
        - another one in nmi context
      (at most one of which may be a kprobe or perf event).
      
      Fixes: 20b9d7ac
      
       ("bpf: avoid excessive stack usage for perf_sample_data")
      Signed-off-by: default avatarMatt Mullins <mmullins@fb.com>
      Acked-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a7177b94
    • Jonathan Lemon's avatar
      bpf: lpm_trie: check left child of last leftmost node for NULL · 4992d4af
      Jonathan Lemon authored
      commit da2577fd upstream.
      
      If the leftmost parent node of the tree has does not have a child
      on the left side, then trie_get_next_key (and bpftool map dump) will
      not look at the child on the right.  This leads to the traversal
      missing elements.
      
      Lookup is not affected.
      
      Update selftest to handle this case.
      
      Reproducer:
      
       bpftool map create /sys/fs/bpf/lpm type lpm_trie key 6 \
           value 1 entries 256 name test_lpm flags 1
       bpftool map update pinned /sys/fs/bpf/lpm key  8 0 0 0  0   0 value 1
       bpftool map update pinned /sys/fs/bpf/lpm key 16 0 0 0  0 128 value 2
       bpftool map dump   pinned /sys/fs/bpf/lpm
      
      Returns only 1 element. (2 expected)
      
      Fixes: b471f2f1
      
       ("bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE")
      Signed-off-by: default avatarJonathan Lemon <jonathan.lemon@gmail.com>
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4992d4af
    • Martynas Pumputis's avatar
      bpf: simplify definition of BPF_FIB_LOOKUP related flags · 5e558f9a
      Martynas Pumputis authored
      commit b1d6c15b upstream.
      
      Previously, the BPF_FIB_LOOKUP_{DIRECT,OUTPUT} flags in the BPF UAPI
      were defined with the help of BIT macro. This had the following issues:
      
      - In order to use any of the flags, a user was required to depend
        on <linux/bits.h>.
      - No other flag in bpf.h uses the macro, so it seems that an unwritten
        convention is to use (1 << (nr)) to define BPF-related flags.
      
      Fixes: 87f5fc7e
      
       ("bpf: Provide helper to do forwarding lookups in kernel FIB table")
      Signed-off-by: default avatarMartynas Pumputis <m@lambda.lt>
      Acked-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5e558f9a
    • Fei Li's avatar
      tun: wake up waitqueues after IFF_UP is set · 7d2c0ec2
      Fei Li authored
      [ Upstream commit 72b319dc
      
       ]
      
      Currently after setting tap0 link up, the tun code wakes tx/rx waited
      queues up in tun_net_open() when .ndo_open() is called, however the
      IFF_UP flag has not been set yet. If there's already a wait queue, it
      would fail to transmit when checking the IFF_UP flag in tun_sendmsg().
      Then the saving vhost_poll_start() will add the wq into wqh until it
      is waken up again. Although this works when IFF_UP flag has been set
      when tun_chr_poll detects; this is not true if IFF_UP flag has not
      been set at that time. Sadly the latter case is a fatal error, as
      the wq will never be waken up in future unless later manually
      setting link up on purpose.
      
      Fix this by moving the wakeup process into the NETDEV_UP event
      notifying process, this makes sure IFF_UP has been set before all
      waited queues been waken up.
      
      Signed-off-by: default avatarFei Li <lifei.shirley@bytedance.com>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7d2c0ec2
    • Xin Long's avatar
      tipc: check msg->req data len in tipc_nl_compat_bearer_disable · a08b9154
      Xin Long authored
      [ Upstream commit 4f07b80c ]
      
      This patch is to fix an uninit-value issue, reported by syzbot:
      
        BUG: KMSAN: uninit-value in memchr+0xce/0x110 lib/string.c:981
        Call Trace:
          __dump_stack lib/dump_stack.c:77 [inline]
          dump_stack+0x191/0x1f0 lib/dump_stack.c:113
          kmsan_report+0x130/0x2a0 mm/kmsan/kmsan.c:622
          __msan_warning+0x75/0xe0 mm/kmsan/kmsan_instr.c:310
          memchr+0xce/0x110 lib/string.c:981
          string_is_valid net/tipc/netlink_compat.c:176 [inline]
          tipc_nl_compat_bearer_disable+0x2a1/0x480 net/tipc/netlink_compat.c:449
          __tipc_nl_compat_doit net/tipc/netlink_compat.c:327 [inline]
          tipc_nl_compat_doit+0x3ac/0xb00 net/tipc/netlink_compat.c:360
          tipc_nl_compat_handle net/tipc/netlink_compat.c:1178 [inline]
          tipc_nl_compat_recv+0x1b1b/0x27b0 net/tipc/netlink_compat.c:1281
      
      TLV_GET_DATA_LEN() may return a negtive int value, which will be
      used as size_t (becoming a big unsigned long) passed into memchr,
      cause this issue.
      
      Similar to what it does in tipc_nl_compat_bearer_enable(), this
      fix is to return -EINVAL when TLV_GET_DATA_LEN() is negtive in
      tipc_nl_compat_bearer_disable(), as well as in
      tipc_nl_compat_link_stat_dump() and tipc_nl_compat_link_reset_stats().
      
      v1->v2:
        - add the missing Fixes tags per Eric's request.
      
      Fixes: 0762216c ("tipc: fix uninit-value in tipc_nl_compat_bearer_enable")
      Fixes: 8b66fee7
      
       ("tipc: fix uninit-value in tipc_nl_compat_link_reset_stats")
      Reported-by: default avatar <syzbot+30eaa8bf392f7fafffaf@syzkaller.appspotmail.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a08b9154
    • Xin Long's avatar
      tipc: change to use register_pernet_device · fdf3e98e
      Xin Long authored
      [ Upstream commit c492d4c7
      
       ]
      
      This patch is to fix a dst defcnt leak, which can be reproduced by doing:
      
        # ip net a c; ip net a s; modprobe tipc
        # ip net e s ip l a n eth1 type veth peer n eth1 netns c
        # ip net e c ip l s lo up; ip net e c ip l s eth1 up
        # ip net e s ip l s lo up; ip net e s ip l s eth1 up
        # ip net e c ip a a 1.1.1.2/8 dev eth1
        # ip net e s ip a a 1.1.1.1/8 dev eth1
        # ip net e c tipc b e m udp n u1 localip 1.1.1.2
        # ip net e s tipc b e m udp n u1 localip 1.1.1.1
        # ip net d c; ip net d s; rmmod tipc
      
      and it will get stuck and keep logging the error:
      
        unregister_netdevice: waiting for lo to become free. Usage count = 1
      
      The cause is that a dst is held by the udp sock's sk_rx_dst set on udp rx
      path with udp_early_demux == 1, and this dst (eventually holding lo dev)
      can't be released as bearer's removal in tipc pernet .exit happens after
      lo dev's removal, default_device pernet .exit.
      
       "There are two distinct types of pernet_operations recognized: subsys and
        device.  At creation all subsys init functions are called before device
        init functions, and at destruction all device exit functions are called
        before subsys exit function."
      
      So by calling register_pernet_device instead to register tipc_net_ops, the
      pernet .exit() will be invoked earlier than loopback dev's removal when a
      netns is being destroyed, as fou/gue does.
      
      Note that vxlan and geneve udp tunnels don't have this issue, as the udp
      sock is released in their device ndo_stop().
      
      This fix is also necessary for tipc dst_cache, which will hold dsts on tx
      path and I will introduce in my next patch.
      
      Reported-by: default avatarLi Shuang <shuali@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fdf3e98e
    • YueHaibing's avatar
      team: Always enable vlan tx offload · 32b711f5
      YueHaibing authored
      [ Upstream commit ee429742
      
       ]
      
      We should rather have vlan_tci filled all the way down
      to the transmitting netdevice and let it do the hw/sw
      vlan implementation.
      
      Suggested-by: default avatarJiri Pirko <jiri@resnulli.us>
      Signed-off-by: default avatarYueHaibing <yuehaibing@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      32b711f5
    • Xin Long's avatar
      sctp: change to hold sk after auth shkey is created successfully · eeb770d6
      Xin Long authored
      [ Upstream commit 25bff6d5 ]
      
      Now in sctp_endpoint_init(), it holds the sk then creates auth
      shkey. But when the creation fails, it doesn't release the sk,
      which causes a sk defcnf leak,
      
      Here to fix it by only holding the sk when auth shkey is created
      successfully.
      
      Fixes: a29a5bd4
      
       ("[SCTP]: Implement SCTP-AUTH initializations.")
      Reported-by: default avatar <syzbot+afabda3890cc2f765041@syzkaller.appspotmail.com>
      Reported-by: default avatar <syzbot+276ca1c77a19977c0130@syzkaller.appspotmail.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarNeil Horman <nhorman@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      eeb770d6
    • Roland Hii's avatar
      net: stmmac: set IC bit when transmitting frames with HW timestamp · 9b7b0aab
      Roland Hii authored
      [ Upstream commit d0bb82fd ]
      
      When transmitting certain PTP frames, e.g. SYNC and DELAY_REQ, the
      PTP daemon, e.g. ptp4l, is polling the driver for the frame transmit
      hardware timestamp. The polling will most likely timeout if the tx
      coalesce is enabled due to the Interrupt-on-Completion (IC) bit is
      not set in tx descriptor for those frames.
      
      This patch will ignore the tx coalesce parameter and set the IC bit
      when transmitting PTP frames which need to report out the frame
      transmit hardware timestamp to user space.
      
      Fixes: f748be53
      
       ("net: stmmac: Rework coalesce timer and fix multi-queue races")
      Signed-off-by: default avatarRoland Hii <roland.king.guan.hii@intel.com>
      Signed-off-by: default avatarOng Boon Leong <boon.leong.ong@intel.com>
      Signed-off-by: default avatarVoon Weifeng <weifeng.voon@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9b7b0aab
    • Roland Hii's avatar
      net: stmmac: fixed new system time seconds value calculation · a373bf72
      Roland Hii authored
      [ Upstream commit a1e5388b ]
      
      When ADDSUB bit is set, the system time seconds field is calculated as
      the complement of the seconds part of the update value.
      
      For example, if 3.000000001 seconds need to be subtracted from the
      system time, this field is calculated as
      2^32 - 3 = 4294967296 - 3 = 0x100000000 - 3 = 0xFFFFFFFD
      
      Previously, the 0x100000000 is mistakenly written as 100000000.
      
      This is further simplified from
        sec = (0x100000000ULL - sec);
      to
        sec = -sec;
      
      Fixes: ba1ffd74
      
       ("stmmac: fix PTP support for GMAC4")
      Signed-off-by: default avatarRoland Hii <roland.king.guan.hii@intel.com>
      Signed-off-by: default avatarOng Boon Leong <boon.leong.ong@intel.com>
      Signed-off-by: default avatarVoon Weifeng <weifeng.voon@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a373bf72
    • JingYi Hou's avatar
      net: remove duplicate fetch in sock_getsockopt · 7d76fc21
      JingYi Hou authored
      [ Upstream commit d0bae4a0
      
       ]
      
      In sock_getsockopt(), 'optlen' is fetched the first time from userspace.
      'len < 0' is then checked. Then in condition 'SO_MEMINFO', 'optlen' is
      fetched the second time from userspace.
      
      If change it between two fetches may cause security problems or unexpected
      behaivor, and there is no reason to fetch it a second time.
      
      To fix this, we need to remove the second fetch.
      
      Signed-off-by: default avatarJingYi Hou <houjingyi647@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7d76fc21
    • Eric Dumazet's avatar
      net/packet: fix memory leak in packet_set_ring() · 05dceb60
      Eric Dumazet authored
      [ Upstream commit 55655e3d ]
      
      syzbot found we can leak memory in packet_set_ring(), if user application
      provides buggy parameters.
      
      Fixes: 7f953ab2
      
       ("af_packet: TX_RING support for TPACKET_V3")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      05dceb60
    • Stephen Suryaputra's avatar
      ipv4: Use return value of inet_iif() for __raw_v4_lookup in the while loop · 7c92f3ef
      Stephen Suryaputra authored
      [ Upstream commit 38c73529 ]
      
      In commit 19e4e768 ("ipv4: Fix raw socket lookup for local
      traffic"), the dif argument to __raw_v4_lookup() is coming from the
      returned value of inet_iif() but the change was done only for the first
      lookup. Subsequent lookups in the while loop still use skb->dev->ifIndex.
      
      Fixes: 19e4e768
      
       ("ipv4: Fix raw socket lookup for local traffic")
      Signed-off-by: default avatarStephen Suryaputra <ssuryaextr@gmail.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7c92f3ef
    • YueHaibing's avatar
      bonding: Always enable vlan tx offload · 0f345172
      YueHaibing authored
      [ Upstream commit 30d8177e ]
      
      We build vlan on top of bonding interface, which vlan offload
      is off, bond mode is 802.3ad (LACP) and xmit_hash_policy is
      BOND_XMIT_POLICY_ENCAP34.
      
      Because vlan tx offload is off, vlan tci is cleared and skb push
      the vlan header in validate_xmit_vlan() while sending from vlan
      devices. Then in bond_xmit_hash, __skb_flow_dissect() fails to
      get information from protocol headers encapsulated within vlan,
      because 'nhoff' is points to IP header, so bond hashing is based
      on layer 2 info, which fails to distribute packets across slaves.
      
      This patch always enable bonding's vlan tx offload, pass the vlan
      packets to the slave devices with vlan tci, let them to handle
      vlan implementation.
      
      Fixes: 278339a4
      
       ("bonding: propogate vlan_features to bonding master")
      Suggested-by: default avatarJiri Pirko <jiri@resnulli.us>
      Signed-off-by: default avatarYueHaibing <yuehaibing@huawei.com>
      Acked-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0f345172
    • Neil Horman's avatar
      af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET · a4709127
      Neil Horman authored
      [ Upstream commit 89ed5b51
      
       ]
      
      When an application is run that:
      a) Sets its scheduler to be SCHED_FIFO
      and
      b) Opens a memory mapped AF_PACKET socket, and sends frames with the
      MSG_DONTWAIT flag cleared, its possible for the application to hang
      forever in the kernel.  This occurs because when waiting, the code in
      tpacket_snd calls schedule, which under normal circumstances allows
      other tasks to run, including ksoftirqd, which in some cases is
      responsible for freeing the transmitted skb (which in AF_PACKET calls a
      destructor that flips the status bit of the transmitted frame back to
      available, allowing the transmitting task to complete).
      
      However, when the calling application is SCHED_FIFO, its priority is
      such that the schedule call immediately places the task back on the cpu,
      preventing ksoftirqd from freeing the skb, which in turn prevents the
      transmitting task from detecting that the transmission is complete.
      
      We can fix this by converting the schedule call to a completion
      mechanism.  By using a completion queue, we force the calling task, when
      it detects there are no more frames to send, to schedule itself off the
      cpu until such time as the last transmitted skb is freed, allowing
      forward progress to be made.
      
      Tested by myself and the reporter, with good results
      
      Change Notes:
      
      V1->V2:
      	Enhance the sleep logic to support being interruptible and
      allowing for honoring to SK_SNDTIMEO (Willem de Bruijn)
      
      V2->V3:
      	Rearrage the point at which we wait for the completion queue, to
      avoid needing to check for ph/skb being null at the end of the loop.
      Also move the complete call to the skb destructor to avoid needing to
      modify __packet_set_status.  Also gate calling complete on
      packet_read_pending returning zero to avoid multiple calls to complete.
      (Willem de Bruijn)
      
      	Move timeo computation within loop, to re-fetch the socket
      timeout since we also use the timeo variable to record the return code
      from the wait_for_complete call (Neil Horman)
      
      V3->V4:
      	Willem has requested that the control flow be restored to the
      previous state.  Doing so lets us eliminate the need for the
      po->wait_on_complete flag variable, and lets us get rid of the
      packet_next_frame function, but introduces another complexity.
      Specifically, but using the packet pending count, we can, if an
      applications calls sendmsg multiple times with MSG_DONTWAIT set, each
      set of transmitted frames, when complete, will cause
      tpacket_destruct_skb to issue a complete call, for which there will
      never be a wait_on_completion call.  This imbalance will lead to any
      future call to wait_for_completion here to return early, when the frames
      they sent may not have completed.  To correct this, we need to re-init
      the completion queue on every call to tpacket_snd before we enter the
      loop so as to ensure we wait properly for the frames we send in this
      iteration.
      
      	Change the timeout and interrupted gotos to out_put rather than
      out_status so that we don't try to free a non-existant skb
      	Clean up some extra newlines (Willem de Bruijn)
      
      Reviewed-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Reported-by: default avatarMatteo Croce <mcroce@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a4709127
    • Wang Xin's avatar
      eeprom: at24: fix unexpected timeout under high load · 64032e2d
      Wang Xin authored
      commit 9a9e295e
      
       upstream.
      
      Within at24_loop_until_timeout the timestamp used for timeout checking
      is recorded after the I2C transfer and sleep_range(). Under high CPU
      load either the execution time for I2C transfer or sleep_range() could
      actually be larger than the timeout value. Worst case the I2C transfer
      is only tried once because the loop will exit due to the timeout
      although the EEPROM is now ready.
      
      To fix this issue the timestamp is recorded at the beginning of each
      iteration. That is, before I2C transfer and sleep. Then the timeout
      is actually checked against the timestamp of the previous iteration.
      This makes sure that even if the timeout is reached, there is still one
      more chance to try the I2C transfer in case the EEPROM is ready.
      
      Example:
      
      If you have a system which combines high CPU load with repeated EEPROM
      writes you will run into the following scenario.
      
       - System makes a successful regmap_bulk_write() to EEPROM.
       - System wants to perform another write to EEPROM but EEPROM is still
         busy with the last write.
       - Because of high CPU load the usleep_range() will sleep more than
         25 ms (at24_write_timeout).
       - Within the over-long sleeping the EEPROM finished the previous write
         operation and is ready again.
       - at24_loop_until_timeout() will detect timeout and won't try to write.
      
      Signed-off-by: default avatarWang Xin <xin.wang7@cn.bosch.com>
      Signed-off-by: default avatarMark Jonas <mark.jonas@de.bosch.com>
      Signed-off-by: default avatarBartosz Golaszewski <brgl@bgdev.pl>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      64032e2d