  1. Mar 06, 2021
  2. Mar 05, 2021
    • bpf: Explicitly zero-extend R0 after 32-bit cmpxchg · 39491867
      Brendan Jackman authored
      As pointed out by Ilya and explained in the new comment, there's a
      discrepancy between x86 and BPF CMPXCHG semantics: BPF always loads
      the value from memory into r0, while x86 only does so when r0 and the
      value in memory are different. The same issue affects s390.
      
      At first this might sound like pure semantics, but it makes a real
      difference when the comparison is 32-bit, since the load will
      zero-extend r0/rax.
      
      The fix is to explicitly zero-extend rax after doing such a
      CMPXCHG. Since this problem affects multiple archs, this is done in
      the verifier by patching in a BPF_ZEXT_REG instruction after every
      32-bit cmpxchg. Any archs that don't need such manual zero-extension
      can do a look-ahead with insn_is_zext to skip the unnecessary mov.
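
To illustrate the discrepancy, here is a minimal userspace sketch in C. It is not kernel or JIT code: the function names (x86_cmpxchg32, bpf_cmpxchg32, zext32) are hypothetical, and registers are modelled as plain uint64_t values under the semantics described above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of an x86-style 32-bit CMPXCHG acting on a 64-bit rax.
 * eax is only written when the comparison fails, so on success
 * the upper 32 bits of rax are left untouched.
 */
uint64_t x86_cmpxchg32(uint64_t rax, uint32_t *mem, uint32_t new_val)
{
	assert(mem);
	if (*mem == (uint32_t)rax) {
		*mem = new_val;        /* success: rax is not written at all */
		return rax;
	}
	return (uint64_t)*mem;         /* failure: 32-bit load zero-extends rax */
}

/* Model of the BPF semantics: r0 always receives the old memory value. */
uint64_t bpf_cmpxchg32(uint64_t r0, uint32_t *mem, uint32_t new_val)
{
	uint32_t old;

	assert(mem);
	old = *mem;
	if (old == (uint32_t)r0)
		*mem = new_val;
	return (uint64_t)old;          /* always zero-extended */
}

/* The fix in this sketch: an explicit zero-extension after the operation. */
uint64_t zext32(uint64_t reg)
{
	return (uint64_t)(uint32_t)reg;
}
```

With rax = 0xdeadbeef00000005 and memory holding 5, the x86-style model returns rax unchanged, upper bits included, while the BPF model returns 5. Applying zext32() to the x86-style result makes the two agree.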
      
      Note this still goes on top of Ilya's patch:
      
      https://lore.kernel.org/bpf/20210301154019.129110-1-iii@linux.ibm.com/T/#u
      
      Differences v5->v6[1]:
       - Moved is_cmpxchg_insn and ensured it can be safely re-used. Also renamed it
         and removed 'inline' to match the style of the is_*_function helpers.
       - Fixed up comments in verifier test (thanks for the careful review, Martin!)
      
      Differences v4->v5[1]:
       - Moved the logic entirely into opt_subreg_zext_lo32_rnd_hi32, thanks to Martin
         for suggesting this.
      
      Differences v3->v4[1]:
       - Moved the optimization against pointless zext into the correct place:
         opt_subreg_zext_lo32_rnd_hi32 is called _after_ fixup_bpf_calls.
      
      Differences v2->v3[1]:
       - Moved patching into fixup_bpf_calls (patch incoming to rename this function)
       - Added extra commentary on bpf_jit_needs_zext
       - Added check to avoid adding a pointless zext(r0) if there's already one there.
      
      Difference v1->v2[1]: Now solved centrally in the verifier instead of
        specifically for the x86 JIT. Thanks to Ilya and Daniel for the suggestions!
      
      [1] v5: https://lore.kernel.org/bpf/CA+i-1C3ytZz6FjcPmUg5s4L51pMQDxWcZNvM86w4RHZ_o2khwg@mail.gmail.com/T/#t
          v4: https://lore.kernel.org/bpf/CA+i-1C3ytZz6FjcPmUg5s4L51pMQDxWcZNvM86w4RHZ_o2khwg@mail.gmail.com/T/#t
          v3: https://lore.kernel.org/bpf/08669818-c99d-0d30-e1db-53160c063611@iogearbox.net/T/#t
          v2: https://lore.kernel.org/bpf/08669818-c99d-0d30-e1db-53160c063611@iogearbox.net/T/#t
          v1: https://lore.kernel.org/bpf/d7ebaefb-bfd6-a441-3ff2-2fdfe699b1d2@iogearbox.net/T/#t
      
      
      
      Reported-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Fixes: 5ffa2550 ("bpf: Add instructions for atomic_[cmp]xchg")
      Signed-off-by: Brendan Jackman <jackmanb@google.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • cipso,calipso: resolve a number of problems with the DOI refcounts · ad5d07f4
      Paul Moore authored
      The current CIPSO and CALIPSO refcounting scheme for the DOI
      definitions is a bit flawed in that we:
      
      1. Don't correctly match gets/puts in netlbl_cipsov4_list().
      2. Decrement the refcount on each attempt to remove the DOI from the
         DOI list, only removing it from the list once the refcount drops
         to zero.
      
      This patch fixes these problems by adding the missing "puts" to
      netlbl_cipsov4_list() and introducing a more conventional, i.e.
      not-buggy, refcounting mechanism for the DOI definitions.  When a
      DOI is added to the DOI list, it is initialized with a refcount of
      one. Removing a DOI from the list unlinks it immediately and drops
      the refcount by one. "gets" and "puts" behave as expected with
      respect to refcounts, increasing and decreasing the DOI's refcount
      by one.
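
The refcounting scheme described above can be sketched in userspace C as follows. This is an illustrative model only: struct doi_def and the doi_* helpers are hypothetical names, a plain int stands in for the kernel's refcount type, and a flag stands in for the real release path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a DOI definition; not the kernel's struct. */
struct doi_def {
	int refcount;   /* plain int keeps the sketch small */
	bool on_list;   /* is the DOI linked into the DOI list? */
	bool freed;     /* stand-in for the real release path */
};

/* Adding to the list: the list itself owns the initial reference. */
void doi_add(struct doi_def *d)
{
	d->refcount = 1;
	d->on_list = true;
	d->freed = false;
}

void doi_get(struct doi_def *d)
{
	assert(d->refcount > 0);
	d->refcount++;
}

void doi_put(struct doi_def *d)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0)
		d->freed = true;
}

/*
 * Removal unlinks immediately and drops only the list's reference.
 * A second removal attempt finds nothing to remove and must not
 * touch the refcount again.
 */
bool doi_remove(struct doi_def *d)
{
	if (!d->on_list)
		return false;
	d->on_list = false;
	doi_put(d);
	return true;
}
```

In this model a DOI that is still held by a user survives doi_remove() with a refcount of one, repeated removal attempts are harmless, and the final doi_put() releases it.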
      
      Fixes: b1edeb10 ("netlabel: Replace protocol/NetLabel linking with refrerence counts")
      Fixes: d7cce015 ("netlabel: Add support for removing a CALIPSO DOI.")
      Reported-by: syzbot+9ec037722d2603a9f52e@syzkaller.appspotmail.com
      Signed-off-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: always store valid MAC address · 67eb2114
      Jiri Wiesner authored
      The last change to ibmvnic_set_mac(), 8fc3672a, meant to prevent
      users from setting an invalid MAC address on an ibmvnic interface
      that has not been brought up yet. The change also prevented the
      requested MAC address from being stored by the adapter object for an
      ibmvnic interface when the state of the ibmvnic interface is
      VNIC_PROBED - that is after probing has finished but before the
      ibmvnic interface is brought up. The MAC address stored by the
      adapter object is used and sent to the hypervisor for checking when
      an ibmvnic interface is brought up.
      
      The ibmvnic driver ignoring the requested MAC address when in
      VNIC_PROBED state caused LACP bonds (bonds in 802.3ad mode) with more
      than one slave to malfunction. The bonding code must be able to
      change the MAC address of its slaves before they are brought up
      during enslaving. The inability of kernels with 8fc3672a to set
      the MAC addresses of bonding slaves is observable in the output of
      "ip address show". The MAC addresses of the slaves are the same as
      the MAC address of the bond on a working system whereas the slaves
      retain their original MAC addresses on a system with a malfunctioning
      LACP bond.
      
      Fixes: 8fc3672a ("ibmvnic: fix ibmvnic_set_mac")
      Signed-off-by: Jiri Wiesner <jwiesner@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netdevsim: init u64 stats for 32bit hardware · 863a42b2
      Hillf Danton authored
      Initialize the u64 stats to avoid lockdep splats like the following
      on 32-bit hardware:
      
       INFO: trying to register non-static key.
       the code is fine but needs lockdep annotation.
       turning off the locking correctness validator.
       CPU: 0 PID: 4695 Comm: syz-executor.0 Not tainted 5.11.0-rc5-syzkaller #0
       Hardware name: ARM-Versatile Express
       Backtrace:
       [<826fc5b8>] (dump_backtrace) from [<826fc82c>] (show_stack+0x18/0x1c arch/arm/kernel/traps.c:252)
       [<826fc814>] (show_stack) from [<8270d1f8>] (__dump_stack lib/dump_stack.c:79 [inline])
       [<826fc814>] (show_stack) from [<8270d1f8>] (dump_stack+0xa8/0xc8 lib/dump_stack.c:120)
       [<8270d150>] (dump_stack) from [<802bf9c0>] (assign_lock_key kernel/locking/lockdep.c:935 [inline])
       [<8270d150>] (dump_stack) from [<802bf9c0>] (register_lock_class+0xabc/0xb68 kernel/locking/lockdep.c:1247)
       [<802bef04>] (register_lock_class) from [<802baa2c>] (__lock_acquire+0x84/0x32d4 kernel/locking/lockdep.c:4711)
       [<802ba9a8>] (__lock_acquire) from [<802be840>] (lock_acquire.part.0+0xf0/0x554 kernel/locking/lockdep.c:5442)
       [<802be750>] (lock_acquire.part.0) from [<802bed10>] (lock_acquire+0x6c/0x74 kernel/locking/lockdep.c:5415)
       [<802beca4>] (lock_acquire) from [<81560548>] (seqcount_lockdep_reader_access include/linux/seqlock.h:103 [inline])
       [<802beca4>] (lock_acquire) from [<81560548>] (__u64_stats_fetch_begin include/linux/u64_stats_sync.h:164 [inline])
       [<802beca4>] (lock_acquire) from [<81560548>] (u64_stats_fetch_begin include/linux/u64_stats_sync.h:175 [inline])
       [<802beca4>] (lock_acquire) from [<81560548>] (nsim_get_stats64+0xdc/0xf0 drivers/net/netdevsim/netdev.c:70)
       [<8156046c>] (nsim_get_stats64) from [<81e2efa0>] (dev_get_stats+0x44/0xd0 net/core/dev.c:10405)
       [<81e2ef5c>] (dev_get_stats) from [<81e53204>] (rtnl_fill_stats+0x38/0x120 net/core/rtnetlink.c:1211)
       [<81e531cc>] (rtnl_fill_stats) from [<81e59d58>] (rtnl_fill_ifinfo+0x6d4/0x148c net/core/rtnetlink.c:1783)
       [<81e59684>] (rtnl_fill_ifinfo) from [<81e5ceb4>] (rtmsg_ifinfo_build_skb+0x9c/0x108 net/core/rtnetlink.c:3798)
       [<81e5ce18>] (rtmsg_ifinfo_build_skb) from [<81e5d0ac>] (rtmsg_ifinfo_event net/core/rtnetlink.c:3830 [inline])
       [<81e5ce18>] (rtmsg_ifinfo_build_skb) from [<81e5d0ac>] (rtmsg_ifinfo_event net/core/rtnetlink.c:3821 [inline])
       [<81e5ce18>] (rtmsg_ifinfo_build_skb) from [<81e5d0ac>] (rtmsg_ifinfo+0x44/0x70 net/core/rtnetlink.c:3839)
       [<81e5d068>] (rtmsg_ifinfo) from [<81e45c2c>] (register_netdevice+0x664/0x68c net/core/dev.c:10103)
       [<81e455c8>] (register_netdevice) from [<815608bc>] (nsim_create+0xf8/0x124 drivers/net/netdevsim/netdev.c:317)
       [<815607c4>] (nsim_create) from [<81561184>] (__nsim_dev_port_add+0x108/0x188 drivers/net/netdevsim/dev.c:941)
       [<8156107c>] (__nsim_dev_port_add) from [<815620d8>] (nsim_dev_port_add_all drivers/net/netdevsim/dev.c:990 [inline])
       [<8156107c>] (__nsim_dev_port_add) from [<815620d8>] (nsim_dev_probe+0x5cc/0x750 drivers/net/netdevsim/dev.c:1119)
       [<81561b0c>] (nsim_dev_probe) from [<815661dc>] (nsim_bus_probe+0x10/0x14 drivers/net/netdevsim/bus.c:287)
       [<815661cc>] (nsim_bus_probe) from [<811724c0>] (really_probe+0x100/0x50c drivers/base/dd.c:554)
       [<811723c0>] (really_probe) from [<811729c4>] (driver_probe_device+0xf8/0x1c8 drivers/base/dd.c:740)
       [<811728cc>] (driver_probe_device) from [<81172fe4>] (__device_attach_driver+0x8c/0xf0 drivers/base/dd.c:846)
       [<81172f58>] (__device_attach_driver) from [<8116fee0>] (bus_for_each_drv+0x88/0xd8 drivers/base/bus.c:431)
       [<8116fe58>] (bus_for_each_drv) from [<81172c6c>] (__device_attach+0xdc/0x1d0 drivers/base/dd.c:914)
       [<81172b90>] (__device_attach) from [<8117305c>] (device_initial_probe+0x14/0x18 drivers/base/dd.c:961)
       [<81173048>] (device_initial_probe) from [<81171358>] (bus_probe_device+0x90/0x98 drivers/base/bus.c:491)
       [<811712c8>] (bus_probe_device) from [<8116e77c>] (device_add+0x320/0x824 drivers/base/core.c:3109)
       [<8116e45c>] (device_add) from [<8116ec9c>] (device_register+0x1c/0x20 drivers/base/core.c:3182)
       [<8116ec80>] (device_register) from [<81566710>] (nsim_bus_dev_new drivers/net/netdevsim/bus.c:336 [inline])
       [<8116ec80>] (device_register) from [<81566710>] (new_device_store+0x178/0x208 drivers/net/netdevsim/bus.c:215)
       [<81566598>] (new_device_store) from [<8116fcb4>] (bus_attr_store+0x2c/0x38 drivers/base/bus.c:122)
       [<8116fc88>] (bus_attr_store) from [<805b4b8c>] (sysfs_kf_write+0x48/0x54 fs/sysfs/file.c:139)
       [<805b4b44>] (sysfs_kf_write) from [<805b3c90>] (kernfs_fop_write_iter+0x128/0x1ec fs/kernfs/file.c:296)
       [<805b3b68>] (kernfs_fop_write_iter) from [<804d22fc>] (call_write_iter include/linux/fs.h:1901 [inline])
       [<805b3b68>] (kernfs_fop_write_iter) from [<804d22fc>] (new_sync_write fs/read_write.c:518 [inline])
       [<805b3b68>] (kernfs_fop_write_iter) from [<804d22fc>] (vfs_write+0x3dc/0x57c fs/read_write.c:605)
       [<804d1f20>] (vfs_write) from [<804d2604>] (ksys_write+0x68/0xec fs/read_write.c:658)
       [<804d259c>] (ksys_write) from [<804d2698>] (__do_sys_write fs/read_write.c:670 [inline])
       [<804d259c>] (ksys_write) from [<804d2698>] (sys_write+0x10/0x14 fs/read_write.c:667)
       [<804d2688>] (sys_write) from [<80200060>] (ret_fast_syscall+0x0/0x2c arch/arm/mm/proc-v7.S:64)
      
      Fixes: 83c9e13a ("netdevsim: add software driver for testing offloads")
      Reported-by: syzbot+e74a6857f2d0efe3ad81@syzkaller.appspotmail.com
      Tested-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Hillf Danton <hdanton@sina.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mptcp-fixes' · bdda7dfa
      David S. Miller authored
      
      
      Mat Martineau says:
      
      ====================
      mptcp: Fixes for v5.12
      
      These patches from the MPTCP tree fix a few multipath TCP issues:
      
      Patches 1 and 5 clear some stale pointers when subflows close.
      
      Patches 2, 4, and 9 plug some memory leaks.
      
      Patch 3 fixes a memory accounting error identified by syzkaller.
      
      Patches 6 and 7 fix a race condition that slowed data transmission.
      
      Patch 8 adds missing wakeups when write buffer space is freed.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: free resources when the port number is mismatched · 9238e900
      Geliang Tang authored
      When the port number is mismatched with the announced ones, use
      'goto dispose_child' to free the resources instead of using 'goto out'.
      
      This patch also moves the port number checking code in
      subflow_syn_recv_sock before mptcp_finish_join, otherwise subflow_drop_ctx
      will fail in dispose_child.
      
      Fixes: 5bc56388 ("mptcp: add port number check for MP_JOIN")
      Reported-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Geliang Tang <geliangtang@gmail.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: fix missing wakeup · 417789df
      Paolo Abeni authored
      __mptcp_clean_una() can free write memory and should wake-up
      user-space processes when needed.
      
      When such function is invoked by the MPTCP receive path, the wakeup
      is not needed, as the TCP stack will later trigger subflow_write_space
      which will do the wakeup as needed.
      
      Other __mptcp_clean_una() call sites need an additional wakeup check.
      Let's bundle the relevant code in a new helper and use it.
      
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/165
      Fixes: 6e628cd3 ("mptcp: use mptcp release_cb for delayed tasks")
      Fixes: 64b9cea7 ("mptcp: fix spurious retransmissions")
      Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: fix race in release_cb · c2e6048f
      Paolo Abeni authored
      If we receive an MPTCP_PUSH_PENDING event from a subflow while
      mptcp_release_cb() is serving the previous one, the new event
      will be delayed until the next release_sock(msk).
      
      Address the issue by implementing a test/serve loop for such
      events.
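
The difference between a single test and a test/serve loop can be sketched as below. This is a hedged userspace model, not the MPTCP code: struct msk_model, the PUSH_PENDING bit, and the rearm counter (which simulates a new event arriving while the previous one is being served) are all assumptions made for illustration.

```c
#include <assert.h>

#define PUSH_PENDING 0x1u   /* hypothetical flag bit for this sketch */

struct msk_model {
	unsigned int flags;
	int served;   /* how many events have been served so far */
	int rearm;    /* how many times a new event arrives while serving */
};

/* Serving an event may race with a new one being flagged. */
void serve_push(struct msk_model *m)
{
	assert(m->rearm >= 0);
	m->served++;
	if (m->rearm > 0) {
		m->rearm--;
		m->flags |= PUSH_PENDING;
	}
}

/* Single test: an event raised during serving stays pending. */
void release_cb_once(struct msk_model *m)
{
	if (m->flags & PUSH_PENDING) {
		m->flags &= ~PUSH_PENDING;
		serve_push(m);
	}
}

/* Test/serve loop: keep re-testing until no event is pending. */
void release_cb_loop(struct msk_model *m)
{
	while (m->flags & PUSH_PENDING) {
		m->flags &= ~PUSH_PENDING;
		serve_push(m);
	}
}
```

With one event re-armed during serving, release_cb_once() returns with the flag still set and one event served, whereas release_cb_loop() serves both events before returning.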
      
      Additionally rename the push helper to __mptcp_push_pending()
      to be more consistent with the existing code.
      
      Fixes: 6e628cd3 ("mptcp: use mptcp release_cb for delayed tasks")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: factor out __mptcp_retrans helper() · 2948d0a1
      Paolo Abeni authored
      
      
      This will simplify the following patch; no functional change
      intended.
      
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: reset 'first' and ack_hint on subflow close · c8fe62f0
      Florian Westphal authored
      
      
      Just like with last_snd, we have to NULL 'first' on subflow close.
      
      Resetting ack_hint isn't strictly required (it's never dereferenced), but
      it is better to clear it explicitly as well instead of making it an exception.
      
      msk->first is dereferenced unconditionally at accept time, but
      at that point the ssk is not on the conn_list yet -- this means
      worker can't see it when iterating the conn_list.
      
      Reported-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: dispose initial struct socket when its subflow is closed · 17aee05d
      Florian Westphal authored
      Christoph Paasch reported the following crash:
      dst_release underflow
      WARNING: CPU: 0 PID: 1319 at net/core/dst.c:175 dst_release+0xc1/0xd0 net/core/dst.c:175
      CPU: 0 PID: 1319 Comm: syz-executor217 Not tainted 5.11.0-rc6af8e85128b4d0d24083c5cac646e891227052e0c #70
      Call Trace:
       rt_cache_route+0x12e/0x140 net/ipv4/route.c:1503
       rt_set_nexthop.constprop.0+0x1fc/0x590 net/ipv4/route.c:1612
       __mkroute_output net/ipv4/route.c:2484 [inline]
      ...
      
      The worker leaves msk->subflow alone even when it
      happened to close the subflow ssk associated with it.
      
      Fixes: 866f26f2 ("mptcp: always graft subflow socket to parent")
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/157
      
      
      Reported-by: Christoph Paasch <cpaasch@apple.com>
      Suggested-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: fix memory accounting on allocation error · eaeef1ce
      Paolo Abeni authored
      
      
      Under memory pressure the MPTCP xmit path keeps at most a single
      skb in the tx cache and frees any additional ones.
      
      The associated forward memory counter is not updated
      accordingly, which causes the following splat:
      
      WARNING: CPU: 0 PID: 12 at net/core/stream.c:208 sk_stream_kill_queues+0x3ca/0x530 net/core/stream.c:208
      Modules linked in:
      CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.11.0-rc2 #59
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      Workqueue: events mptcp_worker
      RIP: 0010:sk_stream_kill_queues+0x3ca/0x530 net/core/stream.c:208
      Code: 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e 63 01 00 00 8b ab 00 01 00 00 e9 60 ff ff ff e8 2f 24 d3 fe 0f 0b eb 97 e8 26 24 d3 fe <0f> 0b eb a0 e8 1d 24 d3 fe 0f 0b e9 a5 fe ff ff 4c 89 e7 e8 0e d0
      RSP: 0018:ffffc900000c7bc8 EFLAGS: 00010293
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: ffff88810030ac40 RSI: ffffffff8262ca4a RDI: 0000000000000003
      RBP: 0000000000000d00 R08: 0000000000000000 R09: ffffffff85095aa7
      R10: ffffffff8262c9ea R11: 0000000000000001 R12: ffff888108908100
      R13: ffffffff85095aa0 R14: ffffc900000c7c48 R15: 1ffff92000018f85
      FS:  0000000000000000(0000) GS:ffff88811b200000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fa7444baef8 CR3: 0000000035ee9005 CR4: 0000000000170ef0
      Call Trace:
       __mptcp_destroy_sock+0x4a7/0x6c0 net/mptcp/protocol.c:2547
       mptcp_worker+0x7dd/0x1610 net/mptcp/protocol.c:2272
       process_one_work+0x896/0x1170 kernel/workqueue.c:2275
       worker_thread+0x605/0x1350 kernel/workqueue.c:2421
       kthread+0x344/0x410 kernel/kthread.c:292
       ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:296
      
      The splat is triggered at close time, as reported by syzkaller/Christoph.
      
      This change addresses the issue by properly updating the fwd
      allocated memory counter in the error path.
      
      Reported-by: Christoph Paasch <cpaasch@apple.com>
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/136
      Fixes: 724cfd2e ("mptcp: allocate TX skbs in msk context")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: put subflow sock on connect error · f0715779
      Florian Westphal authored
      mptcp_add_pending_subflow() performs a sock_hold() on the subflow,
      then adds the subflow to the join list.
      
      Without a sock_put the subflow sk won't be freed in case connect() fails.
      
      unreferenced object 0xffff88810c03b100 (size 3000):
      [..]
          sk_prot_alloc.isra.0+0x2f/0x110
          sk_alloc+0x5d/0xc20
          inet6_create+0x2b7/0xd30
          __sock_create+0x17f/0x410
          mptcp_subflow_create_socket+0xff/0x9c0
          __mptcp_subflow_connect+0x1da/0xaf0
          mptcp_pm_nl_work+0x6e0/0x1120
          mptcp_worker+0x508/0x9a0
      
      Fixes: 5b950ff4 ("mptcp: link MPC subflow into msk only after accept")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: reset last_snd on subflow close · e0be4931
      Florian Westphal authored
      The send logic caches the last active subflow in the msk, so it needs
      to be cleared when the cached subflow is closed.
      
      Fixes: d5f49190 ("mptcp: allow picking different xmit subflows")
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/155
      
      
      Reported-by: Christoph Paasch <cpaasch@apple.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: avoid duplicates in classes dump · bfc25605
      Maximilian Heyne authored
      This is a follow-up to commit ea327469 ("net: sched: avoid
      duplicates in qdisc dump"), which fixed the issue only for the qdisc
      dump.
      
      The duplicate printing also occurs when dumping the classes via
        tc class show dev eth0
      
      Fixes: 59cc1f61 ("net: sched: convert qdisc linked list to hashtable")
      Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: usb: qmi_wwan: allow qmimux add/del with master up · 6c59cff3
      Daniele Palmas authored
      There's no reason for preventing the creation and removal
      of qmimux network interfaces when the underlying interface
      is up.
      
      This makes qmi_wwan mux implementation more similar to the
      rmnet one, simplifying userspace management of the same
      logical interfaces.
      
      Fixes: c6adf779 ("net: usb: qmi_wwan: add qmap mux protocol support")
      Reported-by: Aleksander Morgado <aleksander@aleksander.es>
      Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
      Acked-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: fix ucast/bcast flooding always remaining enabled · 6a5166e0
      Vladimir Oltean authored
      In the blamed patch I managed to introduce a bug while moving code
      around: the same logic is applied to the ucast_egress_floods and
      bcast_egress_floods variables both on the "if" and the "else" branches.
      
      This is clearly an unintended change compared to how the code used to be
      prior to that bugfix, so restore it.
      
      Fixes: 7f7ccdea ("net: dsa: sja1105: fix leakage of flooded frames outside bridging domain")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: fix SGMII PCS being forced to SPEED_UNKNOWN instead of SPEED_10 · 053d8ad1
      Vladimir Oltean authored
      When using MLO_AN_PHY or MLO_AN_FIXED, the MII_BMCR of the SGMII PCS is
      read before resetting the switch so it can be reprogrammed afterwards.
      This works for the speeds of 1Gbps and 100Mbps, but not for 10Mbps,
      because SPEED_10 is actually 0, so AND-ing anything with 0 is false,
      therefore that last branch is dead code.
      
      Do what others do (genphy_read_status_fixed, phy_mii_ioctl) and just
      remove the check for SPEED_10, let it fall into the default case.
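
The dead branch can be reproduced with a small userspace sketch. The function names are hypothetical and the code is not the driver's. The three constants are defined locally and are meant to mirror the BMCR speed bits from the kernel's MII header, with -1 standing in for SPEED_UNKNOWN.

```c
#include <stdint.h>

/* Local copies intended to mirror the MII BMCR speed-select bits. */
#define BMCR_SPEED1000 0x0040
#define BMCR_SPEED100  0x2000
#define BMCR_SPEED10   0x0000

/* Buggy shape: the last test ANDs with 0, so it can never be true. */
int decode_speed_buggy(uint16_t bmcr)
{
	if (bmcr & BMCR_SPEED1000)
		return 1000;
	if (bmcr & BMCR_SPEED100)
		return 100;
	if (bmcr & BMCR_SPEED10)   /* dead code: anything AND 0 is 0 */
		return 10;
	return -1;                 /* stand-in for SPEED_UNKNOWN */
}

/* Fixed shape: 10 Mbps is what remains when neither speed bit is set. */
int decode_speed_fixed(uint16_t bmcr)
{
	if (bmcr & BMCR_SPEED1000)
		return 1000;
	if (bmcr & BMCR_SPEED100)
		return 100;
	return 10;                 /* default case */
}
```

For a BMCR value with neither speed bit set, the buggy variant falls through to the unknown result, while the fixed variant reports 10 Mbps.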
      
      Fixes: ffe10e67 ("net: dsa: sja1105: Add support for the SGMII port")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: properly reject destination IP keys in VCAP IS1 · f1becbed
      Vladimir Oltean authored
      An attempt is made to warn the user about the fact that VCAP IS1 cannot
      offload keys matching on destination IP (at least given the current half
      key format), but sadly that warning fails miserably in practice, due to
      the fact that it operates on an uninitialized "match" variable. We must
      first decode the keys from the flow rule.
      
      Fixes: 75944fda ("net: mscc: ocelot: offload ingress skbedit and vlan actions to VCAP IS1")
      Reported-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'nexthop-blackhole' · 87e5e094
      David S. Miller authored
      
      
      Ido Schimmel says:
      
      ====================
      nexthop: Do not flush blackhole nexthops when loopback goes down
      
      Patch #1 prevents blackhole nexthops from being flushed when the
      loopback device goes down given that as far as user space is concerned,
      these nexthops do not have a nexthop device.
      
      Patch #2 adds a test case.
      
      There are no regressions in fib_nexthops.sh with this change:
      
       # ./fib_nexthops.sh
       ...
       Tests passed: 165
       Tests failed:   0
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: fib_nexthops: Test blackhole nexthops when loopback goes down · 3a1099d3
      Ido Schimmel authored
      
      
      Test that blackhole nexthops are not flushed when the loopback device
      goes down.
      
      Output without previous patch:
      
       # ./fib_nexthops.sh -t basic
      
       Basic functional tests
       ----------------------
       TEST: List with nothing defined                                     [ OK ]
       TEST: Nexthop get on non-existent id                                [ OK ]
       TEST: Nexthop with no device or gateway                             [ OK ]
       TEST: Nexthop with down device                                      [ OK ]
       TEST: Nexthop with device that is linkdown                          [ OK ]
       TEST: Nexthop with device only                                      [ OK ]
       TEST: Nexthop with duplicate id                                     [ OK ]
       TEST: Blackhole nexthop                                             [ OK ]
       TEST: Blackhole nexthop with other attributes                       [ OK ]
       TEST: Blackhole nexthop with loopback device down                   [FAIL]
       TEST: Create group                                                  [ OK ]
       TEST: Create group with blackhole nexthop                           [FAIL]
       TEST: Create multipath group where 1 path is a blackhole            [ OK ]
       TEST: Multipath group can not have a member replaced by blackhole   [ OK ]
       TEST: Create group with non-existent nexthop                        [ OK ]
       TEST: Create group with same nexthop multiple times                 [ OK ]
       TEST: Replace nexthop with nexthop group                            [ OK ]
       TEST: Replace nexthop group with nexthop                            [ OK ]
       TEST: Nexthop group and device                                      [ OK ]
       TEST: Test proto flush                                              [ OK ]
       TEST: Nexthop group and blackhole                                   [ OK ]
      
       Tests passed:  19
       Tests failed:   2
      
      Output with previous patch:
      
       # ./fib_nexthops.sh -t basic
      
       Basic functional tests
       ----------------------
       TEST: List with nothing defined                                     [ OK ]
       TEST: Nexthop get on non-existent id                                [ OK ]
       TEST: Nexthop with no device or gateway                             [ OK ]
       TEST: Nexthop with down device                                      [ OK ]
       TEST: Nexthop with device that is linkdown                          [ OK ]
       TEST: Nexthop with device only                                      [ OK ]
       TEST: Nexthop with duplicate id                                     [ OK ]
       TEST: Blackhole nexthop                                             [ OK ]
       TEST: Blackhole nexthop with other attributes                       [ OK ]
       TEST: Blackhole nexthop with loopback device down                   [ OK ]
       TEST: Create group                                                  [ OK ]
       TEST: Create group with blackhole nexthop                           [ OK ]
       TEST: Create multipath group where 1 path is a blackhole            [ OK ]
       TEST: Multipath group can not have a member replaced by blackhole   [ OK ]
       TEST: Create group with non-existent nexthop                        [ OK ]
       TEST: Create group with same nexthop multiple times                 [ OK ]
       TEST: Replace nexthop with nexthop group                            [ OK ]
       TEST: Replace nexthop group with nexthop                            [ OK ]
       TEST: Nexthop group and device                                      [ OK ]
       TEST: Test proto flush                                              [ OK ]
       TEST: Nexthop group and blackhole                                   [ OK ]
      
       Tests passed:  21
       Tests failed:   0
      
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Ido Schimmel's avatar
      nexthop: Do not flush blackhole nexthops when loopback goes down · 76c03bf8
      Ido Schimmel authored
      As far as user space is concerned, blackhole nexthops do not have a
      nexthop device and therefore should not be affected by the
      administrative or carrier state of any netdev.
      
      However, when the loopback netdev goes down all the blackhole nexthops
      are flushed. This happens because internally the kernel associates
      blackhole nexthops with the loopback netdev.
      
      This behavior is both confusing to those not familiar with kernel
      internals and also diverges from the legacy API where blackhole IPv4
      routes are not flushed when the loopback netdev goes down:
      
       # ip route add blackhole 198.51.100.0/24
       # ip link set dev lo down
       # ip route show 198.51.100.0/24
       blackhole 198.51.100.0/24
      
      Blackhole IPv6 routes are flushed, but at least user space knows that
      they are associated with the loopback netdev:
      
       # ip -6 route show 2001:db8:1::/64
       blackhole 2001:db8:1::/64 dev lo metric 1024 pref medium
      
      Fix this by only flushing blackhole nexthops when the loopback netdev is
      unregistered.
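
A minimal userspace sketch of the rule described above, not the kernel implementation: the event names, the `nexthop_model` struct, and `flush_dev_nexthops()` are hypothetical stand-ins chosen for illustration. It assumes each nexthop carries a flag marking it as a blackhole, and it skips such entries unless the device is being unregistered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for netdev notifier events. */
enum netdev_event { EV_DOWN, EV_CHANGE, EV_UNREGISTER };

/* Simplified model of a nexthop; field names are illustrative only. */
struct nexthop_model {
    int  id;
    bool is_blackhole;  /* internally associated with the loopback netdev */
    bool flushed;
};

/*
 * Flush the nexthops bound to a device. Blackhole nexthops survive
 * DOWN and CHANGE events and are flushed only on UNREGISTER.
 * Returns the number of nexthops flushed by this call.
 */
static size_t flush_dev_nexthops(struct nexthop_model *nhs, size_t n,
                                 enum netdev_event ev)
{
    size_t count = 0;

    assert(nhs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (nhs[i].flushed)
            continue;               /* already gone */
        if (nhs[i].is_blackhole && ev != EV_UNREGISTER)
            continue;               /* keep blackhole nexthops */
        nhs[i].flushed = true;
        count++;
    }
    return count;
}
```

In this model, a down event on the loopback device leaves the blackhole entry in place, while an unregister event removes it.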
      
      Fixes: ab84be7e ("net: Initial nexthop code")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reported-by: Donald Sharp <sharpd@nvidia.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Drew Fustini's avatar
      net: sctp: trivial: fix typo in comment · d93ef301
      Drew Fustini authored
      
      
      Fix typo of 'overflow' for comment in sctp_tsnmap_check().
      
      Reported-by: Gustavo A. R. Silva <gustavoars@kernel.org>
      Signed-off-by: Drew Fustini <drew@beagleboard.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David S. Miller's avatar
      Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · e216674a
      David S. Miller authored
      
      
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2021-03-03
      
      This series contains updates to ixgbe and ixgbevf drivers.
      
      Bartosz Golaszewski does not error on -ENODEV from ixgbe_mii_bus_init()
      as this is valid for some devices with a shared bus for ixgbe.
      
      Antony Antony adds a check to fail for non transport mode SA with
      offload as this is not supported for ixgbe and ixgbevf.
      
      Dinghao Liu fixes a memory leak on failure to program a perfect filter
      for ixgbe.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Dinghao Liu's avatar
      ixgbe: Fix memleak in ixgbe_configure_clsu32 · 7a766381
      Dinghao Liu authored
      
      
      When ixgbe_fdir_write_perfect_filter_82599() fails, the input
      allocated by kzalloc() is not freed, which leads to a memory leak.
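
The usual shape of such a fix is an error path that frees the allocation before returning. The following is a userspace sketch of that pattern only, assuming `calloc()`/`free()` in place of `kzalloc()`/`kfree()`; `filter_model`, `write_filter_model()`, `configure_filter_model()`, and the `live_allocs` counter are hypothetical names, not the driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct filter_model { int id; };

/* Counts outstanding allocations so the leak (or its absence) is visible. */
static int live_allocs;

/* Hypothetical stand-in for a hardware write that may fail. */
static int write_filter_model(const struct filter_model *f, int fail)
{
    (void)f;
    return fail ? -EIO : 0;
}

/* Allocate a filter, program it, and free it again if programming fails. */
static int configure_filter_model(int id, int fail, struct filter_model **out)
{
    struct filter_model *input;
    int err;

    assert(out != NULL);

    input = calloc(1, sizeof(*input));
    if (!input)
        return -ENOMEM;
    live_allocs++;
    input->id = id;

    err = write_filter_model(input, fail);
    if (err)
        goto err_free;      /* without this jump, input would leak */

    *out = input;           /* success: ownership passes to the caller */
    return 0;

err_free:
    free(input);
    live_allocs--;
    return err;
}
```

On failure the function returns the error with no allocation outstanding; on success the caller owns the returned filter and is responsible for freeing it.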
      
      Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
      Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
      Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • Antony Antony's avatar
      ixgbe: fail to create xfrm offload of IPsec tunnel mode SA · d785e1fe
      Antony Antony authored
      Based on talks and indirect references, ixgbe IPsec offload does not
      support IPsec tunnel mode offload. It can only support IPsec transport
      mode offload. Now explicitly fail when creating a non-transport mode SA
      with offload to avoid false performance expectations.
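
As an illustration of the check described above, here is a minimal userspace sketch. The `sa_model` struct, the `sa_mode_model` enum, and `add_offload_sa_model()` are hypothetical; the choice of `-EINVAL` as the error code is likewise an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of the SA modes relevant here. */
enum sa_mode_model { SA_MODE_TRANSPORT, SA_MODE_TUNNEL };

struct sa_model {
    enum sa_mode_model mode;
};

/*
 * Accept an offload request only for transport mode SAs.
 * Any other mode is rejected up front instead of being
 * silently accepted without hardware support.
 */
static int add_offload_sa_model(const struct sa_model *sa)
{
    assert(sa != NULL);

    if (sa->mode != SA_MODE_TRANSPORT)
        return -EINVAL;     /* assumed error code for this sketch */

    /* ... program the hardware here ... */
    return 0;
}
```

Rejecting the request at creation time surfaces the limitation to the caller immediately, rather than leaving them to expect offload performance that never materializes.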
      
      Fixes: 63a67fe2 ("ixgbe: add ipsec offload add and remove SA")
      Signed-off-by: Antony Antony <antony@phenome.org>
      Acked-by: Shannon Nelson <snelson@pensando.io>
      Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>