  1. Aug 18, 2021
    • vsock/virtio: avoid potential deadlock when vsock device remove · 09625c5b
      Longpeng(Mike) authored
      [ Upstream commit 49b0b6ff ]
      
      There's a potential deadlock when removing the vsock device or
      processing the RESET event:
      
        vsock_for_each_connected_socket:
            spin_lock_bh(&vsock_table_lock) ----------- (1)
            ...
                virtio_vsock_reset_sock:
                    lock_sock(sk) --------------------- (2)
            ...
            spin_unlock_bh(&vsock_table_lock)
      
      lock_sock() may sleep (schedule) when 'sk' is owned by another
      thread at the same time, in which case we would receive a
      "scheduling while atomic" warning.
      
      Even worse, if the next task (selected by the scheduler) tries to
      release a 'sk', it needs to acquire vsock_table_lock, so a deadlock
      occurs and the system ends up in a softlockup state.
        Call trace:
         queued_spin_lock_slowpath
         vsock_remove_bound
         vsock_remove_sock
         virtio_transport_release
         __vsock_release
         vsock_release
         __sock_release
         sock_close
         __fput
         ____fput
      
      So we should not take sk_lock in this case, just like the behavior
      in vhost_vsock or vmci.
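      A minimal userspace model of the locking rule above (not the kernel
      code): the hypothetical counter atomic_depth stands in for holding
      vsock_table_lock, and model_lock_sock() records a would-be "scheduling
      while atomic" whenever it runs with that lock held on a socket owned by
      another thread. All names here are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: every name is a hypothetical stand-in for
 * spin_lock_bh(&vsock_table_lock), lock_sock() and the socket table. */
static int atomic_depth;     /* > 0 while the "table spinlock" is held */
static int sleep_in_atomic;  /* would-be "scheduling while atomic" hits */

struct vsk_model {
	int owned_by_other;  /* lock_sock() would have to wait (sleep) */
	int reset;           /* set once the socket has been reset */
};

static void table_lock(void)   { atomic_depth++; }
static void table_unlock(void) { atomic_depth--; }

/* lock_sock() may sleep when another thread owns the socket; doing so
 * while a spinlock is held is the bug being modelled. */
static void model_lock_sock(struct vsk_model *sk)
{
	if (sk->owned_by_other && atomic_depth > 0)
		sleep_in_atomic++;
}

/* Old behaviour: take the per-socket lock inside the table walk. */
static void reset_sock_locked(struct vsk_model *sk)
{
	model_lock_sock(sk);
	sk->reset = 1;
}

/* Behaviour after the fix: no per-socket lock, as in vhost_vsock/vmci. */
static void reset_sock_unlocked(struct vsk_model *sk)
{
	sk->reset = 1;
}

/* Model of vsock_for_each_connected_socket(). */
static void for_each_connected(struct vsk_model *socks, size_t n,
			       void (*fn)(struct vsk_model *))
{
	table_lock();
	for (size_t i = 0; i < n; i++)
		fn(&socks[i]);
	table_unlock();
	assert(atomic_depth == 0);
}
```

      With one socket owned elsewhere, the locked reset path records one
      violation while the unlocked path records none.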
      
      Fixes: 0ea9e1d3 ("VSOCK: Introduce virtio_transport.ko")
      Cc: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Link: https://lore.kernel.org/r/20210812053056.1699-1-longpeng2@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • xen/events: Fix race in set_evtchn_to_irq · 128e480a
      Maximilian Heyne authored
      [ Upstream commit 88ca2521 ]
      
      There is a TOCTOU issue in set_evtchn_to_irq. Rows in the evtchn_to_irq
      mapping are lazily allocated in this function. The check whether the row
      is already present and the row initialization are not synchronized. Two
      threads can at the same time allocate a new row for evtchn_to_irq and
      add the irq mapping to their newly allocated row. One thread will
      overwrite what the other has set for evtchn_to_irq[row] and therefore
      the irq mapping is lost. This will trigger a BUG_ON later in
      bind_evtchn_to_cpu:
      
        INFO: pci 0000:1a:15.4: [1d0f:8061] type 00 class 0x010802
        INFO: nvme 0000:1a:12.1: enabling device (0000 -> 0002)
        INFO: nvme nvme77: 1/0/0 default/read/poll queues
        CRIT: kernel BUG at drivers/xen/events/events_base.c:427!
        WARN: invalid opcode: 0000 [#1] SMP NOPTI
        WARN: Workqueue: nvme-reset-wq nvme_reset_work [nvme]
        WARN: RIP: e030:bind_evtchn_to_cpu+0xc2/0xd0
        WARN: Call Trace:
        WARN:  set_affinity_irq+0x121/0x150
        WARN:  irq_do_set_affinity+0x37/0xe0
        WARN:  irq_setup_affinity+0xf6/0x170
        WARN:  irq_startup+0x64/0xe0
        WARN:  __setup_irq+0x69e/0x740
        WARN:  ? request_threaded_irq+0xad/0x160
        WARN:  request_threaded_irq+0xf5/0x160
        WARN:  ? nvme_timeout+0x2f0/0x2f0 [nvme]
        WARN:  pci_request_irq+0xa9/0xf0
        WARN:  ? pci_alloc_irq_vectors_affinity+0xbb/0x130
        WARN:  queue_request_irq+0x4c/0x70 [nvme]
        WARN:  nvme_reset_work+0x82d/0x1550 [nvme]
        WARN:  ? check_preempt_wakeup+0x14f/0x230
        WARN:  ? check_preempt_curr+0x29/0x80
        WARN:  ? nvme_irq_check+0x30/0x30 [nvme]
        WARN:  process_one_work+0x18e/0x3c0
        WARN:  worker_thread+0x30/0x3a0
        WARN:  ? process_one_work+0x3c0/0x3c0
        WARN:  kthread+0x113/0x130
        WARN:  ? kthread_park+0x90/0x90
        WARN:  ret_from_fork+0x3a/0x50
      
      This patch sets evtchn_to_irq rows via a cmpxchg operation so that they
      will be set only once. The row is now cleared before writing it to
      evtchn_to_irq in order not to create a race once the row is visible to
      other threads.
      
      While at it, do not require the page to be zeroed, because it will be
      overwritten with -1's in clear_evtchn_to_irq_row anyway.
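      The publish-once pattern described above can be sketched in userspace
      with C11 atomics (an assumption; the kernel uses its own cmpxchg()).
      ROW_ENTRIES, NR_ROWS and evtchn_rows are hypothetical stand-ins for the
      real evtchn_to_irq layout. The new row is filled with -1 before the
      compare-and-swap publishes it, and a thread that loses the race frees
      its own row and uses the winner's.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

#define ROW_ENTRIES 512  /* hypothetical row size */
#define NR_ROWS     8    /* hypothetical number of rows */

/* Top-level table: each slot lazily points to a row of irq numbers. */
static _Atomic(int *) evtchn_rows[NR_ROWS];

/* Fill a fresh row with -1 ("no irq") before anyone can see it. */
static void clear_row(int *row)
{
	for (int i = 0; i < ROW_ENTRIES; i++)
		row[i] = -1;
}

/* Returns 0 on success, -1 on allocation failure. */
static int set_evtchn_to_irq(unsigned int evtchn, int irq)
{
	unsigned int r = evtchn / ROW_ENTRIES, c = evtchn % ROW_ENTRIES;
	int *row;

	assert(r < NR_ROWS);
	row = atomic_load(&evtchn_rows[r]);
	if (!row) {
		/* malloc, not calloc: clear_row() overwrites every entry. */
		int *fresh = malloc(ROW_ENTRIES * sizeof(*fresh));
		int *expected = NULL;

		if (!fresh)
			return -1;
		clear_row(fresh);  /* initialise BEFORE publishing */
		if (atomic_compare_exchange_strong(&evtchn_rows[r], &expected,
						   fresh)) {
			row = fresh;     /* we installed the row */
		} else {
			free(fresh);     /* another thread won the race */
			row = expected;  /* use the row it installed */
		}
	}
	row[c] = irq;  /* simplified: plain store of the mapping */
	return 0;
}
```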
      
      Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
      Fixes: d0b075ff ("xen/events: Refactor evtchn_to_irq array to be dynamically allocated")
      Link: https://lore.kernel.org/r/20210812130930.127134-1-mheyne@amazon.de
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • drm/i915: Only access SFC_DONE when media domain is not fused off · 950429a4
      Matt Roper authored
      [ Upstream commit 24d032e2 ]
      
      The SFC_DONE register lives within the corresponding VD0/VD2/VD4/VD6
      forcewake domain and is not accessible if the vdbox in that domain is
      fused off and the forcewake is not initialized.
      
      This mistake went unnoticed because until recently we were using the
      wrong register offset for the SFC_DONE register; once the register
      offset was corrected, we started hitting errors like
      
        <4> [544.989065] i915 0000:cc:00.0: Uninitialized forcewake domain(s) 0x80 accessed at 0x1ce000
      
      on parts with fused-off vdbox engines.
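      A minimal sketch of the guard, under stated assumptions: bit i of the
      hypothetical vdbox_mask is set when vdbox i is present, SFC instance n
      lives in the forcewake domain of vdbox 2n (VD0/VD2/VD4/VD6, as described
      above), and read_sfc_done() is a fake register read. This is not the
      i915 code.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SFC 4  /* SFCs in the VD0/VD2/VD4/VD6 domains */

/* Hypothetical stand-in for an MMIO read of SFC_DONE(sfc). */
static uint32_t read_sfc_done(int sfc)
{
	return 0x100u + (uint32_t)sfc;
}

/* Capture SFC_DONE only for SFCs whose vdbox is not fused off.
 * Returns the number of registers read; skipped slots are left 0. */
static int capture_sfc_done(uint32_t vdbox_mask, uint32_t out[NUM_SFC])
{
	int reads = 0;

	for (int n = 0; n < NUM_SFC; n++) {
		out[n] = 0;
		if (!(vdbox_mask & (1u << (2 * n))))
			continue;  /* fused off: forcewake not initialised */
		out[n] = read_sfc_done(n);
		reads++;
	}
	return reads;
}
```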
      
      Fixes: e50dbdbf ("drm/i915/tgl: Add SFC instdone to error state")
      Fixes: 9c9c6d0a ("drm/i915: Correct SFC_DONE register offset")
      Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210806174130.1058960-1-matthew.d.roper@intel.com
      Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
      (cherry picked from commit c5589bb5)
      Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      [Changed Fixes tag to match the cherry-picked 82929a21]
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: igmp: increase size of mr_ifc_count · 9977d0ba
      Eric Dumazet authored
      [ Upstream commit b69dd5b3 ]
      
      Some arches support cmpxchg() only on 4-byte and 8-byte values.
      Increase the width of mr_ifc_count to 32 bits to fix this problem.
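      Why the width matters: a counter updated with compare-and-swap must have
      a size the architecture's cmpxchg() supports. Below is a userspace
      analogue, using C11 atomics, of a decrement-if-non-zero loop on a 32-bit
      counter; the variable and function names are illustrative and this is
      not the igmp code itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* 32-bit counter: a width compare-and-swap supports everywhere. */
static _Atomic uint32_t mr_ifc_count;

/* Decrement only if non-zero. The CAS loop retries if another context
 * changed the value in between, so no concurrent update is lost.
 * Returns 1 if a decrement happened, 0 if the counter was already 0. */
static int dec_if_nonzero(_Atomic uint32_t *cnt)
{
	uint32_t old = atomic_load(cnt);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(cnt, &old, old - 1))
			return 1;
	}
	return 0;
}
```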
      
      Fixes: 4a2b285e ("net: igmp: fix data-race in igmp_ifc_timer_expire()")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Guenter Roeck <linux@roeck-us.net>
      Link: https://lore.kernel.org/r/20210811195715.3684218-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • tcp_bbr: fix u32 wrap bug in round logic if bbr_init() called after 2B packets · 43913895
      Neal Cardwell authored
      [ Upstream commit 6de035fe ]
      
      Currently if BBR congestion control is initialized after more than 2B
      packets have been delivered, depending on the phase of the
      tp->delivered counter the tracking of BBR round trips can get stuck.
      
      The bug arises because if tp->delivered is between 2^31 and 2^32 at
      the time the BBR congestion control module is initialized, then the
      initialization of bbr->next_rtt_delivered to 0 will cause the logic to
      believe that the end of the round trip is still billions of packets in
      the future. More specifically, the following check will fail
      repeatedly:
      
        !before(rs->prior_delivered, bbr->next_rtt_delivered)
      
      and thus the connection will take up to 2B packets delivered before
      that check will pass and the connection will set:
      
        bbr->round_start = 1;
      
      This could cause many mechanisms in BBR to fail to trigger, for
      example bbr_check_full_bw_reached() would likely never exit STARTUP.
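      The wrap-around can be reproduced with a sequence-space comparison
      written in the style of the kernel's before() helper; seq_before() and
      round_start() are illustrative names. With tp->delivered between 2^31
      and 2^32, a target of 0 compares as lying in the future, while seeding
      the target from the current delivered count (assumed here to be the
      shape of the fix) lets the round start immediately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if seq1 comes before seq2 modulo 2^32. The int32_t conversion
 * of values >= 2^31 is implementation-defined in ISO C but wraps on
 * GCC and Clang, which is what the kernel relies on as well. */
static bool seq_before(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) < 0;
}

/* A round starts once prior_delivered has reached next_rtt_delivered,
 * mirroring !before(rs->prior_delivered, bbr->next_rtt_delivered). */
static bool round_start(uint32_t prior_delivered,
			uint32_t next_rtt_delivered)
{
	return !seq_before(prior_delivered, next_rtt_delivered);
}
```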
      
      This bug is 5 years old and has not been observed, and as a practical
      matter this would likely rarely trigger, since it would require
      transferring at least 2B packets, or likely more than 3 terabytes of
      data, before switching congestion control algorithms to BBR.
      
      This patch is a stable candidate for kernels as far back as v4.9,
      when tcp_bbr.c was added.
      
      Fixes: 0f8782ea ("tcp_bbr: add BBR congestion control")
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Reviewed-by: Yuchung Cheng <ycheng@google.com>
      Reviewed-by: Kevin Yang <yyd@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20210811024056.235161-1-ncardwell@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: linkwatch: fix failure to restore device state across suspend/resume · 53201f29
      Willy Tarreau authored
      [ Upstream commit 6922110d ]
      
      After migrating my laptop from 4.19-LTS to 5.4-LTS a while ago I noticed
      that my Ethernet port to which a bond and a VLAN interface are attached
      appeared to remain up after resuming from suspend with the cable unplugged
      (and that problem still persists with 5.10-LTS).
      
      It turns out that the following happens:
      
        - the network driver (e1000e here) prepares to suspend, calls e1000e_down()
          which calls netif_carrier_off() to signal that the link is going down.
        - netif_carrier_off() adds a link_watch event to the list of events for
          this device
        - the device is completely stopped.
        - the machine suspends
        - the cable is unplugged and the machine brought to another location
        - the machine is resumed
        - the queued linkwatch events are processed for the device
        - the device doesn't yet have the __LINK_STATE_PRESENT bit and its events
          are silently dropped
        - the device is resumed with its link down
        - the upper VLAN and bond interfaces are never notified that the link had
          been turned down and remain up
        - the only way to provoke a change is to physically connect the machine
          to a port and possibly unplug it.
      
      The state after resume looks like this:
        $ ip -br li | egrep 'bond|eth'
        bond0            UP             e8:6a:64:64:64:64 <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP>
        eth0             DOWN           e8:6a:64:64:64:64 <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP>
        eth0.2@eth0      UP             e8:6a:64:64:64:64 <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP>
      
      Placing an explicit call to netdev_state_change() either in the suspend
      or the resume code in the NIC driver worked around this but the solution
      is not satisfying.
      
      The issue is in fact in link_watch, which loses events that it ought
      not to. It turns out that the test for the device being present was
      added by commit 124eee3f ("net: linkwatch: add check for netdevice
      being present to linkwatch_do_dev") in 4.20 to avoid an access to
      devices that are not present.
      
      Instead of dropping events, this patch proceeds slightly differently by
      postponing their handling so that they happen after the device is fully
      resumed.
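      A toy model of the new behaviour, assuming a hypothetical struct lw_dev
      whose present flag stands in for __LINK_STATE_PRESENT: events for a
      device that is not present yet are kept queued and retried on a later
      run instead of being dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a device with a queued link-watch event. */
struct lw_dev {
	bool present;    /* models __LINK_STATE_PRESENT */
	bool pending;    /* an event is queued for this device */
	int  delivered;  /* events propagated to upper devices */
};

/* Process queued events. Devices that are not present keep their event
 * queued (postponed) rather than having it dropped.
 * Returns the number of events still pending afterwards. */
static size_t run_queue(struct lw_dev *devs, size_t n)
{
	size_t still_pending = 0;

	for (size_t i = 0; i < n; i++) {
		if (!devs[i].pending)
			continue;
		if (!devs[i].present) {
			still_pending++;  /* retry on a later run */
			continue;
		}
		devs[i].pending = false;
		devs[i].delivered++;
	}
	return still_pending;
}
```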
      
      Fixes: 124eee3f ("net: linkwatch: add check for netdevice being present to linkwatch_do_dev")
      Link: https://lists.openwall.net/netdev/2018/03/15/62
      Cc: Heiner Kallweit <hkallweit1@gmail.com>
      Cc: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      Link: https://lore.kernel.org/r/20210809160628.22623-1-w@1wt.eu
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: bridge: fix memleak in br_add_if() · 59cabc51
      Yang Yingliang authored
      [ Upstream commit 519133de ]
      
      I got a memleak report:
      
      BUG: memory leak
      unreferenced object 0x607ee521a658 (size 240):
      comm "syz-executor.0", pid 955, jiffies 4294780569 (age 16.449s)
      hex dump (first 32 bytes, cpu 1):
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
      backtrace:
      [<00000000d830ea5a>] br_multicast_add_port+0x1c2/0x300 net/bridge/br_multicast.c:1693
      [<00000000274d9a71>] new_nbp net/bridge/br_if.c:435 [inline]
      [<00000000274d9a71>] br_add_if+0x670/0x1740 net/bridge/br_if.c:611
      [<0000000012ce888e>] do_set_master net/core/rtnetlink.c:2513 [inline]
      [<0000000012ce888e>] do_set_master+0x1aa/0x210 net/core/rtnetlink.c:2487
      [<0000000099d1cafc>] __rtnl_newlink+0x1095/0x13e0 net/core/rtnetlink.c:3457
      [<00000000a01facc0>] rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3488
      [<00000000acc9186c>] rtnetlink_rcv_msg+0x369/0xa10 net/core/rtnetlink.c:5550
      [<00000000d4aabb9c>] netlink_rcv_skb+0x134/0x3d0 net/netlink/af_netlink.c:2504
      [<00000000bc2e12a3>] netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
      [<00000000bc2e12a3>] netlink_unicast+0x4a0/0x6a0 net/netlink/af_netlink.c:1340
      [<00000000e4dc2d0e>] netlink_sendmsg+0x789/0xc70 net/netlink/af_netlink.c:1929
      [<000000000d22c8b3>] sock_sendmsg_nosec net/socket.c:654 [inline]
      [<000000000d22c8b3>] sock_sendmsg+0x139/0x170 net/socket.c:674
      [<00000000e281417a>] ____sys_sendmsg+0x658/0x7d0 net/socket.c:2350
      [<00000000237aa2ab>] ___sys_sendmsg+0xf8/0x170 net/socket.c:2404
      [<000000004f2dc381>] __sys_sendmsg+0xd3/0x190 net/socket.c:2433
      [<0000000005feca6c>] do_syscall_64+0x37/0x90 arch/x86/entry/common.c:47
      [<000000007304477d>] entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      On the error path of br_add_if(), p->mcast_stats allocated in
      new_nbp() needs to be freed, or it will be leaked.
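      The leak follows the usual C error-unwinding pattern. A hypothetical
      sketch (struct port, new_port() and add_if() are stand-ins loosely
      shaped like new_nbp() and br_add_if(), and a counter replaces kmemleak):
      every allocation made on the way in must also be released on the error
      path.

```c
#include <assert.h>
#include <stdlib.h>

static int live_allocs;  /* model-only stand-in for a leak detector */

static void *model_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		live_allocs++;
	return p;
}

static void model_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/* Hypothetical stand-in for struct net_bridge_port. */
struct port {
	void *mcast_stats;
};

/* Allocates the port and its multicast stats, like new_nbp(). */
static struct port *new_port(void)
{
	struct port *p = model_alloc(sizeof(*p));

	if (!p)
		return NULL;
	p->mcast_stats = model_alloc(64);
	if (!p->mcast_stats) {
		model_free(p);
		return NULL;
	}
	return p;
}

/* 'fail_later' simulates an error after new_port() succeeded. */
static int add_if(int fail_later, struct port **out)
{
	struct port *p = new_port();

	if (!p)
		return -1;
	if (fail_later)
		goto err_free_port;
	*out = p;
	return 0;

err_free_port:
	model_free(p->mcast_stats);  /* the release the fix adds */
	model_free(p);
	return -1;
}
```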
      
      Fixes: 1080ab95 ("net: bridge: add support for IGMP/MLD stats and export them via netlink")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Link: https://lore.kernel.org/r/20210809132023.978546-1-yangyingliang@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: bridge: fix flags interpretation for extern learn fdb entries · ff6c9aad
      Nikolay Aleksandrov authored
      [ Upstream commit 45a68787 ]
      
      Ignore fdb flags when adding port extern learn entries and always set
      BR_FDB_LOCAL flag when adding bridge extern learn entries. This is
      closest to the behaviour we had before and avoids breaking any use cases
      which were allowed.
      
      This patch fixes iproute2 calls which assume NUD_PERMANENT and were
      allowed before, example:
      $ bridge fdb add 00:11:22:33:44:55 dev swp1 extern_learn
      
      Extern learn entries are allowed to roam, but do not expire, so static
      or dynamic flags make no sense for them.
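      The rule above reduces to a small flag computation. In this sketch the
      flag names and extern_learn_flags() are hypothetical, loosely named
      after BR_FDB_LOCAL and related bridge flags: flags requested by user
      space are ignored, and entries added on the bridge device itself are
      always local.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits, loosely named after the bridge's. */
enum {
	FDB_LOCAL        = 1u << 0,
	FDB_STATIC       = 1u << 1,
	FDB_ADDED_BY_EXT = 1u << 2,
};

/* Flags for an externally learned entry. 'requested' holds the
 * NUD-derived flags from user space; they are ignored because extern
 * learn entries may roam but never expire. */
static unsigned int extern_learn_flags(bool on_bridge_dev,
				       unsigned int requested)
{
	unsigned int flags = FDB_ADDED_BY_EXT;

	(void)requested;  /* deliberately ignored */
	if (on_bridge_dev)
		flags |= FDB_LOCAL;  /* bridge entries are always local */
	return flags;
}
```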
      
      Also add a comment for future reference.
      
      Fixes: eb100e0e ("net: bridge: allow to add externally learned entries from user-space")
      Fixes: 0541a629 ("net: bridge: validate the NUD_PERMANENT bit when adding an extern_learn FDB entry")
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Tested-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20210810110010.43859-1-razor@blackwall.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • pinctrl: sunxi: Don't underestimate number of functions · c7c9cc4a
      Andre Przywara authored
      [ Upstream commit d1dee814 ]
      
      When we are building all the various pinctrl structures for the
      Allwinner pinctrl devices, we estimate the maximum number of
      distinct functions (names) that we will need.
      
      So far we have taken the number of pins as an upper bound, even though
      we can actually have up to four special functions per pin. This wasn't
      a problem until now, since we typically have far more pins than
      functions, and most pins share common functions.
      
      However the H616 "-r" pin controller has only two pins, but four
      functions, so we run over the end of the array when we are looking for
      a matching function name in sunxi_pinctrl_add_function - there is no
      NULL sentinel left that would terminate the loop:
      
      [    8.200648] Unable to handle kernel paging request at virtual address fffdff7efbefaff5
      [    8.209179] Mem abort info:
      ....
      [    8.368456] Call trace:
      [    8.370925]  __pi_strcmp+0x90/0xf0
      [    8.374559]  sun50i_h616_r_pinctrl_probe+0x1c/0x28
      [    8.379557]  platform_probe+0x68/0xd8
      
      Do an actual worst-case allocation (4 functions per pin, three common
      functions and the sentinel) for the initial array allocation. This
      heavily overestimates the number of functions in the common case, but
      we will reallocate this array later with the actual number of
      functions, so this is only temporary.
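      The sizing rule and the reliance on a NULL sentinel can be sketched as
      follows; max_functions() and add_function() are hypothetical helpers,
      not the sunxi driver's code. Unlike the loop described above, this
      search is additionally bounded by the capacity as a safeguard.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Worst case from the commit message: up to 4 special functions per
 * pin, 3 common functions, and 1 NULL sentinel. */
static size_t max_functions(size_t npins)
{
	return npins * 4 + 3 + 1;
}

/* Add 'name' to a NULL-terminated array of 'cap' slots unless already
 * present. Returns 1 if added, 0 if already there, -1 if full. */
static int add_function(const char **funcs, size_t cap, const char *name)
{
	size_t i = 0;

	/* The search stops at the sentinel, so one must always remain. */
	while (i < cap && funcs[i]) {
		if (strcmp(funcs[i], name) == 0)
			return 0;
		i++;
	}
	if (i + 1 >= cap)
		return -1;  /* no room for the entry plus the sentinel */
	funcs[i] = name;
	funcs[i + 1] = NULL;
	return 1;
}
```

      For a two-pin controller such as the H616 "-r" one, this reserves 12
      slots, enough for 3 common and 4 special functions plus the sentinel.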
      
      Fixes: 561c1cf1 ("pinctrl: sunxi: Add support for the Allwinner H616-R pin controller")
      Signed-off-by: Andre Przywara <andre.przywara@arm.com>
      Acked-by: Maxime Ripard <maxime@cerno.tech>
      Link: https://lore.kernel.org/r/20210722132548.22121-1-andre.przywara@arm.com
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: dsa: sja1105: fix broken backpressure in .port_fdb_dump · 735e90f3
      Vladimir Oltean authored
      [ Upstream commit 21b52fed ]
      
      rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into
      multiple netlink skbs if the buffer provided by user space is too small
      (one buffer will typically handle a few hundred FDB entries).
      
      When the current buffer becomes full, nlmsg_put() in
      dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index
      of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that
      point, and then the dump resumes on the same port with a new skb, and
      FDB entries up to the saved index are simply skipped.
      
      Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to
      drivers, drivers must check for the -EMSGSIZE error code returned
      by it. Otherwise, when a netlink skb becomes full, DSA will no longer
      save newly dumped FDB entries to it, but the driver will continue
      dumping. So FDB entries will be missing from the dump.

      Fix the broken backpressure by propagating the "cb" return code and
      allowing rtnl_fdb_dump() to restart the FDB dump with a new skb.
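      The backpressure contract can be modelled in a few lines. Everything
      below is a hypothetical stand-in: fdb_dump_cb() plays the role of
      dsa_slave_port_fdb_do_dump(), driver_fdb_dump() a driver's
      .port_fdb_dump, and dump_all() the restart loop in rtnl_fdb_dump(). The
      propagate flag switches between the fixed and the old driver behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of one dump pass into a single netlink skb. */
struct fdb_dump_ctx {
	int cap;   /* entries that fit in the current skb */
	int used;  /* entries written to the current skb */
	int idx;   /* running index over all FDB entries */
	int skip;  /* entries already sent in earlier skbs */
};

/* Skips entries sent earlier; returns -EMSGSIZE once the skb is full. */
static int fdb_dump_cb(void *data)
{
	struct fdb_dump_ctx *ctx = data;

	if (ctx->idx >= ctx->skip) {
		if (ctx->used == ctx->cap)
			return -EMSGSIZE;
		ctx->used++;
	}
	ctx->idx++;
	return 0;
}

/* Driver walk over its FDB; 'propagate' selects fixed vs. old code. */
static int driver_fdb_dump(int nentries, bool propagate,
			   int (*cb)(void *), void *data)
{
	for (int i = 0; i < nentries; i++) {
		int err = cb(data);

		if (err && propagate)
			return err;  /* hand backpressure to the caller */
	}
	return 0;
}

/* Restart with a fresh skb on -EMSGSIZE. Returns the number of entries
 * that reached user space and stores the number of skbs used. */
static int dump_all(int nentries, int cap, bool propagate, int *skbs)
{
	struct fdb_dump_ctx ctx = { .cap = cap };
	int sent = 0, err;

	assert(cap > 0);
	*skbs = 0;
	do {
		ctx.used = 0;
		ctx.idx = 0;
		ctx.skip = sent;
		(*skbs)++;
		err = driver_fdb_dump(nentries, propagate, fdb_dump_cb, &ctx);
		sent += ctx.used;
	} while (err == -EMSGSIZE);
	return sent;
}
```

      With 10 entries and room for 4 per skb, the fixed driver delivers all
      10 entries in 3 skbs, while the old behaviour stops after the first 4.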
      
      Fixes: 291d1e72 ("net: dsa: sja1105: Add support for FDB and MDB management")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: dsa: lantiq: fix broken backpressure in .port_fdb_dump · 8398aab4
      Vladimir Oltean authored
      [ Upstream commit 871a73a1 ]
      
      rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into
      multiple netlink skbs if the buffer provided by user space is too small
      (one buffer will typically handle a few hundred FDB entries).
      
      When the current buffer becomes full, nlmsg_put() in
      dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index
      of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that
      point, and then the dump resumes on the same port with a new skb, and
      FDB entries up to the saved index are simply skipped.
      
      Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to
      drivers, drivers must check for the -EMSGSIZE error code returned
      by it. Otherwise, when a netlink skb becomes full, DSA will no longer
      save newly dumped FDB entries to it, but the driver will continue
      dumping. So FDB entries will be missing from the dump.

      Fix the broken backpressure by propagating the "cb" return code and
      allowing rtnl_fdb_dump() to restart the FDB dump with a new skb.
      
      Fixes: 58c59ef9 ("net: dsa: lantiq: Add Forwarding Database access")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: dsa: lan9303: fix broken backpressure in .port_fdb_dump · c6cbf567
      Vladimir Oltean authored
      [ Upstream commit ada2fee1 ]
      
      rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into
      multiple netlink skbs if the buffer provided by user space is too small
      (one buffer will typically handle a few hundred FDB entries).
      
      When the current buffer becomes full, nlmsg_put() in
      dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index
      of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that
      point, and then the dump resumes on the same port with a new skb, and
      FDB entries up to the saved index are simply skipped.
      
      Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to
      drivers, drivers must check for the -EMSGSIZE error code returned
      by it. Otherwise, when a netlink skb becomes full, DSA will no longer
      save newly dumped FDB entries to it, but the driver will continue
      dumping. So FDB entries will be missing from the dump.

      Fix the broken backpressure by propagating the "cb" return code and
      allowing rtnl_fdb_dump() to restart the FDB dump with a new skb.
      
      Fixes: ab335349 ("net: dsa: lan9303: Add port_fast_age and port_fdb_dump methods")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: dsa: hellcreek: fix broken backpressure in .port_fdb_dump · 22ecb342
      Vladimir Oltean authored
      [ Upstream commit cd391280 ]
      
      rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into
      multiple netlink skbs if the buffer provided by user space is too small
      (one buffer will typically handle a few hundred FDB entries).
      
      When the current buffer becomes full, nlmsg_put() in
      dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index
      of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that
      point, and then the dump resumes on the same port with a new skb, and
      FDB entries up to the saved index are simply skipped.
      
      Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to
      drivers, drivers must check for the -EMSGSIZE error code returned
      by it. Otherwise, when a netlink skb becomes full, DSA will no longer
      save newly dumped FDB entries to it, but the driver will continue
      dumping. So FDB entries will be missing from the dump.

      Fix the broken backpressure by propagating the "cb" return code and
      allowing rtnl_fdb_dump() to restart the FDB dump with a new skb.
      
      Fixes: e4b27ebc ("net: dsa: Add DSA driver for Hirschmann Hellcreek switches")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Kurt Kanzenbach <kurt@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: igmp: fix data-race in igmp_ifc_timer_expire() · 52133524
      Eric Dumazet authored
      [ Upstream commit 4a2b285e ]
      
      Fix the data-race reported by syzbot [1].
      The issue here is that igmp_ifc_timer_expire() can update
      in_dev->mr_ifc_count while another change has just occurred from
      another context.
      
      in_dev->mr_ifc_count is only 8 bits wide, so the race had little
      consequence.
      
      [1]
      BUG: KCSAN: data-race in igmp_ifc_event / igmp_ifc_timer_expire
      
      write to 0xffff8881051e3062 of 1 bytes by task 12547 on cpu 0:
       igmp_ifc_event+0x1d5/0x290 net/ipv4/igmp.c:821
       igmp_group_added+0x462/0x490 net/ipv4/igmp.c:1356
       ____ip_mc_inc_group+0x3ff/0x500 net/ipv4/igmp.c:1461
       __ip_mc_join_group+0x24d/0x2c0 net/ipv4/igmp.c:2199
       ip_mc_join_group_ssm+0x20/0x30 net/ipv4/igmp.c:2218
       do_ip_setsockopt net/ipv4/ip_sockglue.c:1285 [inline]
       ip_setsockopt+0x1827/0x2a80 net/ipv4/ip_sockglue.c:1423
       tcp_setsockopt+0x8c/0xa0 net/ipv4/tcp.c:3657
       sock_common_setsockopt+0x5d/0x70 net/core/sock.c:3362
       __sys_setsockopt+0x18f/0x200 net/socket.c:2159
       __do_sys_setsockopt net/socket.c:2170 [inline]
       __se_sys_setsockopt net/socket.c:2167 [inline]
       __x64_sys_setsockopt+0x62/0x70 net/socket.c:2167
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffff8881051e3062 of 1 bytes by interrupt on cpu 1:
       igmp_ifc_timer_expire+0x706/0xa30 net/ipv4/igmp.c:808
       call_timer_fn+0x2e/0x1d0 kernel/time/timer.c:1419
       expire_timers+0x135/0x250 kernel/time/timer.c:1464
       __run_timers+0x358/0x420 kernel/time/timer.c:1732
       run_timer_softirq+0x19/0x30 kernel/time/timer.c:1745
       __do_softirq+0x12c/0x26e kernel/softirq.c:558
       invoke_softirq kernel/softirq.c:432 [inline]
       __irq_exit_rcu+0x9a/0xb0 kernel/softirq.c:636
       sysvec_apic_timer_interrupt+0x69/0x80 arch/x86/kernel/apic/apic.c:1100
       asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638
       console_unlock+0x8e8/0xb30 kernel/printk/printk.c:2646
       vprintk_emit+0x125/0x3d0 kernel/printk/printk.c:2174
       vprintk_default+0x22/0x30 kernel/printk/printk.c:2185
       vprintk+0x15a/0x170 kernel/printk/printk_safe.c:392
       printk+0x62/0x87 kernel/printk/printk.c:2216
       selinux_netlink_send+0x399/0x400 security/selinux/hooks.c:6041
       security_netlink_send+0x42/0x90 security/security.c:2070
       netlink_sendmsg+0x59e/0x7c0 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:703 [inline]
       sock_sendmsg net/socket.c:723 [inline]
       ____sys_sendmsg+0x360/0x4d0 net/socket.c:2392
       ___sys_sendmsg net/socket.c:2446 [inline]
       __sys_sendmsg+0x1ed/0x270 net/socket.c:2475
       __do_sys_sendmsg net/socket.c:2484 [inline]
       __se_sys_sendmsg net/socket.c:2482 [inline]
       __x64_sys_sendmsg+0x42/0x50 net/socket.c:2482
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x01 -> 0x02
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 12539 Comm: syz-executor.1 Not tainted 5.14.0-rc4-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: Fix memory leak in ieee802154_raw_deliver · 44c8aa99
      Takeshi Misawa authored
      [ Upstream commit 1090340f ]
      
      If an IEEE-802.15.4-RAW socket is closed while skbs are still in its
      receive queue, those skbs are leaked. Fix this by freeing
      sk_receive_queue in sk->sk_destruct().
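      A userspace sketch of the idea, with hypothetical types: a socket
      destructor that frees whatever is still queued for receive, in the
      spirit of purging sk->sk_receive_queue from sk->sk_destruct(). Without
      the destructor, queued buffers outlive the socket.

```c
#include <assert.h>
#include <stdlib.h>

static int live_skbs;  /* model-only leak counter */

struct raw_skb_model {
	struct raw_skb_model *next;
};

/* Hypothetical socket: rx_head stands in for sk_receive_queue and
 * destruct for sk->sk_destruct. */
struct raw_sock_model {
	struct raw_skb_model *rx_head;
	void (*destruct)(struct raw_sock_model *);
};

/* Queue a freshly allocated buffer, as a raw deliver path would. */
static void raw_deliver(struct raw_sock_model *sk)
{
	struct raw_skb_model *skb = malloc(sizeof(*skb));

	assert(skb != NULL);
	live_skbs++;
	skb->next = sk->rx_head;
	sk->rx_head = skb;
}

/* Destructor that frees every buffer still queued. */
static void raw_sock_destruct(struct raw_sock_model *sk)
{
	while (sk->rx_head) {
		struct raw_skb_model *skb = sk->rx_head;

		sk->rx_head = skb->next;
		free(skb);
		live_skbs--;
	}
}

/* Final release of the socket: run the destructor if one is set. */
static void raw_close(struct raw_sock_model *sk)
{
	if (sk->destruct)
		sk->destruct(sk);
}
```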
      
      syzbot report:
      BUG: memory leak
      unreferenced object 0xffff88810f644600 (size 232):
        comm "softirq", pid 0, jiffies 4294967032 (age 81.270s)
        hex dump (first 32 bytes):
          10 7d 4b 12 81 88 ff ff 10 7d 4b 12 81 88 ff ff  .}K......}K.....
          00 00 00 00 00 00 00 00 40 7c 4b 12 81 88 ff ff  ........@|K.....
        backtrace:
          [<ffffffff83651d4a>] skb_clone+0xaa/0x2b0 net/core/skbuff.c:1496
          [<ffffffff83fe1b80>] ieee802154_raw_deliver net/ieee802154/socket.c:369 [inline]
          [<ffffffff83fe1b80>] ieee802154_rcv+0x100/0x340 net/ieee802154/socket.c:1070
          [<ffffffff8367cc7a>] __netif_receive_skb_one_core+0x6a/0xa0 net/core/dev.c:5384
          [<ffffffff8367cd07>] __netif_receive_skb+0x27/0xa0 net/core/dev.c:5498
          [<ffffffff8367cdd9>] netif_receive_skb_internal net/core/dev.c:5603 [inline]
          [<ffffffff8367cdd9>] netif_receive_skb+0x59/0x260 net/core/dev.c:5662
          [<ffffffff83fe6302>] ieee802154_deliver_skb net/mac802154/rx.c:29 [inline]
          [<ffffffff83fe6302>] ieee802154_subif_frame net/mac802154/rx.c:102 [inline]
          [<ffffffff83fe6302>] __ieee802154_rx_handle_packet net/mac802154/rx.c:212 [inline]
          [<ffffffff83fe6302>] ieee802154_rx+0x612/0x620 net/mac802154/rx.c:284
          [<ffffffff83fe59a6>] ieee802154_tasklet_handler+0x86/0xa0 net/mac802154/main.c:35
          [<ffffffff81232aab>] tasklet_action_common.constprop.0+0x5b/0x100 kernel/softirq.c:557
          [<ffffffff846000bf>] __do_softirq+0xbf/0x2ab kernel/softirq.c:345
          [<ffffffff81232f4c>] do_softirq kernel/softirq.c:248 [inline]
          [<ffffffff81232f4c>] do_softirq+0x5c/0x80 kernel/softirq.c:235
          [<ffffffff81232fc1>] __local_bh_enable_ip+0x51/0x60 kernel/softirq.c:198
          [<ffffffff8367a9a4>] local_bh_enable include/linux/bottom_half.h:32 [inline]
          [<ffffffff8367a9a4>] rcu_read_unlock_bh include/linux/rcupdate.h:745 [inline]
          [<ffffffff8367a9a4>] __dev_queue_xmit+0x7f4/0xf60 net/core/dev.c:4221
          [<ffffffff83fe2db4>] raw_sendmsg+0x1f4/0x2b0 net/ieee802154/socket.c:295
          [<ffffffff8363af16>] sock_sendmsg_nosec net/socket.c:654 [inline]
          [<ffffffff8363af16>] sock_sendmsg+0x56/0x80 net/socket.c:674
          [<ffffffff8363deec>] __sys_sendto+0x15c/0x200 net/socket.c:1977
          [<ffffffff8363dfb6>] __do_sys_sendto net/socket.c:1989 [inline]
          [<ffffffff8363dfb6>] __se_sys_sendto net/socket.c:1985 [inline]
          [<ffffffff8363dfb6>] __x64_sys_sendto+0x26/0x30 net/socket.c:1985
      
      Fixes: 9ec76716 ("net: add IEEE 802.15.4 socket family implementation")
      Reported-and-tested-by: syzbot+1f68113fa907bf0695a8@syzkaller.appspotmail.com
      Signed-off-by: Takeshi Misawa <jeliantsurux@gmail.com>
      Acked-by: Alexander Aring <aahringo@redhat.com>
      Link: https://lore.kernel.org/r/20210805075414.GA15796@DESKTOP
      Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      44c8aa99
    • Ben Hutchings's avatar
      net: dsa: microchip: ksz8795: Don't use phy_port_cnt in VLAN table lookup · 74b264b3
      Ben Hutchings authored
      [ Upstream commit 411d466d ]
      
      The magic number 4 in VLAN table lookup was the number of entries we
      can read and write at once.  Using phy_port_cnt here doesn't make
      sense and presumably broke VLAN filtering for 3-port switches.  Change
      it back to 4.
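
      A minimal sketch of the lookup arithmetic, assuming (as the message says)
      that one VLAN table entry holds 4 VIDs. VIDS_PER_VLAN_ENTRY and both
      helpers are hypothetical names, not the driver's code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant: VIDs packed into one VLAN table entry, i.e. the
 * "magic number 4". It describes the table layout, not the port count. */
#define VIDS_PER_VLAN_ENTRY 4

/* Address of the table entry that holds @vid. */
static uint16_t vlan_entry_addr(uint16_t vid)
{
	assert(vid < 4096); /* 802.1Q VIDs are 12 bits wide */
	return vid / VIDS_PER_VLAN_ENTRY;
}

/* Position of @vid inside the entry read from vlan_entry_addr(vid). */
static unsigned int vlan_entry_slot(uint16_t vid)
{
	assert(vid < 4096);
	return vid % VIDS_PER_VLAN_ENTRY;
}
```

      Because the divisor is a property of the table, VID 10 always maps to
      address 2, slot 2. Dividing by a per-chip port count that differs from 4
      would make such chips read and write the wrong entries.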
      
      Fixes: 4ce2a984 ("net: dsa: microchip: ksz8795: use phy_port_cnt ...")
      Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      74b264b3
    • Ben Hutchings's avatar
      net: dsa: microchip: ksz8795: Fix VLAN filtering · 1c4f2820
      Ben Hutchings authored
      [ Upstream commit 16484413 ]
      
      Currently ksz8_port_vlan_filtering() sets or clears the VLAN Enable
      hardware flag.  That controls discarding of packets with a VID that
      has not been enabled for any port on the switch.
      
      Since it is a global flag, set the dsa_switch::vlan_filtering_is_global
      flag so that the DSA core understands this can't be controlled per
      port.
      
      When VLAN filtering is enabled, the switch should also discard packets
      with a VID that's not enabled on the ingress port.  Set or clear each
      external port's VLAN Ingress Filter flag in ksz8_port_vlan_filtering()
      to make that happen.
      
      Fixes: e66f840c ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver")
      Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      1c4f2820
    • Ben Hutchings's avatar
      net: dsa: microchip: ksz8795: Use software untagging on CPU port · 3cc01579
      Ben Hutchings authored
      [ Upstream commit 9130c2d3 ]
      
      On the CPU port, we can support both tagged and untagged VLANs at the
      same time by doing any necessary untagging in software rather than
      hardware.  To enable that, keep the CPU port's Remove Tag flag cleared
      and set the dsa_switch::untag_bridge_pvid flag.
      
      Fixes: e66f840c ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver")
      Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      3cc01579
    • Ben Hutchings's avatar
      net: dsa: microchip: ksz8795: Fix VLAN untagged flag change on deletion · 9674dc67
      Ben Hutchings authored
      [ Upstream commit af01754f ]
      
      When a VLAN is deleted from a port, the flags in struct
      switchdev_obj_port_vlan are always 0.  ksz8_port_vlan_del() copies the
      BRIDGE_VLAN_INFO_UNTAGGED flag to the port's Tag Removal flag, and
      therefore always clears it.
      
      In case there are multiple VLANs configured as untagged on this port -
      which seems useless, but is allowed - deleting one of them changes the
      remaining VLANs to be tagged.
      
      It's only ever necessary to change this flag when a VLAN is added to
      the port, so leave it unchanged in ksz8_port_vlan_del().
      
      Fixes: e66f840c ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver")
      Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      9674dc67
    • Ben Hutchings's avatar
      net: dsa: microchip: ksz8795: Reject unsupported VLAN configuration · 159948c4
      Ben Hutchings authored
      [ Upstream commit 8f4f58f8 ]
      
      The switches supported by ksz8795 only have a per-port flag for Tag
      Removal.  This means it is not possible to support both tagged and
      untagged VLANs on the same port.  Reject attempts to add a VLAN that
      requires the flag to be changed, unless there are no VLANs currently
      configured.
      
      VID 0 is excluded from this check since it is untagged regardless of
      the state of the flag.
      
      On the CPU port we could support tagged and untagged VLANs at the same
      time.  This will be enabled by a later patch.
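
      A minimal sketch of the acceptance rule described above, assuming the
      driver can tell how many non-zero VIDs are configured on the port. The
      struct and function names are hypothetical, and the CPU-port exception
      is not modelled:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-port state. The chip has one Tag Removal flag per
 * port, so all VLANs on the port share the same tagging mode. */
struct port_vlan_state {
	unsigned int nr_vlans; /* configured VLANs, not counting VID 0 */
	bool tag_removal;      /* current value of the per-port flag */
};

/* Return true if @vid can be added as @untagged without changing the
 * tagging mode of VLANs already configured on the port. */
static bool port_vlan_add_allowed(const struct port_vlan_state *st,
				  uint16_t vid, bool untagged)
{
	assert(vid < 4096);

	/* VID 0 is untagged regardless of the flag, so it never conflicts. */
	if (vid == 0)
		return true;

	/* The flag already has the required value: nothing changes. */
	if (st->tag_removal == untagged)
		return true;

	/* The flag would have to flip, which is only safe on an empty port. */
	return st->nr_vlans == 0;
}
```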
      
      Fixes: e66f840c ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver")
      Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      159948c4
    • Ben Hutchings's avatar
      net: dsa: microchip: ksz8795: Fix PVID tag insertion · 3149f9ed
      Ben Hutchings authored
      [ Upstream commit ef3b02a1 ]
      
      ksz8795 has never actually enabled PVID tag insertion, and it also
      programmed the PVID incorrectly.  To fix this:
      
      * Allow tag insertion to be controlled per ingress port.  On most
        chips, set bit 2 in Global Control 19.  On KSZ88x3 this control
        flag doesn't exist.
      
      * When adding a PVID:
        - Set the appropriate register bits to enable tag insertion on
          egress at every other port if this was the packet's ingress port.
        - Mask *out* the VID from the default tag, before or-ing in the new
          PVID.
      
      * When removing a PVID:
        - Clear the same control bits to disable tag insertion.
        - Don't update the default tag.  This wasn't doing anything useful.
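
      The "mask out, then OR in" step for the default tag can be sketched as
      below, assuming the default tag uses the standard 802.1Q TCI layout (VID
      in the low 12 bits, priority bits above it). VID_MASK and
      default_tag_set_pvid() are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Low 12 bits of an 802.1Q TCI hold the VID (same value as the kernel's
 * VLAN_VID_MASK); the upper 4 bits carry PCP and DEI. */
#define VID_MASK 0x0fffu

/* Replace the VID field of @tag with @pvid, preserving the upper bits:
 * clear the old VID first, then OR in the new one. */
static uint16_t default_tag_set_pvid(uint16_t tag, uint16_t pvid)
{
	assert(pvid <= VID_MASK);
	return (uint16_t)((tag & ~VID_MASK) | pvid);
}
```

      Without the mask, the old and new VIDs would be merged bit by bit:
      0x005 | 0x064 is 0x065, not 0x064.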
      
      Fixes: e66f840c ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver")
      Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      3149f9ed
    • Ben Hutchings's avatar
      net: dsa: microchip: Fix ksz_read64() · 8154453a
      Ben Hutchings authored
      [ Upstream commit c34f674c ]
      
      ksz_read64() currently does some dubious byte-swapping on the two
      halves of a 64-bit register, and then only returns the high bits.
      Replace this with a straightforward expression.
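
      For illustration only, a sketch of the kind of straightforward expression
      meant here, assuming the two 32-bit halves have already been read and
      converted to CPU byte order. combine_u32_halves() is a hypothetical
      helper, not the driver's code:

```c
#include <stdint.h>

/* Combine two 32-bit register halves into one 64-bit value.
 * Assumes @hi holds bits 63..32 and @lo bits 31..0, both already in CPU
 * byte order. Casting before the shift keeps the high bits from being
 * lost in 32-bit arithmetic. */
static uint64_t combine_u32_halves(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}
```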
      
      Fixes: e66f840c ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver")
      Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      8154453a
    • Yonghong Song's avatar
      bpf: Fix potentially incorrect results with bpf_get_local_storage() · 037570c9
      Yonghong Song authored
      [ Upstream commit a2baf4e8 ]
      
      Commit b910eaaa ("bpf: Fix NULL pointer dereference in bpf_get_local_storage()
      helper") fixed a bug in the bpf_get_local_storage() helper so that different
      tasks won't mess up each other's percpu local storage.
      
      The percpu data contains 8 slots, so it can hold up to 8 contexts (same or
      different tasks) for 8 different program runs at the same time. This is
      generally sufficient, but our internal testing showed the following warning
      multiple times:
      
        [...]
        warning: WARNING: CPU: 13 PID: 41661 at include/linux/bpf-cgroup.h:193
           __cgroup_bpf_run_filter_sock_ops+0x13e/0x180
        RIP: 0010:__cgroup_bpf_run_filter_sock_ops+0x13e/0x180
        <IRQ>
         tcp_call_bpf.constprop.99+0x93/0xc0
         tcp_conn_request+0x41e/0xa50
         ? tcp_rcv_state_process+0x203/0xe00
         tcp_rcv_state_process+0x203/0xe00
         ? sk_filter_trim_cap+0xbc/0x210
         ? tcp_v6_inbound_md5_hash.constprop.41+0x44/0x160
         tcp_v6_do_rcv+0x181/0x3e0
         tcp_v6_rcv+0xc65/0xcb0
         ip6_protocol_deliver_rcu+0xbd/0x450
         ip6_input_finish+0x11/0x20
         ip6_input+0xb5/0xc0
         ip6_sublist_rcv_finish+0x37/0x50
         ip6_sublist_rcv+0x1dc/0x270
         ipv6_list_rcv+0x113/0x140
         __netif_receive_skb_list_core+0x1a0/0x210
         netif_receive_skb_list_internal+0x186/0x2a0
         gro_normal_list.part.170+0x19/0x40
         napi_complete_done+0x65/0x150
         mlx5e_napi_poll+0x1ae/0x680
         __napi_poll+0x25/0x120
         net_rx_action+0x11e/0x280
         __do_softirq+0xbb/0x271
         irq_exit_rcu+0x97/0xa0
         common_interrupt+0x7f/0xa0
         </IRQ>
         asm_common_interrupt+0x1e/0x40
        RIP: 0010:bpf_prog_1835a9241238291a_tw_egress+0x5/0xbac
         ? __cgroup_bpf_run_filter_skb+0x378/0x4e0
         ? do_softirq+0x34/0x70
         ? ip6_finish_output2+0x266/0x590
         ? ip6_finish_output+0x66/0xa0
         ? ip6_output+0x6c/0x130
         ? ip6_xmit+0x279/0x550
         ? ip6_dst_check+0x61/0xd0
        [...]
      
      Using drgn [0] to dump the percpu buffer contents showed that on this CPU,
      slot 0 was still available, but slots 1-7 were occupied, mostly by tasks
      that no longer existed. This pointed to a problem in
      bpf_cgroup_storage_unset().
      
      Further debugging confirmed that there is a bug in bpf_cgroup_storage_unset().
      Currently, it searches from the first slot and unsets the first one that
      holds "current". So the following sequence is possible:
      
        1. A task is running and claims slot 0.
        2. The running BPF program finishes; it has checked that slot 0 holds
           the task and is about to reset it to NULL (but has not done so yet).
        3. An interrupt happens, and another BPF program runs and claims slot 1
           with the *same* task.
        4. The unset() in interrupt context releases slot 0, since it matches
           the task.
        5. The interrupt completes, and the task in process context resets
           slot 0.
      
      In the end, slot 1 is never reset. The same process can go on to occupy
      slots 2-7 this way, and eventually, when steps 1-5 repeat, the BPF program
      in step 3 can't claim an empty slot and a warning is issued.
      
      To fix the issue, unset() should traverse the slots from the last to the
      first. This way, the issue above is avoided.

      The same reverse traversal should also be done in the bpf_get_local_storage()
      helper itself. Otherwise, the wrong local storage may be returned to the BPF
      program.
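
      A userspace model of the slot array may make the search-order argument
      concrete. It assumes tasks can be compared by pointer and that a free slot
      holds NULL. slot_set(), slot_find() and slot_unset() are hypothetical
      stand-ins for the kernel helpers; the model shows only the reverse search
      order, not the check-then-write window from step 2:

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the "8 slots" of percpu storage described above. */
#define NEST_MAX 8

struct slots {
	const void *task[NEST_MAX]; /* NULL marks a free slot */
};

/* Claim the first free slot for @task. Returns the slot index, or -1 when
 * all slots are busy (the situation that produced the warning). */
static int slot_set(struct slots *s, const void *task)
{
	assert(task != NULL);
	for (int i = 0; i < NEST_MAX; i++) {
		if (s->task[i] == NULL) {
			s->task[i] = task;
			return i;
		}
	}
	return -1;
}

/* Find the most recently claimed slot owned by @task by scanning from the
 * last slot down; with stack-like nesting, a nested run sits at a higher
 * index than the run it interrupted. Returns -1 if @task owns no slot. */
static int slot_find(const struct slots *s, const void *task)
{
	for (int i = NEST_MAX - 1; i >= 0; i--) {
		if (s->task[i] == task)
			return i;
	}
	return -1;
}

/* Release the slot slot_find() picks, so a nested unset frees its own
 * slot instead of the interrupted run's slot 0. */
static int slot_unset(struct slots *s, const void *task)
{
	int i = slot_find(s, task);

	if (i >= 0)
		s->task[i] = NULL;
	return i;
}
```

      Using the same reverse slot_find() for lookups corresponds to the change
      in bpf_get_local_storage(): a nested run sees its own storage, not the
      interrupted run's.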
      
        [0] https://github.com/osandov/drgn
      
      Fixes: b910eaaa ("bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper")
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20210810010413.1976277-1-yhs@fb.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      037570c9
    • Miklos Szeredi's avatar
      ovl: fix deadlock in splice write · 1d1808fa
      Miklos Szeredi authored
      [ Upstream commit 9b91b6b0 ]
      
      There's a possible ABBA deadlock between a splice write to an overlayfs
      file and a concurrent splice write to the corresponding real file.
      
      The call chain for splice to an overlay file:
      
       -> do_splice                     [takes sb_writers on overlay file]
         -> do_splice_from
           -> iter_file_splice_write    [takes pipe->mutex]
             -> vfs_iter_write
               ...
               -> ovl_write_iter        [takes sb_writers on real file]
      
      And the call chain for splice to a real file:
      
       -> do_splice                     [takes sb_writers on real file]
         -> do_splice_from
           -> iter_file_splice_write    [takes pipe->mutex]
      
      Syzbot successfully bisected this to commit 82a763e6 ("ovl: simplify
      file splice").
      
      Fix by reverting the write part of the above commit and by adding missing
      bits from ovl_write_iter() into ovl_splice_write().
      
      Fixes: 82a763e6 ("ovl: simplify file splice")
      Reported-and-tested-by: syzbot+579885d1a9a833336209@syzkaller.appspotmail.com
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      1d1808fa
    • Christian Hewitt's avatar
      drm/meson: fix colour distortion from HDR set during vendor u-boot · 75004b47
      Christian Hewitt authored
      [ Upstream commit bf33677a ]
      
      Add support for the OSD1 HDR registers so that meson DRM can handle the HDR
      properties set by Amlogic u-boot on G12A and newer devices; left unhandled,
      these properties cause blue/green/pink colour distortion in the display
      output.
      
      This takes the original patch submissions from Mathias [0] and [1] with
      corrections for formatting and the missing description and attribution
      needed for merge.
      
      [0] https://lore.kernel.org/linux-amlogic/59dfd7e6-fc91-3d61-04c4-94e078a3188c@baylibre.com/T/
      [1] https://lore.kernel.org/linux-amlogic/CAOKfEHBx_fboUqkENEMd-OC-NSrf46nto+vDLgvgttzPe99kXg@mail.gmail.com/T/#u
      
      Fixes: 72888394 ("drm/meson: Add G12A Support for VIU setup")
      Suggested-by: Mathias Steiger <mathias.steiger@googlemail.com>
      Signed-off-by: Christian Hewitt <christianshewitt@gmail.com>
      Tested-by: Neil Armstrong <narmstrong@baylibre.com>
      Tested-by: Philip Milev <milev.philip@gmail.com>
      [narmstrong: adding missing space on second tested-by tag]
      Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210806094005.7136-1-christianshewitt@gmail.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      75004b47
    • Aya Levin's avatar
      net/mlx5: Fix return value from tracer initialization · 11e249ce
      Aya Levin authored
      [ Upstream commit bd37c288 ]
      
      Check the return value of mlx5_fw_tracer_start(), add an error path, and
      fix the return value of mlx5_fw_tracer_init() accordingly.
      
      Fixes: c71ad41c ("net/mlx5: FW tracer, events handling")
      Signed-off-by: Aya Levin <ayal@nvidia.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      11e249ce
    • Shay Drory's avatar
      net/mlx5: Synchronize correct IRQ when destroying CQ · 436f4a1c
      Shay Drory authored
      [ Upstream commit 563476ae ]
      
      The CQ destroy is performed based on the IRQ number stored in cq->irqn.
      That number wasn't set explicitly during CQ creation, and, as might be
      expected, some API users of mlx5_core_create_cq() forgot to set it.

      As a result, the destroy path synchronized the wrong IRQ: number 0
      instead of the real one.

      As a fix, set the IRQ number directly in mlx5_core_create_cq() and
      update all users accordingly.
      
      Fixes: 1a86b377 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
      Fixes: ef1659ad ("IB/mlx5: Add DEVX support for CQ events")
      Signed-off-by: Shay Drory <shayd@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      436f4a1c
    • Chris Mi's avatar
      net/mlx5e: TC, Fix error handling memory leak · 9b0b9c9d
      Chris Mi authored
      [ Upstream commit 88bbd7b2 ]
      
      Free the offload sample action on error.
      
      Fixes: f94d6389 ("net/mlx5e: TC, Add support to offload sample action")
      Signed-off-by: Chris Mi <cmi@nvidia.com>
      Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      9b0b9c9d
    • Aya Levin's avatar
      net/mlx5: Block switchdev mode while devlink traps are active · 89163e39
      Aya Levin authored
      [ Upstream commit c85a6b8f ]
      
      Since switchdev mode can't support devlink traps, verify that there are
      no active devlink traps before moving the eswitch to switchdev mode. If
      there are active traps, prevent the switchdev mode configuration.
      
      Fixes: eb3862a0 ("net/mlx5e: Enable traps according to link state")
      Signed-off-by: Aya Levin <ayal@nvidia.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      89163e39
    • Maxim Mikityanskiy's avatar
      net/mlx5e: Destroy page pool after XDP SQ to fix use-after-free · 09ab613d
      Maxim Mikityanskiy authored
      [ Upstream commit 8ba3e4c8 ]
      
      mlx5e_close_xdpsq does the cleanup: it calls mlx5e_free_xdpsq_descs to
      free the outstanding descriptors, which relies on
      mlx5e_page_release_dynamic and page_pool_release_page. However,
      page_pool_destroy is already called by this point, because
      mlx5e_close_rq runs before mlx5e_close_xdpsq.
      
      This commit fixes the use-after-free by swapping mlx5e_close_xdpsq and
      mlx5e_close_rq.
      
      The commit cited below started calling page_pool_destroy directly from
      the driver. Previously, the page pool was destroyed under a call_rcu
      from xdp_rxq_info_unreg_mem_model, which would defer the deallocation
      until after the XDPSQ is cleaned up.
      
      Fixes: 1da4bbef ("net: core: page_pool: add user refcnt and reintroduce page_pool_destroy")
      Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      09ab613d
    • Roi Dayan's avatar
      net/mlx5e: Avoid creating tunnel headers for local route · c0cb7d8b
      Roi Dayan authored
      [ Upstream commit c623c95a ]
      
      The local and remote endpoints may be on the same machine, in which case
      the route lookup returns a local route, and creating an encap ID from it
      would use source/destination MAC addresses of 0.
      
      Fixes: a54e20b4 ("net/mlx5e: Add basic TC tunnel set action for SRIOV offloads")
      Signed-off-by: Roi Dayan <roid@nvidia.com>
      Reviewed-by: Maor Dickman <maord@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      c0cb7d8b
    • Alex Vesker's avatar
      net/mlx5: DR, Add fail on error check on decap · 3f20768c
      Alex Vesker authored
      [ Upstream commit d3875924 ]
      
      While processing an encapsulated packet on RX, one of the fields that is
      checked is the inner packet length. If the length specified in the header
      doesn't match the actual inner packet length, the packet is invalid and
      should be dropped. However, such a packet caused the NIC to hang.

      This patch turns on a 'fail_on_error' HW bit, which allows the HW to drop
      such an invalid packet while processing it on RX and trying to decap it.
      
      Fixes: ad17dc8c ("net/mlx5: DR, Move STEv0 action apply logic")
      Signed-off-by: Alex Vesker <valex@nvidia.com>
      Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      3f20768c
    • Leon Romanovsky's avatar
      net/mlx5: Don't skip subfunction cleanup in case of error in module init · df712c5d
      Leon Romanovsky authored
      [ Upstream commit c633e799 ]
      
      Clean up SF resources if mlx5 eth fails to initialize.
      
      Fixes: 1958fc2f ("net/mlx5: SF, Add auxiliary device driver")
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Reviewed-by: Parav Pandit <parav@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      df712c5d
    • Hao Xu's avatar
      io-wq: fix IO_WORKER_F_FIXED issue in create_io_worker() · f49d4579
      Hao Xu authored
      [ Upstream commit 47cae0c7 ]
      
      There may be cases like:
              A                                 B
      spin_lock(wqe->lock)
      nr_workers is 0
      nr_workers++
      spin_unlock(wqe->lock)
                                           spin_lock(wqe->lock)
                                           nr_workers is 1
                                           nr_workers++
                                           spin_unlock(wqe->lock)
      create_io_worker()
        acct->nr_workers is 2
                                           create_io_worker()
                                             acct->nr_workers is 2
      
      There should be one worker marked IO_WORKER_F_FIXED, but none is.
      Fix this by introducing a new argument to create_io_worker() that
      indicates whether it is the first worker.
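
      The idea of the fix, deciding "first" at the moment the count is bumped
      and passing that decision down, can be sketched in userspace. For brevity
      this swaps the wqe->lock spinlock for a C11 atomic counter;
      acct_reserve_worker(), create_worker() and WORKER_F_FIXED are
      hypothetical names:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for IO_WORKER_F_FIXED. */
#define WORKER_F_FIXED 0x1u

struct acct {
	atomic_int nr_workers;
};

struct worker {
	unsigned int flags;
};

/* Count the new worker and decide "first" in the same atomic step.
 * atomic_fetch_add() returns the value before the increment, so exactly
 * one caller sees 0 however the calls interleave. */
static bool acct_reserve_worker(struct acct *acct)
{
	return atomic_fetch_add(&acct->nr_workers, 1) == 0;
}

/* Set up a worker from the @first decision passed in by the caller,
 * instead of re-reading nr_workers, which may have changed meanwhile. */
static void create_worker(struct worker *w, bool first)
{
	assert(w != NULL);
	w->flags = first ? WORKER_F_FIXED : 0;
}
```

      In the interleaving above, both threads would still reserve a worker,
      but only the one that observed 0 passes first = true, so exactly one
      worker ends up marked fixed.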
      
      Fixes: 3d4e4fac ("io-wq: fix no lock protection of acct->nr_worker")
      Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
      Link: https://lore.kernel.org/r/20210808135434.68667-3-haoxu@linux.alibaba.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      f49d4579
    • Hao Xu's avatar
      io-wq: fix bug of creating io-workers unconditionally · 815a0fe3
      Hao Xu authored
      [ Upstream commit 49e7f0c7 ]
      
      The earlier patch that added a check of nr_workers against max_workers
      has a bug that causes io-workers to be created unconditionally, because
      the result of the check doesn't affect the call to create_io_worker().
      Fix it by introducing a boolean value that carries the result.
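
      A minimal sketch of making creation depend on the limit check, assuming
      the caller holds the lock that protects the counters (not modelled here).
      struct acct_limits and acct_try_reserve() are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical accounting state, protected by a lock the caller holds. */
struct acct_limits {
	int nr_workers;
	int max_workers;
};

/* Reserve a worker slot only while below the limit, and report whether one
 * was reserved so the caller can make worker creation depend on it. */
static bool acct_try_reserve(struct acct_limits *acct)
{
	assert(acct->nr_workers <= acct->max_workers);
	if (acct->nr_workers >= acct->max_workers)
		return false;
	acct->nr_workers++;
	return true;
}
```

      A caller would then create a worker only when acct_try_reserve() returns
      true; that dependency is what the original check was missing.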
      
      Fixes: 21698274 ("io-wq: fix lack of acct->nr_workers < acct->max_workers judgement")
      Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
      Link: https://lore.kernel.org/r/20210808135434.68667-2-haoxu@linux.alibaba.com
      [axboe: drop hunk that isn't strictly needed]
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      815a0fe3
    • Guillaume Nault's avatar
      bareudp: Fix invalid read beyond skb's linear data · 3cedeb69
      Guillaume Nault authored
      [ Upstream commit 143a8526 ]
      
      Data beyond the UDP header might not be part of the skb's linear data.
      Use skb_copy_bits() instead of direct access to skb->data+X, so that
      we read the correct bytes even on a fragmented skb.
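
      Why reading skb->data+X can go wrong is easiest to see with a small
      userspace model of a packet whose bytes are split between a linear head
      and one fragment. struct frag_buf and frag_buf_copy() are hypothetical
      and much simpler than the kernel's sk_buff and skb_copy_bits():

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical packet model: a linear head followed by one fragment. */
struct frag_buf {
	const unsigned char *head;
	size_t head_len;
	const unsigned char *frag;
	size_t frag_len;
};

/* Copy @len bytes starting at @offset into @to, crossing from the head
 * into the fragment as needed. Returns 0 on success, -1 if out of range. */
static int frag_buf_copy(const struct frag_buf *b, size_t offset,
			 void *to, size_t len)
{
	unsigned char *out = to;
	size_t total;

	assert(b != NULL);
	total = b->head_len + b->frag_len;
	if (offset > total || len > total - offset)
		return -1;

	/* Part of the range that lies in the linear head, if any. */
	if (offset < b->head_len) {
		size_t n = b->head_len - offset;

		if (n > len)
			n = len;
		memcpy(out, b->head + offset, n);
		out += n;
		len -= n;
		offset += n;
	}
	/* Remainder comes from the fragment, never from head + offset. */
	if (len > 0)
		memcpy(out, b->frag + (offset - b->head_len), len);
	return 0;
}
```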
      
      Fixes: 4b5f6723 ("net: Special handling for IP & MPLS.")
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Link: https://lore.kernel.org/r/7741c46545c6ef02e70c80a9b32814b22d9616b3.1628264975.git.gnault@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      3cedeb69
    • Roi Dayan's avatar
      psample: Add a fwd declaration for skbuff · ed277fbd
      Roi Dayan authored
      [ Upstream commit beb7f2de ]
      
      Without this, there is a warning if a source file includes psample.h
      before skbuff.h, or doesn't include skbuff.h at all.
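
      A minimal sketch of the header pattern, assuming a header that only passes
      struct sk_buff pointers around. sample_packet() is a hypothetical
      prototype, not psample.h's actual contents. Without the forward
      declaration, GCC warns that 'struct sk_buff' is declared inside the
      parameter list:

```c
#include <stdbool.h>
#include <stddef.h>

/* Forward declaration: lets the header name the type in prototypes
 * without pulling in the full definition from skbuff.h. */
struct sk_buff;

/* Hypothetical helper: only the pointer is used, so the incomplete type
 * is enough here. */
static bool sample_packet(const struct sk_buff *skb, unsigned int rate)
{
	return skb != NULL && rate > 0;
}
```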
      
      Fixes: 6ae0a628 ("net: Introduce psample, a new genetlink channel for packet sampling")
      Signed-off-by: Roi Dayan <roid@nvidia.com>
      Link: https://lore.kernel.org/r/20210808065242.1522535-1-roid@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      ed277fbd
    • Md Fahad Iqbal Polash's avatar
      iavf: Set RSS LUT and key in reset handle path · 792e7591
      Md Fahad Iqbal Polash authored
      [ Upstream commit a7550f8b ]
      
      The iavf driver should set the RSS LUT and key unconditionally in the
      reset path. Currently, the driver does not do that. This patch fixes
      this issue.
      
      Fixes: 2c86ac3c ("i40evf: create a generic config RSS function")
      Signed-off-by: Md Fahad Iqbal Polash <md.fahad.iqbal.polash@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      792e7591
    • Brett Creeley's avatar
      ice: don't remove netdev->dev_addr from uc sync list · f2b15898
      Brett Creeley authored
      [ Upstream commit 3ba7f53f ]
      
      In some circumstances, such as with bridging, it's possible that the
      stack will add the device's own MAC address to its unicast address list.
      
      If, later, the stack deletes this address, the driver will receive a
      request to remove this address.
      
      The driver stores its current MAC address as part of the VSI MAC filter
      list instead of separately. This causes a problem when the device's MAC
      address is deleted unexpectedly, which in some cases results in traffic
      failure.
      
      The following configuration steps will reproduce the previously
      mentioned problem:
      
      > ip link set eth0 up
      > ip link add dev br0 type bridge
      > ip link set br0 up
      > ip addr flush dev eth0
      > ip link set eth0 master br0
      > echo 1 > /sys/class/net/br0/bridge/vlan_filtering
      > modprobe -r veth
      > modprobe -r bridge
      > ip addr add 192.168.1.100/24 dev eth0
      
      The following ping command fails due to the netdev->dev_addr being
      deleted when removing the bridge module.
      > ping <link partner>
      
      Fix this by making sure not to delete netdev->dev_addr during MAC
      address sync. After fixing this issue, it was noticed that the
      netdev_warn() in .set_mac was overly verbose, so demote it to
      netdev_dbg().
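
      The core of the sync fix is a comparison against the device's own address
      before removing a filter. A minimal sketch, assuming the unsync path sees
      both addresses as raw byte arrays; uc_filter_should_remove() is a
      hypothetical helper, not the driver's code:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define ETH_ALEN 6 /* bytes in a MAC address */

/* Hypothetical unsync decision: the driver keeps dev_addr in the same
 * filter list as the stack's unicast addresses, so a request to remove
 * dev_addr itself must leave that filter in place. Returns true when the
 * filter for @addr should really be removed. */
static bool uc_filter_should_remove(const unsigned char *dev_addr,
				    const unsigned char *addr)
{
	assert(dev_addr != NULL && addr != NULL);
	return memcmp(dev_addr, addr, ETH_ALEN) != 0;
}
```

      In kernel code the comparison would more likely use ether_addr_equal()
      from <linux/etherdevice.h>.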
      
      Also, there is a possibility of a race condition between .set_mac and
      .set_rx_mode. Fix this by calling netif_addr_lock_bh() and
      netif_addr_unlock_bh() on the device's netdev when the netdev->dev_addr
      is going to be updated in .set_mac.
      
      Fixes: e94d4478 ("ice: Implement filter sync, NDO operations and bump version")
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Liang Li <liali@redhat.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      f2b15898
    • Anirudh Venkataramanan's avatar
      ice: Stop processing VF messages during teardown · 8a081424
      Anirudh Venkataramanan authored
      [ Upstream commit c503e632 ]
      
      When VFs are set up and torn down in quick succession, it is possible
      that a VF is torn down by the PF while the VF's virtchnl requests are
      still in the PF's mailbox ring. Processing the VF's virtchnl request
      when the VF itself doesn't exist results in undefined behavior. Fix
      this by adding a check to stop processing virtchnl requests when VF
      teardown is in progress.
      
      Fixes: ddf30f7f ("ice: Add handler to configure SR-IOV")
      Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      8a081424