  1. Mar 27, 2024
    • hsr: Handle failures in module init · 770e3ab9
      Felix Maurer authored
      [ Upstream commit 3cf28cd4 ]
      
      A failure during registration of the netdev notifier was not handled at
      all. A failure during netlink initialization did not unregister the netdev
      notifier.
      
      Handle failures of netdev notifier registration and netlink initialization.
      Both functions should only return negative values on failure and thereby
      lead to the hsr module not being loaded.
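
      The unwinding described above can be sketched in plain C as follows. This is a minimal userspace illustration, not the hsr code: the stub functions, the failure switches and the -12 return value are all hypothetical stand-ins for the notifier and netlink registration steps.

```c
#include <assert.h>

/* Hypothetical state and failure switches, used only by this sketch. */
static int notifier_registered;
static int netlink_registered;
static int fail_notifier;	/* set to 1 to simulate a failing step */
static int fail_netlink;

/* Stub: returns 0 on success, a negative value on failure. */
static int register_notifier_stub(void)
{
	if (fail_notifier)
		return -12;
	notifier_registered = 1;
	return 0;
}

static void unregister_notifier_stub(void)
{
	notifier_registered = 0;
}

/* Stub: returns 0 on success, a negative value on failure. */
static int netlink_init_stub(void)
{
	if (fail_netlink)
		return -12;
	netlink_registered = 1;
	return 0;
}

/* Propagate every failure and undo whatever was already set up. */
static int module_init_sketch(void)
{
	int err;

	err = register_notifier_stub();
	if (err)
		return err;

	err = netlink_init_stub();
	if (err) {
		unregister_notifier_stub();
		return err;
	}

	return 0;
}
```

      In this sketch a failure in the second step leaves nothing registered, so the caller sees a negative value and a clean state.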
      
      Fixes: f421436a ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)")
      Signed-off-by: Felix Maurer <fmaurer@redhat.com>
      Reviewed-by: Shigeru Yoshida <syoshida@redhat.com>
      Reviewed-by: Breno Leitao <leitao@debian.org>
      Link: https://lore.kernel.org/r/3ce097c15e3f7ace98fc7fd9bcbf299f092e63d1.1710504184.git.fmaurer@redhat.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • rds: introduce acquire/release ordering in acquire/release_in_xmit() · d792459a
      Yewon Choi authored
      [ Upstream commit 1422f288 ]
      
      acquire/release_in_xmit() work as a bit lock in rds_send_xmit(), so they
      are expected to ensure acquire/release memory ordering semantics.
      However, test_and_set_bit()/clear_bit() don't imply such semantics. On
      top of this, the following smp_mb__after_atomic() does not guarantee
      release ordering (the memory barrier should actually be placed before
      clear_bit()).
      
      Instead, we use clear_bit_unlock/test_and_set_bit_lock() here.
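
      The ordering a bit lock needs can be illustrated in userspace with C11 atomics. This is only an analogy, assuming a single hypothetical flag bit; it does not use the kernel's clear_bit_unlock()/test_and_set_bit_lock() helpers, and the function names are made up for the sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define IN_XMIT_BIT_SKETCH 0x1u	/* hypothetical flag bit */

static atomic_uint flags_sketch;

/*
 * Try-lock: set the bit with acquire ordering. Returns true only if the
 * bit was clear before, i.e. this caller now owns the lock.
 */
static bool acquire_in_xmit_sketch(void)
{
	unsigned int old = atomic_fetch_or_explicit(&flags_sketch,
						    IN_XMIT_BIT_SKETCH,
						    memory_order_acquire);

	return !(old & IN_XMIT_BIT_SKETCH);
}

/*
 * Unlock: clear the bit with release ordering, so that writes made
 * inside the critical section are visible before the bit reads as clear.
 */
static void release_in_xmit_sketch(void)
{
	atomic_fetch_and_explicit(&flags_sketch, ~IN_XMIT_BIT_SKETCH,
				  memory_order_release);
}
```

      The point of the sketch is the placement of the ordering: acquire on the operation that takes the bit, release on the operation that clears it.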
      
      Fixes: 0f4b1c7e ("rds: fix rds_send_xmit() serialization")
      Fixes: 1f9ecd7e ("RDS: Pass rds_conn_path to rds_send_xmit()")
      Signed-off-by: Yewon Choi <woni9911@gmail.com>
      Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
      Link: https://lore.kernel.org/r/ZfQUxnNTO9AJmzwc@libra05
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • wireguard: receive: annotate data-race around receiving_counter.counter · fdf16de0
      Nikita Zhandarovich authored
      [ Upstream commit bba045dc ]
      
      Syzkaller with KCSAN identified a data-race issue when accessing
      keypair->receiving_counter.counter. Use READ_ONCE() and WRITE_ONCE()
      annotations to mark the data race as intentional.
      
          BUG: KCSAN: data-race in wg_packet_decrypt_worker / wg_packet_rx_poll
      
          write to 0xffff888107765888 of 8 bytes by interrupt on cpu 0:
           counter_validate drivers/net/wireguard/receive.c:321 [inline]
           wg_packet_rx_poll+0x3ac/0xf00 drivers/net/wireguard/receive.c:461
           __napi_poll+0x60/0x3b0 net/core/dev.c:6536
           napi_poll net/core/dev.c:6605 [inline]
           net_rx_action+0x32b/0x750 net/core/dev.c:6738
           __do_softirq+0xc4/0x279 kernel/softirq.c:553
           do_softirq+0x5e/0x90 kernel/softirq.c:454
           __local_bh_enable_ip+0x64/0x70 kernel/softirq.c:381
           __raw_spin_unlock_bh include/linux/spinlock_api_smp.h:167 [inline]
           _raw_spin_unlock_bh+0x36/0x40 kernel/locking/spinlock.c:210
           spin_unlock_bh include/linux/spinlock.h:396 [inline]
           ptr_ring_consume_bh include/linux/ptr_ring.h:367 [inline]
           wg_packet_decrypt_worker+0x6c5/0x700 drivers/net/wireguard/receive.c:499
           process_one_work kernel/workqueue.c:2633 [inline]
           ...
      
          read to 0xffff888107765888 of 8 bytes by task 3196 on cpu 1:
           decrypt_packet drivers/net/wireguard/receive.c:252 [inline]
           wg_packet_decrypt_worker+0x220/0x700 drivers/net/wireguard/receive.c:501
           process_one_work kernel/workqueue.c:2633 [inline]
           process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2706
           worker_thread+0x525/0x730 kernel/workqueue.c:2787
           ...
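
      As a rough userspace illustration of what such annotations do, the sketch below approximates READ_ONCE()/WRITE_ONCE() with volatile accesses. The macro bodies, the struct and both helper functions are hypothetical and are not the wireguard code; they only show marked reads and writes of a shared 64-bit counter.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace approximations of the kernel macros, for illustration only. */
#define READ_ONCE_SKETCH(x)	(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE_SKETCH(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical counter shared between two contexts. */
struct counter_sketch {
	uint64_t counter;
};

/* Writer side: advance the counter past a newly seen value. */
static void counter_advance_sketch(struct counter_sketch *c, uint64_t seen)
{
	if (seen + 1 > READ_ONCE_SKETCH(c->counter))
		WRITE_ONCE_SKETCH(c->counter, seen + 1);
}

/* Reader side: a lockless check against a limit. */
static int counter_reached_sketch(struct counter_sketch *c, uint64_t limit)
{
	return READ_ONCE_SKETCH(c->counter) >= limit;
}
```

      The marked accesses tell both the compiler and a tool such as KCSAN that the concurrent access is deliberate; they do not add any locking.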
      
      Fixes: a9e90d99 ("wireguard: noise: separate receive counter from send counter")
      Reported-by: <syzbot+d1de830e4ecdaac83d89@syzkaller.appspotmail.com>
      Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: move dev->state into net_device_read_txrx group · 1cb84b88
      Eric Dumazet authored
      [ Upstream commit f6e0a498 ]
      
      dev->state can be read in rx and tx fast paths.
      
      netif_running() which needs dev->state is called from
      - enqueue_to_backlog() [RX path]
      - __dev_direct_xmit()  [TX path]
      
      Fixes: 43a71cd6 ("net-device: reorganize net_device fast path variables")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Coco Li <lixiaoyan@google.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Link: https://lore.kernel.org/r/20240314200845.3050179-1-edumazet@google.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • virtio: packed: fix unmap leak for indirect desc table · 51bacd9d
      Xuan Zhuo authored
      [ Upstream commit d5c0ed17 ]
      
      When use_dma_api and premapped are both true, do_unmap is false.

      Because do_unmap is false, vring_unmap_extra_packed() is not called by
      detach_buf_packed():
      
        if (unlikely(vq->do_unmap)) {
                curr = id;
                for (i = 0; i < state->num; i++) {
                        vring_unmap_extra_packed(vq,
                                                 &vq->packed.desc_extra[curr]);
                        curr = vq->packed.desc_extra[curr].next;
                }
        }
      
      So the indirect desc table is not unmapped, which causes an unmap leak.

      To fix this, check vq->use_dma_api instead. Accordingly, the DMA info is
      also updated based on use_dma_api.

      This bug does not occur in practice, because no driver uses premapped
      together with indirect descriptors.
      
      Fixes: b319940f ("virtio_ring: skip unmap for premapped")
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Message-Id: <20240223071833.26095-1-xuanzhuo@linux.alibaba.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • vdpa/mlx5: Allow CVQ size changes · 9df6b5a9
      Jonah Palmer authored
      [ Upstream commit 749a4016 ]
      
      The MLX driver was not updating its control virtqueue size at set_vq_num
      and instead always initialized to MLX5_CVQ_MAX_ENT (16) at
      setup_cvq_vring.
      
      Qemu would try to set the size to 64 by default, however, because the
      CVQ size always was initialized to 16, an error would be thrown when
      sending >16 control messages (as used-ring entry 17 is initialized to 0).
      For example, starting a guest with x-svq=on and then executing the
      following command would produce the error below:
      
       # for i in {1..20}; do ifconfig eth0 hw ether XX:xx:XX:xx:XX:XX; done
      
       qemu-system-x86_64: Insufficient written data (0)
       [  435.331223] virtio_net virtio0: Failed to set mac address by vq command.
       SIOCSIFHWADDR: Invalid argument
      
      Acked-by: Dragos Tatulea <dtatulea@nvidia.com>
      Acked-by: Eugenio Pérez <eperezma@redhat.com>
      Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
      Message-Id: <20240216142502.78095-1-jonah.palmer@oracle.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Tested-by: Lei Yang <leiyang@redhat.com>
      Fixes: 5262912e ("vdpa/mlx5: Add support for control VQ and MAC setting")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • vdpa_sim: reset must not run · f5f6085a
      Steve Sistare authored
      [ Upstream commit 9588e7fc ]
      
      vdpasim_do_reset sets running to true, which is wrong, as it allows
      vdpasim_kick_vq to post work requests before the device has been
      configured.  To fix, do not set running until VIRTIO_CONFIG_S_DRIVER_OK
      is set.
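
      The intended state handling can be sketched with a small toy model. The struct and function names below are hypothetical and do not mirror the vdpa_sim code; the only value taken from the virtio headers is the DRIVER_OK status bit (4).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DRIVER_OK_SKETCH 4	/* same value as VIRTIO_CONFIG_S_DRIVER_OK */

/* Hypothetical toy device used only for this sketch. */
struct sim_dev_sketch {
	uint8_t status;
	bool running;
	unsigned int work_posted;
};

/* Reset clears the status and must leave the device stopped. */
static void sim_reset_sketch(struct sim_dev_sketch *d)
{
	d->status = 0;
	d->running = false;
}

/* The device only starts running once the driver sets DRIVER_OK. */
static void sim_set_status_sketch(struct sim_dev_sketch *d, uint8_t status)
{
	d->status = status;
	if (status & DRIVER_OK_SKETCH)
		d->running = true;
}

/* A kick posts work only while the device is running. */
static void sim_kick_sketch(struct sim_dev_sketch *d)
{
	if (d->running)
		d->work_posted++;
}
```

      In this model a kick that arrives between reset and DRIVER_OK is simply ignored, which is the behaviour the fix aims for.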
      
      Fixes: 0c89e2a3 ("vdpa_sim: Implement suspend vdpa op")
      Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
      Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Message-Id: <1707517807-137331-1-git-send-email-steven.sistare@oracle.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • virtio: uapi: Drop __packed attribute in linux/virtio_pci.h · ac266cbf
      Suzuki K Poulose authored
      [ Upstream commit ec6ecb84 ]
      
      Commit 92792ac7 ("virtio-pci: Introduce admin command sending function")
      added "__packed" structures to UAPI header linux/virtio_pci.h. This triggers
      build failures in the consumer userspace applications without proper "definition"
      of __packed (e.g., kvmtool build fails).
      
      Moreover, the structures are already packed well and don't need explicit
      packing, similar to the rest of the structures in all virtio_* headers. Remove
      the __packed attribute.
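
      Why explicit packing can be redundant is easy to check in userspace. The two structs below are hypothetical examples with naturally aligned members, not the definitions from linux/virtio_pci.h; the sketch only shows that the packed attribute leaves size and member offsets unchanged for such a layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header layout: every member is naturally aligned. */
struct cmd_hdr_plain_sketch {
	uint16_t opcode;
	uint16_t group_type;
	uint8_t reserved[12];
	uint64_t group_member_id;
};

/* The same layout with an explicit packed attribute (GCC/Clang syntax). */
struct cmd_hdr_packed_sketch {
	uint16_t opcode;
	uint16_t group_type;
	uint8_t reserved[12];
	uint64_t group_member_id;
} __attribute__((packed));

/* Returns 1 when both variants have identical size and member offsets. */
static int layouts_match_sketch(void)
{
	return sizeof(struct cmd_hdr_plain_sketch) ==
		       sizeof(struct cmd_hdr_packed_sketch) &&
	       offsetof(struct cmd_hdr_plain_sketch, group_member_id) ==
		       offsetof(struct cmd_hdr_packed_sketch, group_member_id);
}
```

      The packed variant still lowers the struct's alignment requirement, but the bytes on the wire are the same, so dropping the attribute does not change the format for a layout like this one.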
      
      Fixes: 92792ac7 ("virtio-pci: Introduce admin command sending function")
      Cc: Feng Liu <feliu@nvidia.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Yishai Hadas <yishaih@nvidia.com>
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
      Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Message-Id: <20240125232039.913606-1-suzuki.poulose@arm.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • drm: Fix drm_fixp2int_round() making it add 0.5 · be300fb4
      Arthur Grillo authored
      [ Upstream commit 807f96ab ]
      
      As Pekka noted[1], the rounding in drm_fixp2int_round() is wrong.
      To round a number, you need to add 0.5 to it and take the floor, but
      drm_fixp2int_round() is adding 0.0000076. Make it add 0.5.
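
      The arithmetic can be reproduced with a small fixed-point sketch. It assumes a signed 64-bit value with 32 fractional bits and an old addend of 2^15, which is consistent with the 0.0000076 figure quoted above (2^15 / 2^32); the names are local to this sketch and are not the DRM helpers.

```c
#include <assert.h>
#include <stdint.h>

#define FIXED_POINT_SKETCH 32	/* assumed number of fractional bits */
#define FIXED_ONE_SKETCH ((int64_t)1 << FIXED_POINT_SKETCH)

/* Floor conversion: drop the fractional bits (non-negative inputs only). */
static int fixp2int_sketch(int64_t a)
{
	return (int)(a >> FIXED_POINT_SKETCH);
}

/* Old behaviour under the stated assumption: adds 2^15, about 0.0000076. */
static int fixp2int_round_old_sketch(int64_t a)
{
	return fixp2int_sketch(a + ((int64_t)1 << 15));
}

/* Fixed behaviour: adds exactly one half before flooring. */
static int fixp2int_round_new_sketch(int64_t a)
{
	return fixp2int_sketch(a + FIXED_ONE_SKETCH / 2);
}
```

      With an input of roughly 2.6, the old variant in this sketch still floors to 2, while the variant that adds one half rounds to 3.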
      
      [1]: https://lore.kernel.org/all/20240301135327.22efe0dd.pekka.paalanen@collabora.com/
      
      Fixes: 8b253208 ("drm: Add fixed-point helper to get rounded integer values")
      Suggested-by: Pekka Paalanen <pekka.paalanen@collabora.com>
      Reviewed-by: Harry Wentland <harry.wentland@amd.com>
      Reviewed-by: Melissa Wen <mwen@igalia.com>
      Signed-off-by: Arthur Grillo <arthurgrillo@riseup.net>
      Signed-off-by: Melissa Wen <melissa.srw@gmail.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240316-drm_fixed-v2-1-c1bc2665b5ed@riseup.net
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • spi: spi-imx: fix off-by-one in mx51 CPU mode burst length · 68fb3d6e
      Adam Butcher authored
      [ Upstream commit cf6d79a0 ]
      
      c712c05e ("spi: imx: fix the burst length at DMA mode and CPU mode")
      corrects three cases of setting the ECSPI burst length but erroneously
      leaves the in-range CPU case one bit too big (in that field a value of
      0 means 1 bit).  The effect was that transmissions that should have been
      8-bit bytes appeared as 9-bit causing failed communication with SPI
      devices.
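
      The off-by-one follows directly from the field encoding described above, where a value of 0 means 1 bit. The helpers below are a hypothetical illustration of that encoding only; they are not taken from the spi-imx driver.

```c
#include <assert.h>

/*
 * Encode a burst of `bits` bits (bits >= 1) for a length field in which
 * the value 0 means 1 bit.
 */
static unsigned int burst_field_from_bits_sketch(unsigned int bits)
{
	return bits - 1;
}

/* Decode: number of bits the hardware would clock out for a field value. */
static unsigned int bits_from_burst_field_sketch(unsigned int field)
{
	return field + 1;
}
```

      Writing the raw bit count 8 into such a field would decode to 9 bits on the wire, matching the symptom described above, while 8 - 1 = 7 decodes to the intended 8 bits.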
      
      Link: https://lore.kernel.org/all/20240201105451.507005-1-carlos.song@nxp.com/
      Link: https://lore.kernel.org/all/20240204091912.36488-1-carlos.song@nxp.com/
      Fixes: c712c05e ("spi: imx: fix the burst length at DMA mode and CPU mode")
      Signed-off-by: Adam Butcher <adam@jessamine.co.uk>
      Link: https://msgid.link/r/20240318175119.3334-1-adam@jessamine.co.uk
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: dsa: mt7530: prevent possible incorrect XTAL frequency selection · 0aa607a9
      Arınç ÜNAL authored
      [ Upstream commit f490c492 ]
      
      On MT7530, the HT_XTAL_FSEL field of the HWTRAP register stores a 2-bit
      value that represents the frequency of the crystal oscillator connected to
      the switch IC. The field is populated by the state of the ESW_P4_LED_0 and
      ESW_P3_LED_0 pins, which is done right after reset is deasserted.
      
        ESW_P4_LED_0    ESW_P3_LED_0    Frequency
        -----------------------------------------
        0               0               Reserved
        0               1               20MHz
        1               0               40MHz
        1               1               25MHz
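
      The table can be read as a 2-bit lookup. The sketch below assumes ESW_P4_LED_0 is the high bit and ESW_P3_LED_0 the low bit, which matches the 0b11 = 25MHz and 0b10 = 40MHz values quoted later in this message; the function name is hypothetical and 0 is used here to mark the reserved combination.

```c
#include <assert.h>

/*
 * Map the sampled pin states to a crystal frequency in MHz.
 * Returns 0 for the reserved combination.
 */
static unsigned int xtal_mhz_sketch(unsigned int p4_led0, unsigned int p3_led0)
{
	/* Assumed bit order: P4 is the high bit, P3 the low bit. */
	unsigned int fsel = ((p4_led0 & 1u) << 1) | (p3_led0 & 1u);

	switch (fsel) {
	case 1:
		return 20;
	case 2:
		return 40;
	case 3:
		return 25;
	default:
		return 0;
	}
}
```

      On a 40MHz board the pins should sample as (1, 0). If ESW_P3_LED_0 happens to be high at sampling time, the lookup yields 25MHz instead.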
      
      On MT7531, the XTAL25 bit of the STRAP register stores this. The LAN0LED0
      pin is used to populate the bit. 25MHz when the pin is high, 40MHz when
      it's low.
      
      These pins are also used with LEDs, therefore, their state can be set to
      something other than the bootstrapping configuration. For example, a link
      may be established on port 3 before the DSA subdriver takes control of the
      switch which would set ESW_P3_LED_0 to high.
      
      Currently on mt7530_setup() and mt7531_setup(), 1000 - 1100 usec delay is
      described between reset assertion and deassertion. Some switch ICs in real
      life conditions cannot always have these pins set back to the bootstrapping
      configuration before reset deassertion in this amount of delay. This causes
      wrong crystal frequency to be selected which puts the switch in a
      nonfunctional state after reset deassertion.
      
      The tests below are conducted on an MT7530 with a 40MHz crystal oscillator
      by Justin Swartz.
      
      With a cable from an active peer connected to port 3 before reset, an
      incorrect crystal frequency (0b11 = 25MHz) is selected:
      
                            [1]                  [3]     [5]
                            :                    :       :
                    _____________________________         __________________
      ESW_P4_LED_0                               |_______|
                    _____________________________
      ESW_P3_LED_0                               |__________________________
      
                             :                  : :     :
                             :                  : [4]...:
                             :                  :
                             [2]................:
      
      [1] Reset is asserted.
      [2] Period of 1000 - 1100 usec.
      [3] Reset is deasserted.
      [4] Period of 315 usec. HWTRAP register is populated with incorrect
          XTAL frequency.
      [5] Signals reflect the bootstrapped configuration.
      
      Increase the delay between reset_control_assert() and
      reset_control_deassert(), and gpiod_set_value_cansleep(priv->reset, 0) and
      gpiod_set_value_cansleep(priv->reset, 1) to 5000 - 5100 usec. This amount
      ensures a higher possibility that the switch IC will have these pins back
      to the bootstrapping configuration before reset deassertion.
      
      With a cable from an active peer connected to port 3 before reset, the
      correct crystal frequency (0b10 = 40MHz) is selected:
      
                            [1]        [2-1]     [3]     [5]
                            :          :         :       :
                    _____________________________         __________________
      ESW_P4_LED_0                               |_______|
                    ___________________           _______
      ESW_P3_LED_0                     |_________|       |__________________
      
                             :          :       : :     :
                             :          [2-2]...: [4]...:
                             [2]................:
      
      [1] Reset is asserted.
      [2] Period of 5000 - 5100 usec.
      [2-1] ESW_P3_LED_0 goes low.
      [2-2] Remaining period of 5000 - 5100 usec.
      [3] Reset is deasserted.
      [4] Period of 310 usec. HWTRAP register is populated with bootstrapped
          XTAL frequency.
      [5] Signals reflect the bootstrapped configuration.
      
      ESW_P3_LED_0 low period before reset deassertion:
      
                    5000 usec
                  - 5100 usec
          TEST     RESET HOLD
             #         (usec)
        ---------------------
             1           5410
             2           5440
             3           4375
             4           5490
             5           5475
             6           4335
             7           4370
             8           5435
             9           4205
            10           4335
            11           3750
            12           3170
            13           4395
            14           4375
            15           3515
            16           4335
            17           4220
            18           4175
            19           4175
            20           4350
      
           Min           3170
           Max           5490
      
        Median       4342.500
           Avg       4466.500
      
      Revert commit 2920dd92 ("net: dsa: mt7530: disable LEDs before reset").
      Changing the state of pins via reset assertion is simpler and more
      efficient than doing so by setting the LED controller off.
      
      Fixes: b8f126a8 ("net-next: dsa: add dsa support for Mediatek MT7530 switch")
      Fixes: c288575f ("net: dsa: mt7530: Add the support of MT7531 switch")
      Co-developed-by: Justin Swartz <justin.swartz@risingedge.co.za>
      Signed-off-by: Justin Swartz <justin.swartz@risingedge.co.za>
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: veth: do not manipulate GRO when using XDP · 0380da42
      Ignat Korchagin authored
      [ Upstream commit d7db7775 ]
      
      Commit d3256efd ("veth: allow enabling NAPI even without XDP") tried to fix
      the fact that GRO was not possible without XDP, because veth did not use NAPI
      without XDP. However, it also introduced the behaviour that GRO is always
      enabled, when XDP is enabled.
      
      While it might be desired for most cases, it is confusing for the user at best
      as the GRO flag suddenly changes, when an XDP program is attached. It also
      introduces some complexities in state management as was partially addressed in
      commit fe9f8013 ("net: veth: clear GRO when clearing XDP even when down").
      
      But the biggest problem is that it is not possible to disable GRO at all, when
      an XDP program is attached, which might be needed for some use cases.
      
      Fix this by not touching the GRO flag on XDP enable/disable as the code already
      supports switching to NAPI if either GRO or XDP is requested.
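
      The resulting behaviour can be modelled with two independent flags. This is a toy sketch with hypothetical names, not the veth driver: attaching or detaching XDP leaves the GRO flag alone, and NAPI is needed whenever either one is requested.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device state for this sketch. */
struct veth_sketch {
	bool gro;	/* user-visible GRO setting */
	bool xdp;	/* whether an XDP program is attached */
};

/* NAPI is required if either GRO or XDP is requested. */
static bool veth_needs_napi_sketch(const struct veth_sketch *v)
{
	return v->gro || v->xdp;
}

/* Attaching XDP does not touch the GRO flag. */
static void veth_xdp_attach_sketch(struct veth_sketch *v)
{
	v->xdp = true;
}

/* Detaching XDP does not touch the GRO flag either. */
static void veth_xdp_detach_sketch(struct veth_sketch *v)
{
	v->xdp = false;
}
```

      In this model a user who disabled GRO keeps it disabled across XDP attach and detach, and a user who enabled it keeps it enabled.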
      
      Link: https://lore.kernel.org/lkml/20240311124015.38106-1-ignat@cloudflare.com/
      Fixes: d3256efd ("veth: allow enabling NAPI even without XDP")
      Fixes: fe9f8013 ("net: veth: clear GRO when clearing XDP even when down")
      Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
      Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • xfrm: Allow UDP encapsulation only in offload modes · d147bb6a
      Leon Romanovsky authored
      [ Upstream commit 773bb766 ]
      
      The missing check of x->encap led to a situation where GSO packets
      were created with UDP encapsulation.

      As a solution, restore the encap check for non-offloaded SAs.
      
      Fixes: 983a73da ("xfrm: Pass UDP encapsulation in TX packet offload")
      Closes: https://lore.kernel.org/all/a650221ae500f0c7cf496c61c96c1b103dcb6f67.camel@redhat.com
      Reported-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • packet: annotate data-races around ignore_outgoing · 8b1e273c
      Eric Dumazet authored
      [ Upstream commit 6ebfad33 ]
      
      ignore_outgoing is read locklessly from dev_queue_xmit_nit()
      and packet_getsockopt().
      
      Add appropriate READ_ONCE()/WRITE_ONCE() annotations.
      
      syzbot reported:
      
      BUG: KCSAN: data-race in dev_queue_xmit_nit / packet_setsockopt
      
      write to 0xffff888107804542 of 1 bytes by task 22618 on cpu 0:
       packet_setsockopt+0xd83/0xfd0 net/packet/af_packet.c:4003
       do_sock_setsockopt net/socket.c:2311 [inline]
       __sys_setsockopt+0x1d8/0x250 net/socket.c:2334
       __do_sys_setsockopt net/socket.c:2343 [inline]
       __se_sys_setsockopt net/socket.c:2340 [inline]
       __x64_sys_setsockopt+0x66/0x80 net/socket.c:2340
       do_syscall_64+0xd3/0x1d0
       entry_SYSCALL_64_after_hwframe+0x6d/0x75
      
      read to 0xffff888107804542 of 1 bytes by task 27 on cpu 1:
       dev_queue_xmit_nit+0x82/0x620 net/core/dev.c:2248
       xmit_one net/core/dev.c:3527 [inline]
       dev_hard_start_xmit+0xcc/0x3f0 net/core/dev.c:3547
       __dev_queue_xmit+0xf24/0x1dd0 net/core/dev.c:4335
       dev_queue_xmit include/linux/netdevice.h:3091 [inline]
       batadv_send_skb_packet+0x264/0x300 net/batman-adv/send.c:108
       batadv_send_broadcast_skb+0x24/0x30 net/batman-adv/send.c:127
       batadv_iv_ogm_send_to_if net/batman-adv/bat_iv_ogm.c:392 [inline]
       batadv_iv_ogm_emit net/batman-adv/bat_iv_ogm.c:420 [inline]
       batadv_iv_send_outstanding_bat_ogm_packet+0x3f0/0x4b0 net/batman-adv/bat_iv_ogm.c:1700
       process_one_work kernel/workqueue.c:3254 [inline]
       process_scheduled_works+0x465/0x990 kernel/workqueue.c:3335
       worker_thread+0x526/0x730 kernel/workqueue.c:3416
       kthread+0x1d1/0x210 kernel/kthread.c:388
       ret_from_fork+0x4b/0x60 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
      
      value changed: 0x00 -> 0x01
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 27 Comm: kworker/u8:1 Tainted: G        W          6.8.0-syzkaller-08073-g480e035fc4c7 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
      Workqueue: bat_events batadv_iv_send_outstanding_bat_ogm_packet
      
      Fixes: fa788d98 ("packet: add sockopt to ignore outgoing packets")
      Reported-by: <syzbot+c669c1136495a2e7c31f@syzkaller.appspotmail.com>
      Closes: https://lore.kernel.org/netdev/CANn89i+Z7MfbkBLOv=p7KZ7=K1rKHO4P1OL5LYDCtBiyqsa9oQ@mail.gmail.com/T/#t
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • xen/events: increment refcnt only if event channel is refcounted · 95af4cb3
      Juergen Gross authored
      [ Upstream commit d277f9d8 ]
      
      In bind_evtchn_to_irq_chip() don't increment the refcnt of the event
      channel blindly. In case the event channel is NOT refcounted, issue a
      warning instead.
      
      Add an additional safety net by doing the refcnt increment only if the
      caller has specified IRQF_SHARED in the irqflags parameter.
      
      Fixes: 9e90e58c ("xen: evtchn: Allow shared registration of IRQ handers")
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
      Link: https://lore.kernel.org/r/20240313071409.25913-3-jgross@suse.com
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • xen/evtchn: avoid WARN() when unbinding an event channel · 9e2d4b58
      Juergen Gross authored
      [ Upstream commit 51c23bd6 ]
      
      When unbinding a user event channel, the related handler might be
      called a last time in case the kernel was built with
      CONFIG_DEBUG_SHIRQ. This might cause a WARN() in the handler.
      
      Avoid that by adding an "unbinding" flag to struct user_event which
      will short circuit the handler.
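
      The short circuit can be sketched as follows. Everything here is a hypothetical userspace model: the struct, the warning counter standing in for WARN(), and the extra handler call that imitates what CONFIG_DEBUG_SHIRQ does on unbind.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-channel state for this sketch. */
struct evtchn_sketch {
	bool enabled;
	bool unbinding;
	unsigned int warnings;	/* stands in for WARN() firing */
	unsigned int delivered;
};

/* Handler: bail out early while the channel is being unbound. */
static void evtchn_interrupt_sketch(struct evtchn_sketch *e)
{
	if (e->unbinding)
		return;

	if (!e->enabled)
		e->warnings++;

	e->enabled = false;
	e->delivered++;
}

/* Unbind: set the flag first, then imitate the final debug handler call. */
static void evtchn_unbind_sketch(struct evtchn_sketch *e)
{
	e->unbinding = true;
	e->enabled = false;
	evtchn_interrupt_sketch(e);
}
```

      Without the flag, the last handler call on a disabled channel would bump the warning counter in this model; with it, the call returns before any check is made.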
      
      Fixes: 9e90e58c ("xen: evtchn: Allow shared registration of IRQ handers")
      Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com>
      Tested-by: Demi Marie Obenour <demi@invisiblethingslab.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
      Link: https://lore.kernel.org/r/20240313071409.25913-2-jgross@suse.com
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • riscv: Fix compilation error with FAST_GUP and rv32 · 4b24e17e
      Alexandre Ghiti authored
      [ Upstream commit 2bb7e0c4 ]
      
      Fix this by surrounding the definition of pte_leaf_size() with an ifdef
      on NAPOT, as it should have been.
      
      Fixes: e0fe5ab4 ("riscv: Fix pte_leaf_size() for NAPOT")
      Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
      Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
      Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
      Link: https://lore.kernel.org/r/20240304080247.387710-1-alexghiti@rivosinc.com
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ASoC: SOF: amd: Skip IRAM/DRAM size modification for Steam Deck OLED · 41d8bb80
      Cristian Ciocaltea authored
      [ Upstream commit 094d1176 ]
      
      The recent introduction of the ACP/PSP communication for IRAM/DRAM fence
      register modification breaks the audio support on Valve's Steam Deck
      OLED device.
      
      It causes IPC timeout errors when trying to load DSP topology during
      probing:
      
      1707255557.688176 kernel: snd_sof_amd_vangogh 0000:04:00.5: ipc tx timed out for 0x30100000 (msg/reply size: 48/0)
      1707255557.689035 kernel: snd_sof_amd_vangogh 0000:04:00.5: ------------[ IPC dump start ]------------
      1707255557.689421 kernel: snd_sof_amd_vangogh 0000:04:00.5: dsp_msg = 0x0 dsp_ack = 0x91d14f6f host_msg = 0x1 host_ack = 0xead0f1a4 irq_stat >
      1707255557.689730 kernel: snd_sof_amd_vangogh 0000:04:00.5: ------------[ IPC dump end ]------------
      1707255557.690074 kernel: snd_sof_amd_vangogh 0000:04:00.5: ------------[ DSP dump start ]------------
      1707255557.690376 kernel: snd_sof_amd_vangogh 0000:04:00.5: IPC timeout
      1707255557.690744 kernel: snd_sof_amd_vangogh 0000:04:00.5: fw_state: SOF_FW_BOOT_COMPLETE (7)
      1707255557.691037 kernel: snd_sof_amd_vangogh 0000:04:00.5: invalid header size 0xdb43fe7. FW oops is bogus
      1707255557.694824 kernel: snd_sof_amd_vangogh 0000:04:00.5: unexpected fault 0x6942d3b3 trace 0x6942d3b3
      1707255557.695392 kernel: snd_sof_amd_vangogh 0000:04:00.5: ------------[ DSP dump end ]------------
      1707255557.695755 kernel: snd_sof_amd_vangogh 0000:04:00.5: Failed to setup widget PIPELINE.6.ACPHS1.IN
      1707255557.696069 kernel: snd_sof_amd_vangogh 0000:04:00.5: error: tplg component load failed -110
      1707255557.696374 kernel: snd_sof_amd_vangogh 0000:04:00.5: error: failed to load DSP topology -22
      1707255557.697904 kernel: snd_sof_amd_vangogh 0000:04:00.5: ASoC: error at snd_soc_component_probe on 0000:04:00.5: -22
      1707255557.698405 kernel: sof_mach nau8821-max: ASoC: failed to instantiate card -22
      1707255557.701061 kernel: sof_mach nau8821-max: error -EINVAL: Failed to register card(sof-nau8821-max)
      1707255557.701624 kernel: sof_mach: probe of nau8821-max failed with error -22
      
      Introduce a new member skip_iram_dram_size_mod to struct acp_quirk_entry and
      use it to skip IRAM/DRAM size modification for Vangogh Galileo device.
      
      Fixes: 55d7bbe4 ("ASoC: SOF: amd: Add acp-psp mailbox interface for iram-dram fence register modification")
      Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
      Link: https://msgid.link/r/20240220201623.438944-3-cristian.ciocaltea@collabora.com
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ASoC: SOF: amd: Move signed_fw_image to struct acp_quirk_entry · e99e95f0
      Cristian Ciocaltea authored
      [ Upstream commit 33c3d813 ]
      
      The signed_fw_image member of struct sof_amd_acp_desc is used to enable
      signed firmware support in the driver via the acp_sof_quirk_table.
      
      In preparation to support additional use cases of the quirk table (i.e.
      adding new flags), move signed_fw_image to a new struct acp_quirk_entry
      and update all references to it accordingly.
      
      No functional changes intended.
      
      Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
      Link: https://msgid.link/r/20240220201623.438944-2-cristian.ciocaltea@collabora.com
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Stable-dep-of: 094d1176 ("ASoC: SOF: amd: Skip IRAM/DRAM size modification for Steam Deck OLED")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • io_uring: fix poll_remove stalled req completion · 6d1e3913
      Pavel Begunkov authored
      [ Upstream commit 5e3afe58 ]
      
      Taking the ctx lock is not enough to use the deferred request completion
      infrastructure: the request gets queued into the list, but no one expects
      it there, so it sits there until the next io_submit_flush_completions().
      It's hard to care about the cancellation path, so complete it via tw.
      
      Fixes: ef7dfac5 ("io_uring/poll: serialize poll linked timer start with poll removal")
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/c446740bc16858f8a2a8dcdce899812f21d15f23.1710514702.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      6d1e3913
    • Daniel Golle's avatar
      net: ethernet: mtk_eth_soc: fix PPE hanging issue · 09a19074
      Daniel Golle authored
      [ Upstream commit ea80e3ed ]
      
      A patch to resolve an issue was found in MediaTek's GPL-licensed SDK:
      In the mtk_ppe_stop() function, the PPE scan mode is not disabled before
      disabling the PPE. This can potentially lead to a hang during the process
      of disabling the PPE.
      
      Without this patch, the PPE may experience a hang during the reboot test.
      
      Link: https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/b40da332dfe763932a82f9f62a4709457a15dd6c
      Fixes: ba37b7ca ("net: ethernet: mtk_eth_soc: add support for initializing the PPE")
      Suggested-by: Bc-bocun Chen <bc-bocun.chen@mediatek.com>
      Signed-off-by: Daniel Golle <daniel@makrotopia.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      09a19074
    • Daniel Golle's avatar
      net: mediatek: mtk_eth_soc: clear MAC_MCR_FORCE_LINK only when MAC is up · 1a16fb6a
      Daniel Golle authored
      [ Upstream commit f1b85ef1 ]
      
      Clearing bit MAC_MCR_FORCE_LINK which forces the link down too early
      can result in MAC ending up in a broken/blocked state.
      
      Fix this by handling this bit in the .mac_link_up and .mac_link_down
      calls instead of in .mac_finish.
      
      Fixes: b8fc9f30 ("net: ethernet: mediatek: Add basic PHYLINK support")
      Suggested-by: Mason-cw Chang <Mason-cw.Chang@mediatek.com>
      Signed-off-by: Daniel Golle <daniel@makrotopia.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      1a16fb6a
    • José Roberto de Souza's avatar
      drm/xe: Skip VMAs pin when requesting signal to the last XE_EXEC · 43c8a525
      José Roberto de Souza authored
      [ Upstream commit dd8a07f0 ]
      
      Doing an XE_EXEC with num_batch_buffer == 0 causes the signals passed as
      arguments to be signaled when the last real XE_EXEC completes.
      But to do that it first pinned all VMAs in drm_gpuvm_exec_lock();
      this patch removes that pinning as it is not required.

      This change also helps Mesa implement memory over-commit recovery,
      as it needs to unbind unneeded VMAs when the whole VM can't fit
      in GPU memory, but it can only do the unbinding once the last XE_EXEC
      has completed.
      So with this change Mesa can get the signal it wants without getting
      out-of-memory errors.
      
      Fixes: eb9702ad ("drm/xe: Allow num_batch_buffer / num_binds == 0 in IOCTLs")
      Cc: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
      Co-developed-by: Matthew Brost <matthew.brost@intel.com>
      Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
      Reviewed-by: Matthew Brost <matthew.brost@intel.com>
      Signed-off-by: Matthew Brost <matthew.brost@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240313171318.121066-1-jose.souza@intel.com
      (cherry picked from commit 58480c1c)
      Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      43c8a525
    • Matthew Brost's avatar
      drm/xe: Invalidate userptr VMA on page pin fault · 567d34a7
      Matthew Brost authored
      [ Upstream commit 38602139 ]
      
      Rather than return an error to the user or ban the VM when userptr VMA
      page pin fails with -EFAULT, invalidate VMA mappings. This supports the
      UMD use case of freeing userptr while still having bindings.
      
      Now that non-faulting VMs can invalidate VMAs, drop the usm prefix for
      the tile_invalidated member.
      
      v2:
       - Fix build error (CI)
      v3:
       - Don't invalidate VMA if in fault mode, rather kill VM (Thomas)
       - Update commit message with tile_invalidated name change (Thomas)
       - Wait VM bookkeep slots with VM resv lock (Thomas)
      v4:
       - Move list_del_init(&userptr.repin_link) after error check (Thomas)
       - Assert not in fault mode (Matthew)
      
      Fixes: dd08ebf6 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
      Signed-off-by: Matthew Brost <matthew.brost@intel.com>
      Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240312183907.933835-1-matthew.brost@intel.com
      (cherry picked from commit 521db22a)
      Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      567d34a7
    • Chunguang Xu's avatar
      nvme: fix reconnection fail due to reserved tag allocation · 262da920
      Chunguang Xu authored
      [ Upstream commit de105068 ]
      
      We found an issue in a production environment while using NVMe over RDMA:
      admin_q reconnect failed forever even though the remote target and network
      were fine. After digging into it, we found it may be caused by an ABBA
      deadlock due to tag allocation. In my case, the tag was held by a keep-alive
      request waiting inside admin_q. Because we quiesce admin_q while resetting
      the controller, the request is marked as idle and will not be processed
      before the reset succeeds. As fabric_q shares its tagset with admin_q,
      reconnecting to the remote target needs a tag for the connect command, but
      the only reserved tag was held by the keep-alive command waiting inside
      admin_q. As a result, we failed to reconnect admin_q forever. To fix this
      issue, I think we should keep two reserved tags for the admin queue.
      
      Fixes: ed01fee2 ("nvme-fabrics: only reserve a single tag")
      Signed-off-by: Chunguang Xu <chunguang.xu@shopee.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      262da920
    • Florian Fainelli's avatar
      spi: Fix error code checking in spi_mem_exec_op() · a1097308
      Florian Fainelli authored
      [ Upstream commit 29895ce1 ]
      
      After commit cff49d58 ("spi: Unify error codes by replacing -ENOTSUPP with
      -EOPNOTSUPP"), our SPI NOR flashes would stop probing with the following
      visible in the kernel log:
      
      [    2.196300] brcmstb_qspi f0440920.qspi: using bspi-mspi mode
      [    2.210295] spi-nor: probe of spi1.0 failed with error -95
      
      It turns out that the check in spi_mem_exec_op() was changed to check
      for -ENOTSUPP (old error code) or -EOPNOTSUPP (new error code), but this
      means that for drivers that were converted, the second condition is now
      true, and we stop falling through like we used to. Fix the check to
      test for the error being neither -ENOTSUPP *nor* -EOPNOTSUPP.
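
      The logic slip described above can be sketched in plain C. The exact
      expression in spi_mem_exec_op() is not quoted in this message, so the two
      predicates below are an assumed reconstruction; the names
      returns_early_buggy(), returns_early_fixed() and the SKETCH_* constants
      are hypothetical and exist only for illustration.

```c
#include <stdbool.h>

/* Assumed values: 524 is the kernel-internal ENOTSUPP, which userspace
 * <errno.h> does not expose; 95 matches the "-95" in the log above. */
#define SKETCH_ENOTSUPP   524
#define SKETCH_EOPNOTSUPP 95

/* Buggy shape: "ret != A || ret != B" holds for every ret, because no
 * value equals both A and B, so the early return is always taken. */
static bool returns_early_buggy(int ret)
{
	return !ret || ret != -SKETCH_ENOTSUPP || ret != -SKETCH_EOPNOTSUPP;
}

/* Fixed shape: return early on success, or on an error that is neither
 * "not supported" code; otherwise fall through to the generic path. */
static bool returns_early_fixed(int ret)
{
	return !ret || (ret != -SKETCH_ENOTSUPP && ret != -SKETCH_EOPNOTSUPP);
}
```

      With these predicates, a return value of -95 takes the early return in
      the buggy form but falls through in the fixed form, which is the
      behavior the message says was lost.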
      
      Fixes: cff49d58 ("spi: Unify error codes by replacing -ENOTSUPP with -EOPNOTSUPP")
      Reviewed-by: Michael Walle <mwalle@kernel.org>
      Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
      Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
      Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
      Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Reviewed-by: Tudor Ambarus <tudor.ambarus@linaro.org>
      Link: https://msgid.link/r/20240313194530.3150446-1-florian.fainelli@broadcom.com
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a1097308
    • Théo Lebrun's avatar
      spi: spi-mem: add statistics support to ->exec_op() calls · f9f24405
      Théo Lebrun authored
      [ Upstream commit e63aef9c ]
      
      Current behavior is that spi-mem operations do not increment statistics,
      neither per-controller nor per-device, if ->exec_op() is used. For
      operations that do NOT use ->exec_op(), stats are increased as the
      usual spi_sync() is called.
      
      The newly implemented spi_mem_add_op_stats() function is strongly
      inspired by spi_statistics_add_transfer_stats(); locking logic and
      l2len computation comes from there.
      
      Statistics that are being filled: bytes{,_rx,_tx}, messages, transfers,
      errors, timedout, transfer_bytes_histo_*.
      
      Note about messages & transfers counters: in the fallback to spi_sync()
      case, there are from 1 to 4 transfers per message. We only register one
      big transfer in the ->exec_op() case as that is closer to reality.
      
      This patch is NOT touching:
       - spi_async, spi_sync, spi_sync_immediate: those counters describe
         precise function calls, incrementing them would be lying. I believe
         comparing the messages counter to spi_async+spi_sync is a good way
         to detect ->exec_op() calls, but I might be missing edge cases
         knowledge.
       - transfers_split_maxsize: splitting cannot happen if ->exec_op() is
         provided.
      
      Reviewed-by: Dhruva Gole <d-gole@ti.com>
      Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
      Reviewed-by: Tudor Ambarus <tudor.ambarus@linaro.org>
      Link: https://msgid.link/r/20240216-spi-mem-stats-v2-1-9256dfe4887d@bootlin.com
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Stable-dep-of: 29895ce1 ("spi: Fix error code checking in spi_mem_exec_op()")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      f9f24405
    • Duanqiang Wen's avatar
      net: txgbe: fix clk_name exceed MAX_DEV_ID limits · 97eb67bd
      Duanqiang Wen authored
      [ Upstream commit e30cef00 ]
      
      txgbe registers a clk whose name is i2c_designware.pci_dev_id();
      clk_name is stored in clk_lookup_alloc. If the PCIe bus number
      is larger than 0x39, clk_name will be larger than 20 bytes, which
      exceeds the clk_lookup_alloc MAX_DEV_ID limit. The driver therefore
      shortens clk_name.
      
      Fixes: b63f2048 ("net: txgbe: Register fixed rate clock")
      Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com>
      Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
      Link: https://lore.kernel.org/r/20240313080634.459523-1-duanqiangwen@net-swift.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      97eb67bd
    • Shigeru Yoshida's avatar
      hsr: Fix uninit-value access in hsr_get_node() · 09e5cdbe
      Shigeru Yoshida authored
      [ Upstream commit ddbec99f ]
      
      KMSAN reported the following uninit-value access issue [1]:
      
      =====================================================
      BUG: KMSAN: uninit-value in hsr_get_node+0xa2e/0xa40 net/hsr/hsr_framereg.c:246
       hsr_get_node+0xa2e/0xa40 net/hsr/hsr_framereg.c:246
       fill_frame_info net/hsr/hsr_forward.c:577 [inline]
       hsr_forward_skb+0xe12/0x30e0 net/hsr/hsr_forward.c:615
       hsr_dev_xmit+0x1a1/0x270 net/hsr/hsr_device.c:223
       __netdev_start_xmit include/linux/netdevice.h:4940 [inline]
       netdev_start_xmit include/linux/netdevice.h:4954 [inline]
       xmit_one net/core/dev.c:3548 [inline]
       dev_hard_start_xmit+0x247/0xa10 net/core/dev.c:3564
       __dev_queue_xmit+0x33b8/0x5130 net/core/dev.c:4349
       dev_queue_xmit include/linux/netdevice.h:3134 [inline]
       packet_xmit+0x9c/0x6b0 net/packet/af_packet.c:276
       packet_snd net/packet/af_packet.c:3087 [inline]
       packet_sendmsg+0x8b1d/0x9f30 net/packet/af_packet.c:3119
       sock_sendmsg_nosec net/socket.c:730 [inline]
       __sock_sendmsg net/socket.c:745 [inline]
       __sys_sendto+0x735/0xa10 net/socket.c:2191
       __do_sys_sendto net/socket.c:2203 [inline]
       __se_sys_sendto net/socket.c:2199 [inline]
       __x64_sys_sendto+0x125/0x1c0 net/socket.c:2199
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0x6d/0x140 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x63/0x6b
      
      Uninit was created at:
       slab_post_alloc_hook+0x129/0xa70 mm/slab.h:768
       slab_alloc_node mm/slub.c:3478 [inline]
       kmem_cache_alloc_node+0x5e9/0xb10 mm/slub.c:3523
       kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:560
       __alloc_skb+0x318/0x740 net/core/skbuff.c:651
       alloc_skb include/linux/skbuff.h:1286 [inline]
       alloc_skb_with_frags+0xc8/0xbd0 net/core/skbuff.c:6334
       sock_alloc_send_pskb+0xa80/0xbf0 net/core/sock.c:2787
       packet_alloc_skb net/packet/af_packet.c:2936 [inline]
       packet_snd net/packet/af_packet.c:3030 [inline]
       packet_sendmsg+0x70e8/0x9f30 net/packet/af_packet.c:3119
       sock_sendmsg_nosec net/socket.c:730 [inline]
       __sock_sendmsg net/socket.c:745 [inline]
       __sys_sendto+0x735/0xa10 net/socket.c:2191
       __do_sys_sendto net/socket.c:2203 [inline]
       __se_sys_sendto net/socket.c:2199 [inline]
       __x64_sys_sendto+0x125/0x1c0 net/socket.c:2199
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0x6d/0x140 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x63/0x6b
      
      CPU: 1 PID: 5033 Comm: syz-executor334 Not tainted 6.7.0-syzkaller-00562-g9f8413c4a66f #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
      =====================================================
      
      If the packet type ID field in the Ethernet header is either ETH_P_PRP or
      ETH_P_HSR, but it is not followed by an HSR tag, hsr_get_skb_sequence_nr()
      reads an invalid value as a sequence number. This causes the above issue.
      
      This patch fixes the issue by returning NULL if the Ethernet header is not
      followed by an HSR tag.
      
      Fixes: f266a683 ("net/hsr: Better frame dispatch")
      Reported-and-tested-by: <syzbot+2ef3a8ce8e91b5a50098@syzkaller.appspotmail.com>
      Closes: https://syzkaller.appspot.com/bug?extid=2ef3a8ce8e91b5a50098 [1]
      Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
      Link: https://lore.kernel.org/r/20240312152719.724530-1-syoshida@redhat.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      09e5cdbe
    • William Tu's avatar
      vmxnet3: Fix missing reserved tailroom · 91d017d1
      William Tu authored
      [ Upstream commit e127ce76 ]
      
      Use rbi->len instead of rcd->len for non-dataring packets.
      
      Found issue:
        XDP_WARN: xdp_update_frame_from_buff(line:278): Driver BUG: missing reserved tailroom
        WARNING: CPU: 0 PID: 0 at net/core/xdp.c:586 xdp_warn+0xf/0x20
        CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W  O       6.5.1 #1
        RIP: 0010:xdp_warn+0xf/0x20
        ...
        ? xdp_warn+0xf/0x20
        xdp_do_redirect+0x15f/0x1c0
        vmxnet3_run_xdp+0x17a/0x400 [vmxnet3]
        vmxnet3_process_xdp+0xe4/0x760 [vmxnet3]
        ? vmxnet3_tq_tx_complete.isra.0+0x21e/0x2c0 [vmxnet3]
        vmxnet3_rq_rx_complete+0x7ad/0x1120 [vmxnet3]
        vmxnet3_poll_rx_only+0x2d/0xa0 [vmxnet3]
        __napi_poll+0x20/0x180
        net_rx_action+0x177/0x390
      
      Reported-by: Martin Zaharinov <micron10@gmail.com>
      Tested-by: Martin Zaharinov <micron10@gmail.com>
      Link: https://lore.kernel.org/netdev/74BF3CC8-2A3A-44FF-98C2-1E20F110A92E@gmail.com/
      Fixes: 54f00cce ("vmxnet3: Add XDP support.")
      Signed-off-by: William Tu <witu@nvidia.com>
      Link: https://lore.kernel.org/r/20240309183147.28222-1-witu@nvidia.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      91d017d1
    • Kuniyuki Iwashima's avatar
      tcp: Fix refcnt handling in __inet_hash_connect(). · ad105cde
      Kuniyuki Iwashima authored
      [ Upstream commit 04d9d1fc ]
      
      syzbot reported a warning in sk_nulls_del_node_init_rcu().
      
      The commit 66b60b0c ("dccp/tcp: Unhash sk from ehash for tb2 alloc
      failure after check_estalblished().") tried to fix an issue that an
      unconnected socket occupies an ehash entry when bhash2 allocation fails.
      
      In such a case, we need to revert changes done by check_established(),
      which does not hold refcnt when inserting socket into ehash.
      
      So, to revert the change, we need to use __sk_nulls_add_node_rcu()
      instead of sk_nulls_add_node_rcu().
      
      Otherwise, sock_put() will cause refcnt underflow and leak the socket.
      
      [0]:
      WARNING: CPU: 0 PID: 23948 at include/net/sock.h:799 sk_nulls_del_node_init_rcu+0x166/0x1a0 include/net/sock.h:799
      Modules linked in:
      CPU: 0 PID: 23948 Comm: syz-executor.2 Not tainted 6.8.0-rc6-syzkaller-00159-gc055fc00c07b #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024
      RIP: 0010:sk_nulls_del_node_init_rcu+0x166/0x1a0 include/net/sock.h:799
      Code: e8 7f 71 c6 f7 83 fb 02 7c 25 e8 35 6d c6 f7 4d 85 f6 0f 95 c0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc e8 1b 6d c6 f7 90 <0f> 0b 90 eb b2 e8 10 6d c6 f7 4c 89 e7 be 04 00 00 00 e8 63 e7 d2
      RSP: 0018:ffffc900032d7848 EFLAGS: 00010246
      RAX: ffffffff89cd0035 RBX: 0000000000000001 RCX: 0000000000040000
      RDX: ffffc90004de1000 RSI: 000000000003ffff RDI: 0000000000040000
      RBP: 1ffff1100439ac26 R08: ffffffff89ccffe3 R09: 1ffff1100439ac28
      R10: dffffc0000000000 R11: ffffed100439ac29 R12: ffff888021cd6140
      R13: dffffc0000000000 R14: ffff88802a9bf5c0 R15: ffff888021cd6130
      FS:  00007f3b823f16c0(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f3b823f0ff8 CR3: 000000004674a000 CR4: 00000000003506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       __inet_hash_connect+0x140f/0x20b0 net/ipv4/inet_hashtables.c:1139
       dccp_v6_connect+0xcb9/0x1480 net/dccp/ipv6.c:956
       __inet_stream_connect+0x262/0xf30 net/ipv4/af_inet.c:678
       inet_stream_connect+0x65/0xa0 net/ipv4/af_inet.c:749
       __sys_connect_file net/socket.c:2048 [inline]
       __sys_connect+0x2df/0x310 net/socket.c:2065
       __do_sys_connect net/socket.c:2075 [inline]
       __se_sys_connect net/socket.c:2072 [inline]
       __x64_sys_connect+0x7a/0x90 net/socket.c:2072
       do_syscall_64+0xf9/0x240
       entry_SYSCALL_64_after_hwframe+0x6f/0x77
      RIP: 0033:0x7f3b8167dda9
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f3b823f10c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
      RAX: ffffffffffffffda RBX: 00007f3b817abf80 RCX: 00007f3b8167dda9
      RDX: 000000000000001c RSI: 0000000020000040 RDI: 0000000000000003
      RBP: 00007f3b823f1120 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
      R13: 000000000000000b R14: 00007f3b817abf80 R15: 00007ffd3beb57b8
       </TASK>
      
      Reported-by: <syzbot+12c506c1aae251e70449@syzkaller.appspotmail.com>
      Closes: https://syzkaller.appspot.com/bug?extid=12c506c1aae251e70449
      Fixes: 66b60b0c ("dccp/tcp: Unhash sk from ehash for tb2 alloc failure after check_estalblished().")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240308201623.65448-1-kuniyu@amazon.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      ad105cde
    • Gabriel Krisman Bertazi's avatar
      io_uring: Fix release of pinned pages when __io_uaddr_map fails · 4d376d7a
      Gabriel Krisman Bertazi authored
      [ Upstream commit 67d1189d ]
      
      Looking at the error path of __io_uaddr_map, if we fail after pinning
      the pages for any reason, ret will be set to -EINVAL and the error
      handler won't properly release the pinned pages.
      
      I didn't manage to trigger it without forcing a failure, but it can
      happen in real life when memory is heavily fragmented.
      
      Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
      Fixes: 223ef474 ("io_uring: don't allow IORING_SETUP_NO_MMAP rings on highmem pages")
      Link: https://lore.kernel.org/r/20240313213912.1920-1-krisman@suse.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      4d376d7a
    • Sibi Sankar's avatar
      cpufreq: Fix per-policy boost behavior on SoCs using cpufreq_boost_set_sw() · 552f799d
      Sibi Sankar authored
      [ Upstream commit f37a4d6b ]
      
      In the existing code, per-policy flags don't have any impact i.e.
      if cpufreq_driver boost is enabled and boost is disabled for one or
      more of the policies, the cpufreq driver will behave as if boost is
      enabled.
      
      Fix this by incorporating per-policy boost flag in the policy->max
      computation used in cpufreq_frequency_table_cpuinfo and setting the
      default per-policy boost to mirror the cpufreq_driver boost flag.
      
      Fixes: 218a06a7 ("cpufreq: Support per-policy performance boost")
      Reported-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: Dhruva Gole <d-gole@ti.com>
      Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com>
      Tested-by: Yipeng Zou <zouyipeng@huawei.com>
      Reviewed-by: Yipeng Zou <zouyipeng@huawei.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      552f799d
    • Arnd Bergmann's avatar
      soc: fsl: dpio: fix kcalloc() argument order · bfe239bc
      Arnd Bergmann authored
      [ Upstream commit 72ebb41b ]
      
      A previous bugfix added a call to kcalloc(), which starting in gcc-14
      causes a harmless warning about the argument order:
      
      drivers/soc/fsl/dpio/dpio-service.c: In function 'dpaa2_io_service_enqueue_multiple_desc_fq':
      drivers/soc/fsl/dpio/dpio-service.c:526:29: error: 'kcalloc' sizes specified with 'sizeof' in the earlier argument and not in the later argument [-Werror=calloc-transposed-args]
        526 |         ed = kcalloc(sizeof(struct qbman_eq_desc), 32, GFP_KERNEL);
            |                             ^~~~~~
      drivers/soc/fsl/dpio/dpio-service.c:526:29: note: earlier argument should specify number of elements, later size of each element
      
      Since the two are only multiplied, the order does not change the
      behavior, so just fix it now to shut up the compiler warning.
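
      As a user-space illustration of the argument order the compiler expects,
      the sketch below uses calloc() from the C library, whose
      count-then-size order is the one the gcc note above asks for. struct
      desc and alloc_descs() are hypothetical stand-ins, not the driver's
      struct qbman_eq_desc or its code.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the descriptor type used by the driver. */
struct desc {
	unsigned char raw[32];
};

/* Allocate n zeroed descriptors: element count first, element size second.
 * Swapping the two yields the same byte count, which is why the warning is
 * harmless, but gcc-14 flags sizeof() appearing in the first argument. */
static struct desc *alloc_descs(size_t n)
{
	return calloc(n, sizeof(struct desc));
}
```

      A caller would request, for example, alloc_descs(32) and release the
      array with free() when done.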
      
      Dmitry independently came up with the same fix.
      
      Fixes: 5c4a5999 ("soc: fsl: dpio: avoid stack usage warning")
      Reported-by: Dmitry Antipov <dmantipov@yandex.ru>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      bfe239bc
    • Charlie Jenkins's avatar
      riscv: Only check online cpus for emulated accesses · 52805f2f
      Charlie Jenkins authored
      [ Upstream commit 313130c6 ]
      
      The unaligned access checker only sets valid values for online cpus.
      Check for these values on online cpus rather than on present cpus.
      
      Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
      Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
      Fixes: 71c54b3d ("riscv: report misaligned accesses emulation to hwprobe")
      Tested-by: Samuel Holland <samuel.holland@sifive.com>
      Link: https://lore.kernel.org/r/20240308-disable_misaligned_probe_config-v9-2-a388770ba0ce@rivosinc.com
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      52805f2f
    • Shay Drory's avatar
      devlink: Fix devlink parallel commands processing · d394d076
      Shay Drory authored
      [ Upstream commit d7d75124 ]
      
      Commit 870c7ad4 ("devlink: protect devlink->dev by the instance
      lock") added devlink instance locking inside a loop that iterates over
      all the registered devlink instances on the machine in the pre-doit
      phase. This can lead to serialization of devlink commands over
      different devlink instances.
      
      For example: While the first devlink instance is executing firmware
      flash, all commands to other devlink instances on the machine are
      forced to wait until the first devlink finishes.
      
      Therefore, in the pre-doit phase, take the devlink instance lock only
      for the devlink instance the command is targeting. Devlink layer is
      taking a reference on the devlink instance, ensuring the devlink->dev
      pointer is valid. This reference taking was introduced by commit
      a3806872 ("devlink: take device reference for devlink object").
      Without this commit, it would not be safe to access devlink->dev
      lockless.
      
      Fixes: 870c7ad4 ("devlink: protect devlink->dev by the instance lock")
      Signed-off-by: Shay Drory <shayd@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      d394d076
    • Eric Dumazet's avatar
      net/sched: taprio: proper TCA_TAPRIO_TC_ENTRY_INDEX check · 9b720bb1
      Eric Dumazet authored
      [ Upstream commit 343041b5 ]
      
      taprio_parse_tc_entry() is not correctly checking
      TCA_TAPRIO_TC_ENTRY_INDEX attribute:
      
      	int tc; // Signed value
      
      	tc = nla_get_u32(tb[TCA_TAPRIO_TC_ENTRY_INDEX]);
      	if (tc >= TC_QOPT_MAX_QUEUE) {
      		NL_SET_ERR_MSG_MOD(extack, "TC entry index out of range");
      		return -ERANGE;
      	}
      
      syzbot reported that it could feed arbitrary negative values:
      
      UBSAN: shift-out-of-bounds in net/sched/sch_taprio.c:1722:18
      shift exponent -2147418108 is negative
      CPU: 0 PID: 5066 Comm: syz-executor367 Not tainted 6.8.0-rc7-syzkaller-00136-gc8a5c731fd12 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
      Call Trace:
       <TASK>
        __dump_stack lib/dump_stack.c:88 [inline]
        dump_stack_lvl+0x1e7/0x2e0 lib/dump_stack.c:106
        ubsan_epilogue lib/ubsan.c:217 [inline]
        __ubsan_handle_shift_out_of_bounds+0x3c7/0x420 lib/ubsan.c:386
        taprio_parse_tc_entry net/sched/sch_taprio.c:1722 [inline]
        taprio_parse_tc_entries net/sched/sch_taprio.c:1768 [inline]
        taprio_change+0xb87/0x57d0 net/sched/sch_taprio.c:1877
        taprio_init+0x9da/0xc80 net/sched/sch_taprio.c:2134
        qdisc_create+0x9d4/0x1190 net/sched/sch_api.c:1355
        tc_modify_qdisc+0xa26/0x1e40 net/sched/sch_api.c:1776
        rtnetlink_rcv_msg+0x885/0x1040 net/core/rtnetlink.c:6617
        netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2543
        netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline]
        netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1367
        netlink_sendmsg+0xa3b/0xd70 net/netlink/af_netlink.c:1908
        sock_sendmsg_nosec net/socket.c:730 [inline]
        __sock_sendmsg+0x221/0x270 net/socket.c:745
        ____sys_sendmsg+0x525/0x7d0 net/socket.c:2584
        ___sys_sendmsg net/socket.c:2638 [inline]
        __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667
       do_syscall_64+0xf9/0x240
       entry_SYSCALL_64_after_hwframe+0x6f/0x77
      RIP: 0033:0x7f1b2dea3759
      Code: 48 83 c4 28 c3 e8 d7 19 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007ffd4de452f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00007f1b2def0390 RCX: 00007f1b2dea3759
      RDX: 0000000000000000 RSI: 00000000200007c0 RDI: 0000000000000004
      RBP: 0000000000000003 R08: 0000555500000000 R09: 0000555500000000
      R10: 0000555500000000 R11: 0000000000000246 R12: 00007ffd4de45340
      R13: 00007ffd4de45310 R14: 0000000000000001 R15: 00007ffd4de45340
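
      The signedness gap behind this report can be sketched in user space.
      Everything below is illustrative: SKETCH_MAX_QUEUE is an assumed
      stand-in for TC_QOPT_MAX_QUEUE, and accepts_signed() and
      accepts_unsigned() are hypothetical helpers. Comparing the index as an
      unsigned value is one way to close the gap and is not necessarily how
      the upstream patch does it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed limit, standing in for TC_QOPT_MAX_QUEUE. */
#define SKETCH_MAX_QUEUE 16

/* Signed index with only an upper-bound check: any value that ends up
 * negative is below the limit and is therefore accepted. */
static bool accepts_signed(int32_t tc)
{
	return !(tc >= SKETCH_MAX_QUEUE);
}

/* Unsigned index with the same comparison: bit patterns that would read
 * as negative are now huge positive numbers and are rejected. */
static bool accepts_unsigned(uint32_t tc)
{
	return !(tc >= SKETCH_MAX_QUEUE);
}
```

      Feeding the shift exponent from the report, -2147418108, to
      accepts_signed() passes the check, while the same bits viewed as a
      uint32_t are rejected by accepts_unsigned().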
      
      Fixes: a54fc09e ("net/sched: taprio: allow user input of per-tc max SDU")
      Reported-and-tested-by: <syzbot+a340daa06412d6028918@syzkaller.appspotmail.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      9b720bb1
    • Mete Durlu's avatar
      s390/vtime: fix average steal time calculation · 2d06ffd9
      Mete Durlu authored
      [ Upstream commit 367c50f7 ]
      
      The current average steal timer calculation produces volatile and inflated
      values. The only user of this value so far is KVM, which uses it to
      decide whether or not to yield the vCPU which is seeing steal time.
      KVM compares the average steal timer to a threshold; if the threshold
      is exceeded, it does not allow CPU polling and yields the CPU to the host,
      otherwise it keeps the CPU by polling.
      Since KVM's steal time threshold is very low by default (10%), it most
      likely is not affected much by the bloated average steal timer values
      because the operating region is pretty small. However, there might be
      new users in the future who might rely on this number. Fix the average
      steal timer calculation by changing the formula from:
      
      	avg_steal_timer = avg_steal_timer / 2 + steal_timer;
      
      to the following:
      
      	avg_steal_timer = (avg_steal_timer + steal_timer) / 2;
      
      This ensures that avg_steal_timer is actually a naive average of steal
      timer values. It now closely follows steal timer values but of course
      in a smoother manner.
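
      The difference between the two formulas can be seen in a minimal
      user-space sketch. It is not the kernel code: the function names and
      the settle() helper are hypothetical and exist only for illustration.
      It assumes unsigned integer arithmetic and a constant steal sample.

```c
#include <stdint.h>

/* Old formula from the commit message: halve the history, then add the
 * full sample. (Function name is an assumption for illustration.) */
static uint64_t avg_steal_old(uint64_t avg, uint64_t steal)
{
	return avg / 2 + steal;
}

/* New formula: average the history with the new sample. */
static uint64_t avg_steal_new(uint64_t avg, uint64_t steal)
{
	return (avg + steal) / 2;
}

/* Hypothetical helper: start from zero and feed the same sample n times. */
static uint64_t settle(uint64_t (*step)(uint64_t, uint64_t),
		       uint64_t steal, int n)
{
	uint64_t avg = 0;

	for (int i = 0; i < n; i++)
		avg = step(avg, steal);
	return avg;
}
```

      With a constant sample of 100, settle(avg_steal_old, 100, 20) ends up
      at 199, roughly twice the input, while settle(avg_steal_new, 100, 20)
      ends up at 99, which tracks the input (the off-by-one comes from
      integer division).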
      
      Fixes: 152e9b86 ("s390/vtime: steal time exponential moving average")
      Signed-off-by: Mete Durlu <meted@linux.ibm.com>
      Acked-by: Heiko Carstens <hca@linux.ibm.com>
      Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      2d06ffd9
    • octeontx2-af: Use matching wake_up API variant in CGX command interface · a3031f17
      Linu Cherian authored
      [ Upstream commit e642921d ]
      
      Use wake_up() instead of wake_up_interruptible(), since
      wait_event_timeout(), which sleeps in TASK_UNINTERRUPTIBLE state, is
      used for waiting on command completion.
      
      Fixes: 1463f382 ("octeontx2-af: Add support for CGX link management")
      Signed-off-by: Linu Cherian <lcherian@marvell.com>
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a3031f17
    • rds: tcp: Fix use-after-free of net in reqsk_timer_handler(). · 1e9fd5cf
      Kuniyuki Iwashima authored
      [ Upstream commit 2a750d6a ]
      
      syzkaller reported a warning of netns tracker [0] followed by KASAN
      splat [1] and another ref tracker warning [2].
      
      syzkaller could not find a repro, but in the log, the only suspicious
      sequence was as follows:
      
        18:26:22 executing program 1:
        r0 = socket$inet6_mptcp(0xa, 0x1, 0x106)
        ...
        connect$inet6(r0, &(0x7f0000000080)={0xa, 0x4001, 0x0, @loopback}, 0x1c) (async)
      
      The notable thing here is 0x4001 in connect(), which is RDS_TCP_PORT.
      
      So, the scenario would be:
      
        1. unshare(CLONE_NEWNET) creates a per netns tcp listener in
            rds_tcp_listen_init().
        2. syz-executor connect()s to it and creates a reqsk.
        3. syz-executor exit()s immediately.
        4. netns is dismantled.  [0]
        5. reqsk timer is fired, and UAF happens while freeing reqsk.  [1]
        6. listener is freed after RCU grace period.  [2]
      
      Basically, reqsk assumes that the listener guarantees netns safety
      until all reqsk timers are expired by holding the listener's refcount.
      However, this was not the case for kernel sockets.
      
      Commit 740ea3c4 ("tcp: Clean up kernel listener's reqsk in
      inet_twsk_purge()") fixed this issue only for per-netns ehash.
      
      Let's apply the same fix for the global ehash.
      
      [0]:
      ref_tracker: net notrefcnt@0000000065449cc3 has 1/1 users at
           sk_alloc (./include/net/net_namespace.h:337 net/core/sock.c:2146)
           inet6_create (net/ipv6/af_inet6.c:192 net/ipv6/af_inet6.c:119)
           __sock_create (net/socket.c:1572)
           rds_tcp_listen_init (net/rds/tcp_listen.c:279)
           rds_tcp_init_net (net/rds/tcp.c:577)
           ops_init (net/core/net_namespace.c:137)
           setup_net (net/core/net_namespace.c:340)
           copy_net_ns (net/core/net_namespace.c:497)
           create_new_namespaces (kernel/nsproxy.c:110)
           unshare_nsproxy_namespaces (kernel/nsproxy.c:228 (discriminator 4))
           ksys_unshare (kernel/fork.c:3429)
           __x64_sys_unshare (kernel/fork.c:3496)
           do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
           entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
      ...
      WARNING: CPU: 0 PID: 27 at lib/ref_tracker.c:179 ref_tracker_dir_exit (lib/ref_tracker.c:179)
      
      [1]:
      BUG: KASAN: slab-use-after-free in inet_csk_reqsk_queue_drop (./include/net/inet_hashtables.h:180 net/ipv4/inet_connection_sock.c:952 net/ipv4/inet_connection_sock.c:966)
      Read of size 8 at addr ffff88801b370400 by task swapper/0/0
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      Call Trace:
       <IRQ>
       dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1))
       print_report (mm/kasan/report.c:378 mm/kasan/report.c:488)
       kasan_report (mm/kasan/report.c:603)
       inet_csk_reqsk_queue_drop (./include/net/inet_hashtables.h:180 net/ipv4/inet_connection_sock.c:952 net/ipv4/inet_connection_sock.c:966)
       reqsk_timer_handler (net/ipv4/inet_connection_sock.c:979 net/ipv4/inet_connection_sock.c:1092)
       call_timer_fn (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/timer.h:127 kernel/time/timer.c:1701)
       __run_timers.part.0 (kernel/time/timer.c:1752 kernel/time/timer.c:2038)
       run_timer_softirq (kernel/time/timer.c:2053)
       __do_softirq (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/irq.h:142 kernel/softirq.c:554)
       irq_exit_rcu (kernel/softirq.c:427 kernel/softirq.c:632 kernel/softirq.c:644)
       sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1076 (discriminator 14))
       </IRQ>
      
      Allocated by task 258 on cpu 0 at 83.612050s:
       kasan_save_stack (mm/kasan/common.c:48)
       kasan_save_track (mm/kasan/common.c:68)
       __kasan_slab_alloc (mm/kasan/common.c:343)
       kmem_cache_alloc (mm/slub.c:3813 mm/slub.c:3860 mm/slub.c:3867)
       copy_net_ns (./include/linux/slab.h:701 net/core/net_namespace.c:421 net/core/net_namespace.c:480)
       create_new_namespaces (kernel/nsproxy.c:110)
       unshare_nsproxy_namespaces (kernel/nsproxy.c:228 (discriminator 4))
       ksys_unshare (kernel/fork.c:3429)
       __x64_sys_unshare (kernel/fork.c:3496)
       do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
       entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
      
      Freed by task 27 on cpu 0 at 329.158864s:
       kasan_save_stack (mm/kasan/common.c:48)
       kasan_save_track (mm/kasan/common.c:68)
       kasan_save_free_info (mm/kasan/generic.c:643)
       __kasan_slab_free (mm/kasan/common.c:265)
       kmem_cache_free (mm/slub.c:4299 mm/slub.c:4363)
       cleanup_net (net/core/net_namespace.c:456 net/core/net_namespace.c:446 net/core/net_namespace.c:639)
       process_one_work (kernel/workqueue.c:2638)
       worker_thread (kernel/workqueue.c:2700 kernel/workqueue.c:2787)
       kthread (kernel/kthread.c:388)
       ret_from_fork (arch/x86/kernel/process.c:153)
       ret_from_fork_asm (arch/x86/entry/entry_64.S:250)
      
      The buggy address belongs to the object at ffff88801b370000
       which belongs to the cache net_namespace of size 4352
      The buggy address is located 1024 bytes inside of
       freed 4352-byte region [ffff88801b370000, ffff88801b371100)
      
      [2]:
      WARNING: CPU: 0 PID: 95 at lib/ref_tracker.c:228 ref_tracker_free (lib/ref_tracker.c:228 (discriminator 1))
      Modules linked in:
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      RIP: 0010:ref_tracker_free (lib/ref_tracker.c:228 (discriminator 1))
      ...
      Call Trace:
      <IRQ>
       __sk_destruct (./include/net/net_namespace.h:353 net/core/sock.c:2204)
       rcu_core (./arch/x86/include/asm/preempt.h:26 kernel/rcu/tree.c:2165 kernel/rcu/tree.c:2433)
       __do_softirq (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/irq.h:142 kernel/softirq.c:554)
       irq_exit_rcu (kernel/softirq.c:427 kernel/softirq.c:632 kernel/softirq.c:644)
       sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1076 (discriminator 14))
      </IRQ>
      
      Reported-by: syzkaller <syzkaller@googlegroups.com>
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Fixes: 467fa153 ("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240308200122.64357-3-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      1e9fd5cf