  1. Aug 11, 2023
    • open: make RESOLVE_CACHED correctly test for O_TMPFILE · ef0d07c6
      Aleksa Sarai authored
      commit a0fc452a upstream.
      
      O_TMPFILE is actually __O_TMPFILE|O_DIRECTORY. This means that the old
      fast-path check for RESOLVE_CACHED would reject all users passing
      O_DIRECTORY with -EAGAIN, when in fact the intended test was to check
      for __O_TMPFILE.
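
      The difference between the two tests can be sketched in plain C. This is
      only an illustration: the EX_ names and bit values below are hypothetical
      and are not the kernel's definitions; only the composition of the mask
      follows the description above.

```c
#include <stdbool.h>

/* Hypothetical flag bits, chosen only for this illustration. */
#define EX_O_DIRECTORY  0x010000u
#define EX_TMPFILE_BIT  0x400000u   /* stands in for __O_TMPFILE */
/* Composite flag: includes the directory bit, as described above. */
#define EX_O_TMPFILE    (EX_TMPFILE_BIT | EX_O_DIRECTORY)

/* Old-style test: fires when ANY bit of the composite mask is set. */
static bool old_check_rejects(unsigned int flags)
{
    return (flags & EX_O_TMPFILE) != 0;
}

/* Intended test: fires only when the tmpfile-specific bit is set. */
static bool fixed_check_rejects(unsigned int flags)
{
    return (flags & EX_TMPFILE_BIT) != 0;
}
```

      With these definitions, a caller passing only the directory flag is
      rejected by the old test but accepted by the fixed one, while a caller
      passing the composite flag is rejected by both.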
      
      Cc: stable@vger.kernel.org # v5.12+
      Fixes: 99668f61 ("fs: expose LOOKUP_CACHED through openat2() RESOLVE_CACHED")
      Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
      Message-Id: <20230806-resolve_cached-o_tmpfile-v1-1-7ba16308465e@cyphar.com>
      Signed-off-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bpf: Disable preemption in bpf_event_output · c81bdf8f
      Jiri Olsa authored
      commit d62cc390 upstream.
      
      We received a report [1] of a kernel crash caused by using the
      nesting protection without preemption disabled.

      bpf_event_output() can be called by programs executed by the
      bpf_prog_run_array_cg() function, which disables migration but
      keeps preemption enabled.

      As a result, a task can be preempted by another one inside the
      nesting protection, which eventually leads to two tasks using the
      same perf_sample_data buffer and causes crashes like:
      
        BUG: kernel NULL pointer dereference, address: 0000000000000001
        #PF: supervisor instruction fetch in kernel mode
        #PF: error_code(0x0010) - not-present page
        ...
        ? perf_output_sample+0x12a/0x9a0
        ? finish_task_switch.isra.0+0x81/0x280
        ? perf_event_output+0x66/0xa0
        ? bpf_event_output+0x13a/0x190
        ? bpf_event_output_data+0x22/0x40
        ? bpf_prog_dfc84bbde731b257_cil_sock4_connect+0x40a/0xacb
        ? xa_load+0x87/0xe0
        ? __cgroup_bpf_run_filter_sock_addr+0xc1/0x1a0
        ? release_sock+0x3e/0x90
        ? sk_setsockopt+0x1a1/0x12f0
        ? udp_pre_connect+0x36/0x50
        ? inet_dgram_connect+0x93/0xa0
        ? __sys_connect+0xb4/0xe0
        ? udp_setsockopt+0x27/0x40
        ? __pfx_udp_push_pending_frames+0x10/0x10
        ? __sys_setsockopt+0xdf/0x1a0
        ? __x64_sys_connect+0xf/0x20
        ? do_syscall_64+0x3a/0x90
        ? entry_SYSCALL_64_after_hwframe+0x72/0xdc
      
      Fix this by disabling preemption in bpf_event_output().
      
      [1] https://github.com/cilium/cilium/issues/26756
      Cc: stable@vger.kernel.org
      Reported-by: Oleg "livelace" Popov <o.popov@livelace.ru>
      Closes: https://github.com/cilium/cilium/issues/26756
      Fixes: 2a916f2f ("bpf: Use migrate_disable/enable in array macros and cgroup/lirc code.")
      Acked-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Link: https://lore.kernel.org/r/20230725084206.580930-3-jolsa@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • rbd: prevent busy loop when requesting exclusive lock · ae07cfe2
      Ilya Dryomov authored
      commit 9d01e07f upstream.
      
      Due to rbd_try_acquire_lock() effectively swallowing all but
      EBLOCKLISTED error from rbd_try_lock() ("request lock anyway") and
      rbd_request_lock() returning ETIMEDOUT error not only for an actual
      notify timeout but also when the lock owner doesn't respond, a busy
      loop inside of rbd_acquire_lock() between rbd_try_acquire_lock() and
      rbd_request_lock() is possible.
      
      Requesting the lock on EBUSY error (returned by get_lock_owner_info()
      if an incompatible lock or invalid lock owner is detected) makes very
      little sense.  The same goes for ETIMEDOUT error (might pop up pretty
      much anywhere if osd_request_timeout option is set) and many others.
      
      Just fail I/O requests on rbd_dev->acquiring_list immediately on any
      error from rbd_try_lock().
      
      Cc: stable@vger.kernel.org # 58815900: rbd: retrieve and check lock owner twice before blocklisting
      Cc: stable@vger.kernel.org
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • wifi: mt76: mt7615: do not advertise 5 GHz on first phy of MT7615D (DBDC) · 7978bcca
      Paul Fertser authored
      commit 421033de upstream.
      
      On DBDC devices the first (internal) phy is only capable of using
      2.4 GHz band, and the 5 GHz band is exposed via a separate phy object,
      so avoid the false advertising.
      
      Reported-by: Rani Hod <rani.hod@gmail.com>
      Closes: https://github.com/openwrt/openwrt/pull/12361
      Fixes: 7660a1bd ("mt76: mt7615: register ext_phy if DBDC is detected")
      Cc: stable@vger.kernel.org
      Signed-off-by: Paul Fertser <fercerpav@gmail.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Acked-by: Felix Fietkau <nbd@nbd.name>
      Signed-off-by: Kalle Valo <kvalo@kernel.org>
      Link: https://lore.kernel.org/r/20230605073408.8699-1-fercerpav@gmail.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: tap_open(): set sk_uid from current_fsuid() · 32ca6a55
      Laszlo Ersek authored
      commit 5c9241f3 upstream.
      
      Commit 66b2c338 initializes the "sk_uid" field in the protocol socket
      (struct sock) from the "/dev/tapX" device node's owner UID. Per original
      commit 86741ec2 ("net: core: Add a UID field to struct sock.",
      2016-11-04), that's wrong: the idea is to cache the UID of the userspace
      process that creates the socket. Commit 86741ec2 mentions socket() and
      accept(); with "tap", the action that creates the socket is
      open("/dev/tapX").
      
      Therefore the device node's owner UID is irrelevant. In most cases,
      "/dev/tapX" will be owned by root, so in practice, commit 66b2c338 has
      no observable effect:
      
      - before, "sk_uid" would be zero, due to undefined behavior
        (CVE-2023-1076),
      
      - after, "sk_uid" would be zero, due to "/dev/tapX" being owned by root.
      
      What matters is the (fs)UID of the process performing the open(), so cache
      that in "sk_uid".
      
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Lorenzo Colitti <lorenzo@google.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Pietro Borrello <borrello@diag.uniroma1.it>
      Cc: netdev@vger.kernel.org
      Cc: stable@vger.kernel.org
      Fixes: 66b2c338 ("tap: tap_open(): correctly initialize socket uid")
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2173435
      Signed-off-by: Laszlo Ersek <lersek@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: tun_chr_open(): set sk_uid from current_fsuid() · 4ed3eed9
      Laszlo Ersek authored
      commit 9bc30473 upstream.
      
      Commit a096ccca initializes the "sk_uid" field in the protocol socket
      (struct sock) from the "/dev/net/tun" device node's owner UID. Per
      original commit 86741ec2 ("net: core: Add a UID field to struct
      sock.", 2016-11-04), that's wrong: the idea is to cache the UID of the
      userspace process that creates the socket. Commit 86741ec2 mentions
      socket() and accept(); with "tun", the action that creates the socket is
      open("/dev/net/tun").
      
      Therefore the device node's owner UID is irrelevant. In most cases,
      "/dev/net/tun" will be owned by root, so in practice, commit a096ccca
      has no observable effect:
      
      - before, "sk_uid" would be zero, due to undefined behavior
        (CVE-2023-1076),
      
      - after, "sk_uid" would be zero, due to "/dev/net/tun" being owned by root.
      
      What matters is the (fs)UID of the process performing the open(), so cache
      that in "sk_uid".
      
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Lorenzo Colitti <lorenzo@google.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Pietro Borrello <borrello@diag.uniroma1.it>
      Cc: netdev@vger.kernel.org
      Cc: stable@vger.kernel.org
      Fixes: a096ccca ("tun: tun_chr_open(): correctly initialize socket uid")
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2173435
      Signed-off-by: Laszlo Ersek <lersek@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • arm64: dts: stratix10: fix incorrect I2C property for SCL signal · adacc3a9
      Dinh Nguyen authored
      commit db66795f upstream.
      
      The correct dts property for the SCL falling time is
      "i2c-scl-falling-time-ns".
      
      Fixes: c8da1d15 ("arm64: dts: stratix10: i2c clock running out of spec")
      Cc: stable@vger.kernel.org
      Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mtd: rawnand: meson: fix OOB available bytes for ECC · b92c8800
      Arseniy Krasnov authored
      commit 7e6b04f9 upstream.
      
      It is incorrect to calculate number of OOB bytes for ECC engine using
      some "already known" ECC step size (1024 bytes here). Number of such
      bytes for ECC engine must be whole OOB except 2 bytes for bad block
      marker, while proper ECC step size and strength will be selected by
      ECC logic.
      
      Fixes: 8fae856c ("mtd: rawnand: meson: add support for Amlogic NAND flash controller")
      Cc: <Stable@vger.kernel.org>
      Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
      Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
      Link: https://lore.kernel.org/linux-mtd/20230705065211.293500-1-AVKrasnov@sberdevices.ru
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mtd: spinand: toshiba: Fix ecc_get_status · b0875c58
      Olivier Maignial authored
      commit 8544cda9 upstream.
      
      Reading ECC status is failing.
      
      tx58cxgxsxraix_ecc_get_status() is using on-stack buffer
      for SPINAND_GET_FEATURE_OP() output. It is not suitable
      for DMA needs of spi-mem.
      
      Fix this by using the spi-mem operations dedicated buffer
      spinand->scratchbuf.
      
      See
      spinand->scratchbuf:
      https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/mtd/spinand.h?h=v6.3#n418
      spi_mem_check_op():
      https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/spi/spi-mem.c?h=v6.3#n199
      
      Fixes: 10949af1 ("mtd: spinand: Add initial support for Toshiba TC58CVG2S0H")
      Cc: stable@vger.kernel.org
      Signed-off-by: Olivier Maignial <olivier.maignial@hotmail.fr>
      Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
      Link: https://lore.kernel.org/linux-mtd/DB4P250MB1032553D05FBE36DEE0D311EFE23A@DB4P250MB1032.EURP250.PROD.OUTLOOK.COM
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • exfat: release s_lock before calling dir_emit() · 1c33ca1e
      Sungjong Seo authored
      commit ff84772f upstream.
      
      There is a potential deadlock reported by syzbot as below:
      
      ======================================================
      WARNING: possible circular locking dependency detected
      6.4.0-next-20230707-syzkaller #0 Not tainted
      ------------------------------------------------------
      syz-executor330/5073 is trying to acquire lock:
      ffff8880218527a0 (&mm->mmap_lock){++++}-{3:3}, at: mmap_read_lock_killable include/linux/mmap_lock.h:151 [inline]
      ffff8880218527a0 (&mm->mmap_lock){++++}-{3:3}, at: get_mmap_lock_carefully mm/memory.c:5293 [inline]
      ffff8880218527a0 (&mm->mmap_lock){++++}-{3:3}, at: lock_mm_and_find_vma+0x369/0x510 mm/memory.c:5344
      but task is already holding lock:
      ffff888019f760e0 (&sbi->s_lock){+.+.}-{3:3}, at: exfat_iterate+0x117/0xb50 fs/exfat/dir.c:232
      
      which lock already depends on the new lock.
      
      Chain exists of:
        &mm->mmap_lock --> mapping.invalidate_lock#3 --> &sbi->s_lock
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(&sbi->s_lock);
                                     lock(mapping.invalidate_lock#3);
                                     lock(&sbi->s_lock);
        rlock(&mm->mmap_lock);
      
      Let's try to avoid above potential deadlock condition by moving dir_emit*()
      out of sbi->s_lock coverage.
      
      Fixes: ca061973 ("exfat: add directory operations")
      Cc: stable@vger.kernel.org #v5.7+
      Reported-by: <syzbot+1741a5d9b79989c10bdc@syzkaller.appspotmail.com>
      Link: https://lore.kernel.org/lkml/00000000000078ee7e060066270b@google.com/T/#u
      Tested-by: <syzbot+1741a5d9b79989c10bdc@syzkaller.appspotmail.com>
      Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com>
      Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • exfat: use kvmalloc_array/kvfree instead of kmalloc_array/kfree · 8a34a242
      gaoming authored
      commit daf60d6c upstream.
      
      The call stack shown below is from a Linux 4.19 kernel. A memory
      allocation in the exfat fs code that uses kmalloc_array failed due
      to system memory fragmentation, so the inserted u-disk was not
      recognized.
      Devices such as u-disks that use the exfat file system are pluggable
      and may be inserted into the system at any time.
      However, long-running systems cannot guarantee the availability of
      physically contiguous memory. Therefore, it's necessary to address
      this issue.
      
      Binder:2632_6: page allocation failure: order:4,
       mode:0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null)
      Call trace:
      [242178.097582]  dump_backtrace+0x0/0x4
      [242178.097589]  dump_stack+0xf4/0x134
      [242178.097598]  warn_alloc+0xd8/0x144
      [242178.097603]  __alloc_pages_nodemask+0x1364/0x1384
      [242178.097608]  kmalloc_order+0x2c/0x510
      [242178.097612]  kmalloc_order_trace+0x40/0x16c
      [242178.097618]  __kmalloc+0x360/0x408
      [242178.097624]  load_alloc_bitmap+0x160/0x284
      [242178.097628]  exfat_fill_super+0xa3c/0xe7c
      [242178.097635]  mount_bdev+0x2e8/0x3a0
      [242178.097638]  exfat_fs_mount+0x40/0x50
      [242178.097643]  mount_fs+0x138/0x2e8
      [242178.097649]  vfs_kern_mount+0x90/0x270
      [242178.097655]  do_mount+0x798/0x173c
      [242178.097659]  ksys_mount+0x114/0x1ac
      [242178.097665]  __arm64_sys_mount+0x24/0x34
      [242178.097671]  el0_svc_common+0xb8/0x1b8
      [242178.097676]  el0_svc_handler+0x74/0x90
      [242178.097681]  el0_svc+0x8/0x340
      
      By analyzing the exfat code, we found that physically contiguous
      memory is not required here, so using kvmalloc_array solves this
      problem.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: gaoming <gaoming20@hihonor.com>
      Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/CPU/AMD: Do not leak quotient data after a division by 0 · a7487820
      Borislav Petkov (AMD) authored
      commit 77245f1c upstream.
      
      Under certain circumstances, an integer division by 0 which faults, can
      leave stale quotient data from a previous division operation on Zen1
      microarchitectures.
      
      Do a dummy division 0/1 before returning from the #DE exception handler
      in order to avoid any leaks of potentially sensitive data.
      
      Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: <stable@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • firmware: arm_scmi: Drop OF node reference in the transport channel setup · b8f029fc
      Krzysztof Kozlowski authored
      commit da042eb4 upstream.
      
      The OF node reference obtained from of_parse_phandle() should be dropped
      if node is not compatible with arm,scmi-shmem.
      
      Fixes: 507cd4d2 ("firmware: arm_scmi: Add compatibility checks for shmem node")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
      Reviewed-by: Cristian Marussi <cristian.marussi@arm.com>
      Link: https://lore.kernel.org/r/20230719061652.8850-1-krzysztof.kozlowski@linaro.org
      Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ceph: defer stopping mdsc delayed_work · 287c2c86
      Xiubo Li authored
      commit e7e607bd upstream.
      
      Flushing the dirty buffer may take a long time if the cluster is
      overloaded or if there is a network issue. So we should ping the
      MDSs periodically to keep alive, otherwise the MDS will blocklist
      the kclient.
      
      Cc: stable@vger.kernel.org
      Link: https://tracker.ceph.com/issues/61843
      Signed-off-by: Xiubo Li <xiubli@redhat.com>
      Reviewed-by: Milind Changire <mchangir@redhat.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • USB: zaurus: Add ID for A-300/B-500/C-700 · 98b521d1
      Ross Maynard authored
      commit b99225b4 upstream.
      
      The SL-A300, B500/5600, and C700 devices no longer auto-load because of
      "usbnet: Remove over-broad module alias from zaurus."
      This patch adds IDs for those 3 devices.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=217632
      Fixes: 16adf5d0 ("usbnet: Remove over-broad module alias from zaurus.")
      Signed-off-by: Ross Maynard <bids.7405@bigpond.com>
      Cc: stable@vger.kernel.org
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/69b5423b-2013-9fc9-9569-58e707d9bafb@bigpond.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • libceph: fix potential hang in ceph_osdc_notify() · cd6872f2
      Ilya Dryomov authored
      commit e6e28432 upstream.
      
      If the cluster becomes unavailable, ceph_osdc_notify() may hang even
      with osd_request_timeout option set because linger_notify_finish_wait()
      waits for MWatchNotify NOTIFY_COMPLETE message with no associated OSD
      request in flight -- it's completely asynchronous.
      
      Introduce an additional timeout, derived from the specified notify
      timeout.  While at it, switch both waits to killable which is more
      correct.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
      Reviewed-by: Xiubo Li <xiubli@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • scsi: storvsc: Limit max_sectors for virtual Fibre Channel devices · e5f5b4a8
      Michael Kelley authored
      commit 010c1e1c upstream.
      
      The Hyper-V host is queried to get the max transfer size that it supports,
      and this value is used to set max_sectors for the synthetic SCSI
      controller.  However, this max transfer size may be too large for virtual
      Fibre Channel devices, which are limited to 512 Kbytes.  If a larger
      transfer size is used with a vFC device, Hyper-V always returns an error,
      and storvsc logs a message like this where the SRB status and SCSI status
      are both zero:
      
      hv_storvsc <GUID>: tag#197 cmd 0x8a status: scsi 0x0 srb 0x0 hv 0xc0000001
      
      Add logic to limit the max transfer size to 512 Kbytes for vFC devices.
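
      The limiting logic described above might look roughly like the
      following. This is a sketch under stated assumptions, not the storvsc
      code: the function name, the EX_ constants and the 512-byte sector size
      are illustrative; only the 512 Kbyte vFC cap comes from the text.

```c
#include <stdint.h>

#define EX_SECTOR_SHIFT       9                 /* assumes 512-byte sectors */
#define EX_VFC_MAX_XFER_BYTES (512u * 1024u)    /* vFC cap named in the text */

/*
 * Convert the host-reported maximum transfer size (in bytes) into a
 * max_sectors value, capping it first when the device is a vFC device.
 */
static uint32_t ex_max_sectors(uint32_t host_max_bytes, int is_vfc)
{
    uint32_t bytes = host_max_bytes;

    if (is_vfc && bytes > EX_VFC_MAX_XFER_BYTES)
        bytes = EX_VFC_MAX_XFER_BYTES;

    return bytes >> EX_SECTOR_SHIFT;
}
```

      A host that reports 8 Mbytes would yield 16384 sectors for an ordinary
      device but only 1024 sectors for a vFC device; a host value already
      below the cap passes through unchanged.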
      
      Fixes: 1d3e0980 ("scsi: storvsc: Correct reporting of Hyper-V I/O size limits")
      Cc: stable@vger.kernel.org
      Signed-off-by: Michael Kelley <mikelley@microsoft.com>
      Link: https://lore.kernel.org/r/1689887102-32806-1-git-send-email-mikelley@microsoft.com
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • scsi: zfcp: Defer fc_rport blocking until after ADISC response · 212a9a3c
      Steffen Maier authored
      commit e6585198 upstream.
      
      Storage devices are free to send RSCNs, e.g. for internal state changes. If
      this happens on all connected paths, zfcp risks temporarily losing all
      paths at the same time. This has strong requirements on multipath
      configuration such as "no_path_retry queue".
      
      Avoid such situations by deferring fc_rport blocking until after the ADISC
      response, when any actual state change of the remote port became clear.
      The already existing port recovery triggers explicitly block the fc_rport.
      The triggers are: on ADISC reject or timeout (typical cable pull case), and
      on ADISC indicating that the remote port has changed its WWPN or
      the port is meanwhile no longer open.
      
      As a side effect, this also removes a confusing direct function call to
      another work item function zfcp_scsi_rport_work() instead of scheduling
      that other work item. It was probably done that way to have the rport block
      side effect immediate and synchronous to the caller.
      
      Fixes: a2fa0aed ("[SCSI] zfcp: Block FC transport rports early on errors")
      Cc: stable@vger.kernel.org #v2.6.30+
      Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
      Reviewed-by: Fedor Loshakov <loshakov@linux.ibm.com>
      Signed-off-by: Steffen Maier <maier@linux.ibm.com>
      Link: https://lore.kernel.org/r/20230724145156.3920244-1-maier@linux.ibm.com
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tcp_metrics: fix data-race in tcpm_suck_dst() vs fastopen · dac38272
      Eric Dumazet authored
      [ Upstream commit ddf251fa ]
      
      Whenever tcpm_new() reclaims an old entry, tcpm_suck_dst()
      would overwrite data that could be read from tcp_fastopen_cache_get()
      or tcp_metrics_fill_info().
      
      We need to acquire fastopen_seqlock to maintain consistency.
      
      For newly allocated objects, tcpm_new() can switch to kzalloc()
      to avoid an extra fastopen_seqlock acquisition.
      
      Fixes: 1fe4c481 ("net-tcp: Fast Open client - cookie cache")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20230802131500.1478140-7-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • tcp_metrics: annotate data-races around tm->tcpm_net · 4517782e
      Eric Dumazet authored
      [ Upstream commit d5d986ce ]
      
      tm->tcpm_net can be read or written locklessly.
      
      Instead of changing write_pnet() and read_pnet() and potentially
      hurting performance, add the needed READ_ONCE()/WRITE_ONCE()
      in tm_net() and tcpm_new().
      
      Fixes: 849e8a0c ("tcp_metrics: Add a field tcpm_net and verify it matches on lookup")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20230802131500.1478140-6-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • tcp_metrics: annotate data-races around tm->tcpm_vals[] · e842a686
      Eric Dumazet authored
      [ Upstream commit 8c4d04f6 ]
      
      tm->tcpm_vals[] values can be read or written locklessly.
      
      Add needed READ_ONCE()/WRITE_ONCE() to document this,
      and force use of tcp_metric_get() and tcp_metric_set()
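
      The accessor pattern can be sketched in userspace C. The EX_READ_ONCE and
      EX_WRITE_ONCE macros below are simplified stand-ins for the kernel
      macros (volatile casts, relying on the GCC/Clang __typeof__ extension),
      and the struct and function names are hypothetical rather than the
      tcp_metrics definitions.

```c
#include <stdint.h>

/* Simplified stand-ins: force a single, non-elided access to x. */
#define EX_READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define EX_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

enum { EX_METRIC_MAX = 4 };         /* hypothetical number of metrics */

struct ex_metrics_block {
    uint32_t vals[EX_METRIC_MAX];   /* may be accessed without a lock */
};

/* All readers go through this helper, so every load is annotated. */
static uint32_t ex_metric_get(const struct ex_metrics_block *tm, int idx)
{
    return EX_READ_ONCE(tm->vals[idx]);
}

/* All writers go through this helper, so every store is annotated. */
static void ex_metric_set(struct ex_metrics_block *tm, int idx, uint32_t val)
{
    EX_WRITE_ONCE(tm->vals[idx], val);
}
```

      Funnelling every access through the two helpers keeps the annotations
      in one place instead of scattering them over each call site.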
      
      Fixes: 51c5d0c4 ("tcp: Maintain dynamic metrics in local cache.")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • tcp_metrics: annotate data-races around tm->tcpm_lock · d3184bea
      Eric Dumazet authored
      [ Upstream commit 285ce119 ]
      
      tm->tcpm_lock can be read or written locklessly.
      
      Add needed READ_ONCE()/WRITE_ONCE() to document this.
      
      Fixes: 51c5d0c4 ("tcp: Maintain dynamic metrics in local cache.")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20230802131500.1478140-4-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • tcp_metrics: annotate data-races around tm->tcpm_stamp · 9a7367cb
      Eric Dumazet authored
      [ Upstream commit 949ad62a ]
      
      tm->tcpm_stamp can be read or written locklessly.
      
      Add needed READ_ONCE()/WRITE_ONCE() to document this.
      
      Also constify tcpm_check_stamp() dst argument.
      
      Fixes: 51c5d0c4 ("tcp: Maintain dynamic metrics in local cache.")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20230802131500.1478140-3-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • tcp_metrics: fix addr_same() helper · 6f6bd67f
      Eric Dumazet authored
      [ Upstream commit e6638094 ]
      
      Because v4 and v6 families use separate inetpeer trees (respectively
      net->ipv4.peers and net->ipv6.peers), inetpeer_addr_cmp(a, b) assumes
      a & b share the same family.
      
      tcp_metrics use a common hash table, where entries can have different
      families.
      
      We must therefore make sure to not call inetpeer_addr_cmp()
      if the families do not match.
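
      The guard can be illustrated with a small sketch. The types and helpers
      below are hypothetical and much simpler than the inetpeer structures;
      they only show why a comparison that assumes a shared family must be
      preceded by a family check.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum ex_family { EX_AF_V4 = 4, EX_AF_V6 = 6 };

struct ex_addr {
    enum ex_family family;
    uint8_t bytes[16];      /* a v4 address uses only the first 4 bytes */
};

static size_t ex_addr_len(enum ex_family f)
{
    return f == EX_AF_V4 ? 4 : 16;
}

/* Like the comparison described above: assumes a and b share a family. */
static int ex_addr_cmp(const struct ex_addr *a, const struct ex_addr *b)
{
    return memcmp(a->bytes, b->bytes, ex_addr_len(a->family));
}

/* Safe for a mixed-family table: check the family before comparing. */
static bool ex_addr_same(const struct ex_addr *a, const struct ex_addr *b)
{
    return a->family == b->family && ex_addr_cmp(a, b) == 0;
}
```

      In this sketch a v4 entry and a v6 entry whose leading bytes coincide
      compare as equal under ex_addr_cmp() alone, while ex_addr_same() treats
      them as different.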
      
      Fixes: d39d14ff ("net: Add helper function to compare inetpeer addresses")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20230802131500.1478140-2-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • prestera: fix fallback to previous version on same major version · b0acbcf1
      Jonas Gorski authored
      [ Upstream commit b755c25f ]
      
      When both supported and previous version have the same major version,
      and the firmwares are missing, the driver ends in a loop requesting the
      same (previous) version over and over again:
      
          [   76.327413] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.1.img firmware, fall-back to previous 4.0 version
          [   76.339802] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
          [   76.352162] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
          [   76.364502] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
          [   76.376848] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
          [   76.389183] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
          [   76.401522] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
          [   76.413860] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
          [   76.426199] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
          ...
      
      Fix this by inverting the check so that it tests whether we aren't
      yet at the previous version, and by also checking the minor version.
      
      This also catches the case where both versions are the same, as it was
      after commit bb5dbf2c ("net: marvell: prestera: add firmware v4.0
      support").
      
      With this fix applied:
      
          [   88.499622] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.1.img firmware, fall-back to previous 4.0 version
          [   88.511995] Prestera DX 0000:01:00.0: failed to request previous firmware: mrvl/prestera/mvsw_prestera_fw-v4.0.img
          [   88.522403] Prestera DX: probe of 0000:01:00.0 failed with error -2
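
      The effect of the inverted check can be modelled outside the driver. The
      sketch below is only an illustration: the constants mirror the versions
      in the logs above, while the helper names and the exact shape of the old
      check are assumptions, not the driver's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Versions taken from the logs above: supported 4.1, previous 4.0. */
enum { SUPP_MAJ = 4, SUPP_MIN = 1, PREV_MAJ = 4, PREV_MIN = 0 };

/* Assumed shape of the old check: fall back whenever the major matches. */
static bool may_fall_back_old(unsigned int maj, unsigned int min)
{
    (void)min;
    return maj == SUPP_MAJ;
}

/* Inverted check: fall back only while not yet at the previous version. */
static bool may_fall_back_new(unsigned int maj, unsigned int min)
{
    return maj != PREV_MAJ || min != PREV_MIN;
}

/*
 * Simulate probing when every firmware file is missing. Returns the number
 * of load attempts, capped at max_attempts so the looping variant terminates.
 */
static int count_attempts(bool (*may_fall_back)(unsigned int, unsigned int),
                          int max_attempts)
{
    unsigned int maj = SUPP_MAJ, min = SUPP_MIN;
    int attempts = 0;

    assert(max_attempts > 0);
    while (attempts < max_attempts) {
        attempts++;                  /* stand-in for a failed firmware request */
        if (!may_fall_back(maj, min))
            break;                   /* give up and report the error */
        maj = PREV_MAJ;
        min = PREV_MIN;
    }
    return attempts;
}
```

      With the old-style check the model keeps retrying until the cap is hit;
      with the inverted check it stops after the second attempt, matching the
      two log lines followed by the probe failure shown above.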
      
      Fixes: 47f26018 ("net: marvell: prestera: try to load previous fw version")
      Signed-off-by: Jonas Gorski <jonas.gorski@bisdn.de>
      Acked-by: Elad Nachman <enachman@marvell.com>
      Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Acked-by: Taras Chornyi <taras.chornyi@plvision.eu>
      Link: https://lore.kernel.org/r/20230802092357.163944-1-jonas.gorski@bisdn.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      b0acbcf1
    • Jianbo Liu's avatar
      net/mlx5: fs_core: Skip the FTs in the same FS_TYPE_PRIO_CHAINS fs_prio · d6d9d0f5
      Jianbo Liu authored
      [ Upstream commit c635ca45 ]
      
      In the cited commit, a new type of fs_prio, FS_TYPE_PRIO_CHAINS, was
      added to support multiple parallel namespaces for multi-chains. All
      the flow tables under an fs_node of this type are skipped
      unconditionally when searching for the next or previous flow table to
      connect for a new table.
      
      As this search function is also used to find the new root table when
      the old one is being deleted, it will skip the entire
      FS_TYPE_PRIO_CHAINS fs_node next to the old root. However, the new root
      table should be chosen from it if there is any table in it. Fix it by
      skipping only the flow tables in the same FS_TYPE_PRIO_CHAINS fs_node
      when finding the closest FT for a fs_node.
      
      Besides, complete the connecting from the FTs of the previous priority
      of the prio, because there may be multiple prevs now that this fs_prio
      type has been introduced. Also, the next FT should be chosen from the
      first flow table next to the prio in the same FS_TYPE_PRIO_CHAINS
      fs_prio, if this prio is the first child.
      
      Fixes: 328edb49 ("net/mlx5: Split FDB fast path prio to multiple namespaces")
      Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
      Reviewed-by: Paul Blakey <paulb@nvidia.com>
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Link: https://lore.kernel.org/r/7a95754df479e722038996c97c97b062b372591f.1690803944.git.leonro@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      d6d9d0f5
    • Jianbo Liu's avatar
      net/mlx5: fs_core: Make find_closest_ft more generic · c999fb10
      Jianbo Liu authored
      [ Upstream commit 618d28a5 ]
      
      As find_closest_ft_recursive is called to find the closest FT, the
      first parameter of find_closest_ft can be changed from fs_prio to
      fs_node. Thus this function is extended to find the closest FT for the
      nodes of any type, not only prios, but also the sub namespaces.
      
      Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Link: https://lore.kernel.org/r/d3962c2b443ec8dde7a740dc742a1f052d5e256c.1690803944.git.leonro@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Stable-dep-of: c635ca45 ("net/mlx5: fs_core: Skip the FTs in the same FS_TYPE_PRIO_CHAINS fs_prio")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      c999fb10
    • Benjamin Poirier's avatar
      vxlan: Fix nexthop hash size · 32ef2c0c
      Benjamin Poirier authored
      [ Upstream commit 0756384f ]
      
      The nexthop code expects a 31 bit hash, such as what is returned by
      fib_multipath_hash() and rt6_multipath_hash(). Passing the 32 bit hash
      returned by skb_get_hash() can lead to problems related to the fact that
      'int hash' is a negative number when the MSB is set.
      
      In the case of hash threshold nexthop groups, nexthop_select_path_hthr()
      will disproportionately select the first nexthop group entry. In the case
      of resilient nexthop groups, nexthop_select_path_res() may do an out of
      bounds access in nh_buckets[], for example:
          hash = -912054133
          num_nh_buckets = 2
          bucket_index = 65535
      
      which leads to the following panic:
      
      BUG: unable to handle page fault for address: ffffc900025910c8
      PGD 100000067 P4D 100000067 PUD 10026b067 PMD 0
      Oops: 0002 [#1] PREEMPT SMP KASAN NOPTI
      CPU: 4 PID: 856 Comm: kworker/4:3 Not tainted 6.5.0-rc2+ #34
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
      Workqueue: ipv6_addrconf addrconf_dad_work
      RIP: 0010:nexthop_select_path+0x197/0xbf0
      Code: c1 e4 05 be 08 00 00 00 4c 8b 35 a4 14 7e 01 4e 8d 6c 25 00 4a 8d 7c 25 08 48 01 dd e8 c2 25 15 ff 49 8d 7d 08 e8 39 13 15 ff <4d> 89 75 08 48 89 ef e8 7d 12 15 ff 48 8b 5d 00 e8 14 55 2f 00 85
      RSP: 0018:ffff88810c36f260 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: 00000000002000c0 RCX: ffffffffaf02dd77
      RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffffc900025910c8
      RBP: ffffc900025910c0 R08: 0000000000000001 R09: fffff520004b2219
      R10: ffffc900025910cf R11: 31392d2068736168 R12: 00000000002000c0
      R13: ffffc900025910c0 R14: 00000000fffef608 R15: ffff88811840e900
      FS:  0000000000000000(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffffc900025910c8 CR3: 0000000129d00000 CR4: 0000000000750ee0
      PKRU: 55555554
      Call Trace:
       <TASK>
       ? __die+0x23/0x70
       ? page_fault_oops+0x1ee/0x5c0
       ? __pfx_is_prefetch.constprop.0+0x10/0x10
       ? __pfx_page_fault_oops+0x10/0x10
       ? search_bpf_extables+0xfe/0x1c0
       ? fixup_exception+0x3b/0x470
       ? exc_page_fault+0xf6/0x110
       ? asm_exc_page_fault+0x26/0x30
       ? nexthop_select_path+0x197/0xbf0
       ? nexthop_select_path+0x197/0xbf0
       ? lock_is_held_type+0xe7/0x140
       vxlan_xmit+0x5b2/0x2340
       ? __lock_acquire+0x92b/0x3370
       ? __pfx_vxlan_xmit+0x10/0x10
       ? __pfx___lock_acquire+0x10/0x10
       ? __pfx_register_lock_class+0x10/0x10
       ? skb_network_protocol+0xce/0x2d0
       ? dev_hard_start_xmit+0xca/0x350
       ? __pfx_vxlan_xmit+0x10/0x10
       dev_hard_start_xmit+0xca/0x350
       __dev_queue_xmit+0x513/0x1e20
       ? __pfx___dev_queue_xmit+0x10/0x10
       ? __pfx_lock_release+0x10/0x10
       ? mark_held_locks+0x44/0x90
       ? skb_push+0x4c/0x80
       ? eth_header+0x81/0xe0
       ? __pfx_eth_header+0x10/0x10
       ? neigh_resolve_output+0x215/0x310
       ? ip6_finish_output2+0x2ba/0xc90
       ip6_finish_output2+0x2ba/0xc90
       ? lock_release+0x236/0x3e0
       ? ip6_mtu+0xbb/0x240
       ? __pfx_ip6_finish_output2+0x10/0x10
       ? find_held_lock+0x83/0xa0
       ? lock_is_held_type+0xe7/0x140
       ip6_finish_output+0x1ee/0x780
       ip6_output+0x138/0x460
       ? __pfx_ip6_output+0x10/0x10
       ? __pfx___lock_acquire+0x10/0x10
       ? __pfx_ip6_finish_output+0x10/0x10
       NF_HOOK.constprop.0+0xc0/0x420
       ? __pfx_NF_HOOK.constprop.0+0x10/0x10
       ? ndisc_send_skb+0x2c0/0x960
       ? __pfx_lock_release+0x10/0x10
       ? __local_bh_enable_ip+0x93/0x110
       ? lock_is_held_type+0xe7/0x140
       ndisc_send_skb+0x4be/0x960
       ? __pfx_ndisc_send_skb+0x10/0x10
       ? mark_held_locks+0x65/0x90
       ? find_held_lock+0x83/0xa0
       ndisc_send_ns+0xb0/0x110
       ? __pfx_ndisc_send_ns+0x10/0x10
       addrconf_dad_work+0x631/0x8e0
       ? lock_acquire+0x180/0x3f0
       ? __pfx_addrconf_dad_work+0x10/0x10
       ? mark_held_locks+0x24/0x90
       process_one_work+0x582/0x9c0
       ? __pfx_process_one_work+0x10/0x10
       ? __pfx_do_raw_spin_lock+0x10/0x10
       ? mark_held_locks+0x24/0x90
       worker_thread+0x93/0x630
       ? __kthread_parkme+0xdc/0x100
       ? __pfx_worker_thread+0x10/0x10
       kthread+0x1a5/0x1e0
       ? __pfx_kthread+0x10/0x10
       ret_from_fork+0x34/0x60
       ? __pfx_kthread+0x10/0x10
       ret_from_fork_asm+0x1b/0x30
      RIP: 0000:0x0
      Code: Unable to access opcode bytes at 0xffffffffffffffd6.
      RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
      RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
       </TASK>
      Modules linked in:
      CR2: ffffc900025910c8
      ---[ end trace 0000000000000000 ]---
      RIP: 0010:nexthop_select_path+0x197/0xbf0
      Code: c1 e4 05 be 08 00 00 00 4c 8b 35 a4 14 7e 01 4e 8d 6c 25 00 4a 8d 7c 25 08 48 01 dd e8 c2 25 15 ff 49 8d 7d 08 e8 39 13 15 ff <4d> 89 75 08 48 89 ef e8 7d 12 15 ff 48 8b 5d 00 e8 14 55 2f 00 85
      RSP: 0018:ffff88810c36f260 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: 00000000002000c0 RCX: ffffffffaf02dd77
      RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffffc900025910c8
      RBP: ffffc900025910c0 R08: 0000000000000001 R09: fffff520004b2219
      R10: ffffc900025910cf R11: 31392d2068736168 R12: 00000000002000c0
      R13: ffffc900025910c0 R14: 00000000fffef608 R15: ffff88811840e900
      FS:  0000000000000000(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffffffffffffffd6 CR3: 0000000129d00000 CR4: 0000000000750ee0
      PKRU: 55555554
      Kernel panic - not syncing: Fatal exception in interrupt
      Kernel Offset: 0x2ca00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
      ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
      
      Fix this problem by ensuring the MSB of hash is 0 using a right shift - the
      same approach used in fib_multipath_hash() and rt6_multipath_hash().
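
      The numbers in the example above can be reproduced with a small model.
      This is a sketch under assumptions: bucket_index() assumes the index is
      derived as "hash % num_buckets" and stored in 16 bits, and both helper
      names are made up for illustration rather than taken from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model of picking a bucket from a signed hash, assumed here to
 * be "hash % num_buckets" narrowed to a 16-bit index.
 */
static uint16_t bucket_index(int hash, uint16_t num_buckets)
{
    assert(num_buckets != 0);
    /* A negative hash gives a negative remainder, which wraps when narrowed. */
    return (uint16_t)(hash % num_buckets);
}

/* Fold a 32-bit hash into 31 bits so it is never negative as an int. */
static int hash_to_31bit(uint32_t hash32)
{
    return (int)(hash32 >> 1);
}
```

      In this model bucket_index(-912054133, 2) evaluates to 65535, far
      outside a two-entry table, whereas any value passed through
      hash_to_31bit() first yields an index below num_buckets.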
      
      Fixes: 1274e1cc ("vxlan: ecmp support for mac fdb entries")
      Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      32ef2c0c
    • Yue Haibing's avatar
      ip6mr: Fix skb_under_panic in ip6mr_cache_report() · 1bb54a21
      Yue Haibing authored
      [ Upstream commit 30e0191b ]
      
      skbuff: skb_under_panic: text:ffffffff88771f69 len:56 put:-4
       head:ffff88805f86a800 data:ffff887f5f86a850 tail:0x88 end:0x2c0 dev:pim6reg
       ------------[ cut here ]------------
       kernel BUG at net/core/skbuff.c:192!
       invalid opcode: 0000 [#1] PREEMPT SMP KASAN
       CPU: 2 PID: 22968 Comm: kworker/2:11 Not tainted 6.5.0-rc3-00044-g0a8db05b571a #236
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
       Workqueue: ipv6_addrconf addrconf_dad_work
       RIP: 0010:skb_panic+0x152/0x1d0
       Call Trace:
        <TASK>
        skb_push+0xc4/0xe0
        ip6mr_cache_report+0xd69/0x19b0
        reg_vif_xmit+0x406/0x690
        dev_hard_start_xmit+0x17e/0x6e0
        __dev_queue_xmit+0x2d6a/0x3d20
        vlan_dev_hard_start_xmit+0x3ab/0x5c0
        dev_hard_start_xmit+0x17e/0x6e0
        __dev_queue_xmit+0x2d6a/0x3d20
        neigh_connected_output+0x3ed/0x570
        ip6_finish_output2+0x5b5/0x1950
        ip6_finish_output+0x693/0x11c0
        ip6_output+0x24b/0x880
        NF_HOOK.constprop.0+0xfd/0x530
        ndisc_send_skb+0x9db/0x1400
        ndisc_send_rs+0x12a/0x6c0
        addrconf_dad_completed+0x3c9/0xea0
        addrconf_dad_work+0x849/0x1420
        process_one_work+0xa22/0x16e0
        worker_thread+0x679/0x10c0
        ret_from_fork+0x28/0x60
        ret_from_fork_asm+0x11/0x20
      
      When a vlan device is set up on top of dev pim6reg, a DAD NS packet may be
      sent through reg_vif_xmit():
      reg_vif_xmit()
          ip6mr_cache_report()
              skb_push(skb, -skb_network_offset(pkt)); // skb_network_offset(pkt) is 4
      skb_push() is declared as:
      	void *skb_push(struct sk_buff *skb, unsigned int len);
      		skb->data -= len;
      		// 0xffff88805f86a84c - 0xfffffffc = 0xffff887f5f86a850
      Because len is unsigned, -4 is converted to 0xfffffffc, so skb->data is
      set to 0xffff887f5f86a850, which is an invalid memory address, and
      skb_push() fails.
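
      The arithmetic above can be checked with a standalone model. This is a
      sketch, not kernel code: push_model() and negated_offset() are
      hypothetical helpers, and a 32-bit unsigned int is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the pointer arithmetic: data moves back by an unsigned len. */
static uint64_t push_model(uint64_t data, unsigned int len)
{
    assert(sizeof(unsigned int) == 4);  /* assumption of this sketch */
    return data - len;
}

/* What the callee receives when the caller negates a signed offset. */
static unsigned int negated_offset(int offset)
{
    return (unsigned int)-offset;
}
```

      Here negated_offset(4) is 0xfffffffc, and pushing that from
      0xffff88805f86a84c lands on 0xffff887f5f86a850, the bogus data pointer
      printed in the skb_under_panic line.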
      
      Fixes: 14fb64e1 ("[IPV6] MROUTE: Support PIM-SM (SSM).")
      Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      1bb54a21
    • Alexandra Winter's avatar
      s390/qeth: Don't call dev_close/dev_open (DOWN/UP) · 64e3affe
      Alexandra Winter authored
      [ Upstream commit 1cfef80d ]
      
      dev_close() and dev_open() are issued to change the interface state to DOWN
      or UP (dev->flags IFF_UP). When the netdev is set DOWN it loses e.g. its
      IPv6 addresses and routes. We don't want this in cases of device recovery
      (triggered by hardware or software) or when the qeth device is set
      offline.
      
      Setting a qeth device offline or online and device recovery actions call
      netif_device_detach() and/or netif_device_attach(). That will reset or
      set the LOWER_UP indication, i.e. change the dev->state bit
      __LINK_STATE_PRESENT. That is enough to e.g. cause bond failovers, and
      still preserves the interface settings that are handled by the network
      stack.
      
      Don't call dev_open() nor dev_close() from the qeth device driver. Let the
      network stack handle this.
      
      Fixes: d4560150 ("s390/qeth: call dev_close() during recovery")
      Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
      Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      64e3affe
    • Lin Ma's avatar
      net: dcb: choose correct policy to parse DCB_ATTR_BCN · a0da2684
      Lin Ma authored
      [ Upstream commit 31d49ba0 ]
      
      dcbnl_bcn_setcfg() uses the wrong policy to parse tb[DCB_ATTR_BCN]. The
      bug was introduced in commit 859ee3c4 ("DCB: Add support for DCB
      BCN"). See the comments in the code below:
      
      static int dcbnl_bcn_setcfg(...)
      {
        ...
        ret = nla_parse_nested_deprecated(..., dcbnl_pfc_up_nest, .. )
        // !!! dcbnl_pfc_up_nest for attributes
        //  DCB_PFC_UP_ATTR_0 to DCB_PFC_UP_ATTR_ALL in enum dcbnl_pfc_up_attrs
        ...
        for (i = DCB_BCN_ATTR_RP_0; i <= DCB_BCN_ATTR_RP_7; i++) {
        // !!! DCB_BCN_ATTR_RP_0 to DCB_BCN_ATTR_RP_7 in enum dcbnl_bcn_attrs
          ...
          value_byte = nla_get_u8(data[i]);
          ...
        }
        ...
        for (i = DCB_BCN_ATTR_BCNA_0; i <= DCB_BCN_ATTR_RI; i++) {
        // !!! DCB_BCN_ATTR_BCNA_0 to DCB_BCN_ATTR_RI in enum dcbnl_bcn_attrs
        ...
          value_int = nla_get_u32(data[i]);
        ...
        }
        ...
      }
      
      That is, nla_parse_nested_deprecated() uses the dcbnl_pfc_up_nest
      policy, which describes the attributes defined in dcbnl_pfc_up_attrs,
      but the code that follows accesses each nlattr as a dcbnl_bcn_attrs
      attribute. Comparing it with the nla_policy for dcbnl_bcn_attrs shows
      that the beginning of the two policies is the "same":
      
      static const struct nla_policy dcbnl_pfc_up_nest[...] = {
              [DCB_PFC_UP_ATTR_0]   = {.type = NLA_U8},
              [DCB_PFC_UP_ATTR_1]   = {.type = NLA_U8},
              [DCB_PFC_UP_ATTR_2]   = {.type = NLA_U8},
              [DCB_PFC_UP_ATTR_3]   = {.type = NLA_U8},
              [DCB_PFC_UP_ATTR_4]   = {.type = NLA_U8},
              [DCB_PFC_UP_ATTR_5]   = {.type = NLA_U8},
              [DCB_PFC_UP_ATTR_6]   = {.type = NLA_U8},
              [DCB_PFC_UP_ATTR_7]   = {.type = NLA_U8},
              [DCB_PFC_UP_ATTR_ALL] = {.type = NLA_FLAG},
      };
      
      static const struct nla_policy dcbnl_bcn_nest[...] = {
              [DCB_BCN_ATTR_RP_0]         = {.type = NLA_U8},
              [DCB_BCN_ATTR_RP_1]         = {.type = NLA_U8},
              [DCB_BCN_ATTR_RP_2]         = {.type = NLA_U8},
              [DCB_BCN_ATTR_RP_3]         = {.type = NLA_U8},
              [DCB_BCN_ATTR_RP_4]         = {.type = NLA_U8},
              [DCB_BCN_ATTR_RP_5]         = {.type = NLA_U8},
              [DCB_BCN_ATTR_RP_6]         = {.type = NLA_U8},
              [DCB_BCN_ATTR_RP_7]         = {.type = NLA_U8},
              [DCB_BCN_ATTR_RP_ALL]       = {.type = NLA_FLAG},
              // from here is somewhat different
              [DCB_BCN_ATTR_BCNA_0]       = {.type = NLA_U32},
              ...
              [DCB_BCN_ATTR_ALL]          = {.type = NLA_FLAG},
      };
      
      Therefore, the current code is buggy: nla_parse_nested_deprecated() can
      read past the end of dcbnl_pfc_up_nest and use the adjacent nla_policy
      to parse the attributes from DCB_BCN_ATTR_BCNA_0 onwards.
      
      Hence use the correct policy dcbnl_bcn_nest to parse the nested
      tb[DCB_ATTR_BCN] TLV.
      
      Fixes: 859ee3c4 ("DCB: Add support for DCB BCN")
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20230801013248.87240-1-linma@zju.edu.cn
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a0da2684
    • Mark Brown's avatar
      net: netsec: Ignore 'phy-mode' on SynQuacer in DT mode · 19333322
      Mark Brown authored
      [ Upstream commit f3bb7759 ]
      
      As documented in acd7aaf5 ("netsec: ignore 'phy-mode' device
      property on ACPI systems") the SocioNext SynQuacer platform ships with
      firmware defining the PHY mode as RGMII even though the physical
      configuration of the PHY is for TX and RX delays.  Since bbc4d71d
      ("net: phy: realtek: fix rtl8211e rx/tx delay config") this has caused
      misconfiguration of the PHY, rendering the network unusable.
      
      This was worked around for ACPI by ignoring the phy-mode property, but
      the system is also used with DT.  For DT, force a working PHY mode
      instead if we're running on a SynQuacer.  As well as the standard EDK2
      firmware with DT, there are also some of these systems that use u-boot
      and might not initialise the PHY if not netbooting.  Newer firmware
      images for at least EDK2 are available from Linaro, so print a warning
      when doing this.
      
      Fixes: 533dd11a ("net: socionext: Add Synquacer NetSec driver")
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Acked-by: Ard Biesheuvel <ardb@kernel.org>
      Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20230731-synquacer-net-v3-1-944be5f06428@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      19333322
    • Yuanjun Gong's avatar
      net: korina: handle clk prepare error in korina_probe() · 766c9dd0
      Yuanjun Gong authored
      [ Upstream commit 0b6291ad ]
      
      In korina_probe(), the return value of clk_prepare_enable() should be
      checked since it might fail. Use devm_clk_get_optional_enabled()
      instead of devm_clk_get_optional() followed by clk_prepare_enable() so
      that the error is handled automatically.
      
      Fixes: e4cd854e ("net: korina: Get mdio input clock via common clock framework")
      Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
      Link: https://lore.kernel.org/r/20230731090535.21416-1-ruc_gongyuanjun@163.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      766c9dd0
    • Dan Carpenter's avatar
      net: ll_temac: fix error checking of irq_of_parse_and_map() · 6cecfdf6
      Dan Carpenter authored
      [ Upstream commit ef45e840 ]
      
      Most kernel functions return negative error codes but some irq functions
      return zero on error.  In this code, irq_of_parse_and_map() returns zero
      on error and platform_get_irq() returns negative error codes.  We need
      to handle both cases appropriately.
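
      One way to handle both conventions at a single call site is to fold them
      into one error value. The sketch below illustrates the idea only:
      irq_to_errno() is a hypothetical helper, and mapping zero to -EINVAL is
      an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>

/*
 * Fold both conventions into one result: 0 when irq is valid (> 0),
 * the original negative errno when there is one, and -EINVAL for the
 * "zero means failure" convention.
 */
static int irq_to_errno(int irq)
{
    if (irq > 0)
        return 0;
    return irq < 0 ? irq : -EINVAL;
}

/* Usage example covering the three cases. */
static void irq_to_errno_examples(void)
{
    assert(irq_to_errno(17) == 0);            /* valid IRQ number */
    assert(irq_to_errno(0) == -EINVAL);       /* zero-on-error convention */
    assert(irq_to_errno(-ENXIO) == -ENXIO);   /* negative-errno convention */
}
```

      A caller would then test "irq <= 0" once and propagate the folded value,
      regardless of which lookup function produced the number.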
      
      Fixes: 8425c41d ("net: ll_temac: Extend support to non-device-tree platforms")
      Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
      Acked-by: Esben Haabendal <esben@geanix.com>
      Reviewed-by: Yang Yingliang <yangyingliang@huawei.com>
      Reviewed-by: Harini Katakam <harini.katakam@amd.com>
      Link: https://lore.kernel.org/r/3d0aef75-06e0-45a5-a2a6-2cc4738d4143@moroto.mountain
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      6cecfdf6
    • Yang Yingliang's avatar
      net: ll_temac: Switch to use dev_err_probe() helper · 3761ff4f
      Yang Yingliang authored
      [ Upstream commit 75ae8c28 ]
      
      dev_err() can be replaced with dev_err_probe(), which checks whether the
      error code is -EPROBE_DEFER.
      
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Stable-dep-of: ef45e840 ("net: ll_temac: fix error checking of irq_of_parse_and_map()")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      3761ff4f
    • Tomas Glozar's avatar
      bpf: sockmap: Remove preempt_disable in sock_map_sk_acquire · 5c534640
      Tomas Glozar authored
      [ Upstream commit 13d2618b ]
      
      Disabling preemption in sock_map_sk_acquire conflicts with GFP_ATOMIC
      allocation later in sk_psock_init_link on PREEMPT_RT kernels, since
      GFP_ATOMIC might sleep on RT (see bpf: Make BPF and PREEMPT_RT co-exist
      patchset notes for details).
      
      This causes calling bpf_map_update_elem on BPF_MAP_TYPE_SOCKMAP maps to
      BUG (sleeping function called from invalid context) on RT kernels.
      
      preempt_disable was introduced together with lock_sk and rcu_read_lock
      in commit 99ba2b5a ("bpf: sockhash, disallow bpf_tcp_close and update
      in parallel"), probably to match disabled migration of BPF programs, and
      is no longer necessary.
      
      Remove preempt_disable to fix BUG in sock_map_update_common on RT.
      
      Signed-off-by: Tomas Glozar <tglozar@redhat.com>
      Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
      Link: https://lore.kernel.org/all/20200224140131.461979697@linutronix.de/
      Fixes: 99ba2b5a ("bpf: sockhash, disallow bpf_tcp_close and update in parallel")
      Reviewed-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/r/20230728064411.305576-1-tglozar@redhat.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      5c534640
    • valis's avatar
      net/sched: cls_route: No longer copy tcf_result on update to avoid use-after-free · 79c3d81c
      valis authored
      [ Upstream commit b80b829e ]
      
      When route4_change() is called on an existing filter, the whole
      tcf_result struct is always copied into the new instance of the filter.
      
      This causes a problem when updating a filter bound to a class,
      as tcf_unbind_filter() is always called on the old instance in the
      success path, decreasing filter_cnt of the still referenced class
      and allowing it to be deleted, leading to a use-after-free.
      
      Fix this by no longer copying the tcf_result struct from the old filter.
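
      The counting problem can be shown with a toy model. Everything below is
      hypothetical: the toy_* types and helpers only mimic the bind/unbind
      bookkeeping described above, and binding the new instance separately is
      this sketch's assumption about what replaces the copy.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; these are not the kernel's structures. */
struct toy_class  { int filter_cnt; };
struct toy_result { struct toy_class *cls; };

static void toy_bind(struct toy_result *r, struct toy_class *c)
{
    r->cls = c;
    c->filter_cnt++;
}

static void toy_unbind(struct toy_result *r)
{
    assert(r->cls != NULL);
    r->cls->filter_cnt--;
    r->cls = NULL;
}

/* Update that copies the old result, then unbinds the old instance. */
static int count_after_copying_update(void)
{
    struct toy_class c = { 0 };
    struct toy_result old = { NULL }, fresh = { NULL };

    toy_bind(&old, &c);
    fresh = old;            /* copies the pointer, takes no reference */
    toy_unbind(&old);
    /* fresh.cls still points at c, yet the count says nobody uses it. */
    return fresh.cls == &c ? c.filter_cnt : -1;
}

/* Update where the new instance takes its own reference instead. */
static int count_after_binding_update(void)
{
    struct toy_class c = { 0 };
    struct toy_result old = { NULL }, fresh = { NULL };

    toy_bind(&old, &c);
    toy_bind(&fresh, &c);
    toy_unbind(&old);
    return fresh.cls == &c ? c.filter_cnt : -1;
}
```

      In the copying variant the count drops to zero while the new instance
      still refers to the class, which is the window in which the class can be
      deleted. In the binding variant the count stays at one.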
      
      Fixes: 1109c005 ("net: sched: RCU cls_route")
      Reported-by: valis <sec@valis.email>
      Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
      Signed-off-by: valis <sec@valis.email>
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Reviewed-by: Victor Nogueira <victor@mojatatu.com>
      Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
      Reviewed-by: M A Ramdhan <ramdhan@starlabs.sg>
      Link: https://lore.kernel.org/r/20230729123202.72406-4-jhs@mojatatu.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      79c3d81c
    • valis's avatar
      net/sched: cls_fw: No longer copy tcf_result on update to avoid use-after-free · 9edf7955
      valis authored
      [ Upstream commit 76e42ae8 ]
      
      When fw_change() is called on an existing filter, the whole
      tcf_result struct is always copied into the new instance of the filter.
      
      This causes a problem when updating a filter bound to a class,
      as tcf_unbind_filter() is always called on the old instance in the
      success path, decreasing filter_cnt of the still referenced class
      and allowing it to be deleted, leading to a use-after-free.
      
      Fix this by no longer copying the tcf_result struct from the old filter.
      
      Fixes: e35a8ee5 ("net: sched: fw use RCU")
      Reported-by: valis <sec@valis.email>
      Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
      Signed-off-by: valis <sec@valis.email>
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Reviewed-by: Victor Nogueira <victor@mojatatu.com>
      Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
      Reviewed-by: M A Ramdhan <ramdhan@starlabs.sg>
      Link: https://lore.kernel.org/r/20230729123202.72406-3-jhs@mojatatu.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      9edf7955
    • valis's avatar
      net/sched: cls_u32: No longer copy tcf_result on update to avoid use-after-free · 262430df
      valis authored
      [ Upstream commit 3044b16e ]
      
      When u32_change() is called on an existing filter, the whole
      tcf_result struct is always copied into the new instance of the filter.
      
      This causes a problem when updating a filter bound to a class,
      as tcf_unbind_filter() is always called on the old instance in the
      success path, decreasing filter_cnt of the still referenced class
      and allowing it to be deleted, leading to a use-after-free.
      
      Fix this by no longer copying the tcf_result struct from the old filter.
      
      Fixes: de5df632 ("net: sched: cls_u32 changes to knode must appear atomic to readers")
      Reported-by: valis <sec@valis.email>
      Reported-by: M A Ramdhan <ramdhan@starlabs.sg>
      Signed-off-by: valis <sec@valis.email>
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Reviewed-by: Victor Nogueira <victor@mojatatu.com>
      Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
      Reviewed-by: M A Ramdhan <ramdhan@starlabs.sg>
      Link: https://lore.kernel.org/r/20230729123202.72406-2-jhs@mojatatu.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      262430df
    • Hou Tao's avatar
      bpf, cpumap: Handle skb as well when clean up ptr_ring · b58d3406
      Hou Tao authored
      [ Upstream commit 7c62b75c ]
      
      The following warning was reported when running xdp_redirect_cpu with
      both skb-mode and stress-mode enabled:
      
        ------------[ cut here ]------------
        Incorrect XDP memory type (-2128176192) usage
        WARNING: CPU: 7 PID: 1442 at net/core/xdp.c:405
        Modules linked in:
        CPU: 7 PID: 1442 Comm: kworker/7:0 Tainted: G  6.5.0-rc2+ #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
        Workqueue: events __cpu_map_entry_free
        RIP: 0010:__xdp_return+0x1e4/0x4a0
        ......
        Call Trace:
         <TASK>
         ? show_regs+0x65/0x70
         ? __warn+0xa5/0x240
         ? __xdp_return+0x1e4/0x4a0
         ......
         xdp_return_frame+0x4d/0x150
         __cpu_map_entry_free+0xf9/0x230
         process_one_work+0x6b0/0xb80
         worker_thread+0x96/0x720
         kthread+0x1a5/0x1f0
         ret_from_fork+0x3a/0x70
         ret_from_fork_asm+0x1b/0x30
         </TASK>
      
      The reason for the warning is twofold. One is that the kthread
      cpu_map_kthread_run() is stopped prematurely. The other is that
      __cpu_map_ring_cleanup() doesn't handle skb mode and treats skbs in
      ptr_ring as XDP frames.
      
      The prematurely-stopped kthread will be fixed by the preceding patch,
      after which ptr_ring will be empty when __cpu_map_ring_cleanup() is
      called. But as the comment in __cpu_map_ring_cleanup() says, handle and
      free skbs in ptr_ring as well to "catch any broken behaviour gracefully".
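
      A cleanup loop that tells the two entry kinds apart could look like the
      sketch below. It assumes that skbs are distinguished from XDP frames by
      a tag in the lowest pointer bit, which this message does not state. All
      names are made up for illustration, and the two "free" actions are
      replaced by counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_SKB_TAG ((uintptr_t)1)

/* Mark a pointer as an skb by setting its lowest bit (assumed scheme). */
static void *tag_as_skb(void *p)
{
    assert(((uintptr_t)p & TOY_SKB_TAG) == 0);  /* needs an aligned pointer */
    return (void *)((uintptr_t)p | TOY_SKB_TAG);
}

static bool is_tagged_skb(const void *p)
{
    return ((uintptr_t)p & TOY_SKB_TAG) != 0;
}

struct drain_stats { int skbs; int frames; };

/* Walk the leftover entries and dispatch on the tag instead of assuming frames. */
static struct drain_stats drain_ring(void **ring, int n)
{
    struct drain_stats st = { 0, 0 };

    for (int i = 0; i < n; i++) {
        if (is_tagged_skb(ring[i]))
            st.skbs++;      /* would be freed as an skb */
        else
            st.frames++;    /* would be returned as an XDP frame */
    }
    return st;
}

/* Usage example: three leftover entries, two of them tagged as skbs. */
static struct drain_stats drain_demo(void)
{
    static long objs[3];    /* long-aligned, so bit 0 of each address is clear */
    void *ring[3] = { tag_as_skb(&objs[0]), &objs[1], tag_as_skb(&objs[2]) };

    return drain_ring(ring, 3);
}
```

      Without the tag test, every leftover entry would take the frame path,
      which is the mismatch that the "Incorrect XDP memory type" warning above
      reports.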
      
      Fixes: 11941f8a ("bpf: cpumap: Implement generic cpumap")
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
      Link: https://lore.kernel.org/r/20230729095107.1722450-3-houtao@huaweicloud.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      b58d3406