  1. Feb 01, 2024
    • drm/bridge: nxp-ptn3460: fix i2c_master_send() error checking · 2a81e844
      Dan Carpenter authored
      commit 91443799 upstream.
      
      The i2c_master_send/recv() functions return negative error codes or the
      number of bytes that were actually sent/received.  This code has
      two problems.  1) Instead of checking if all the bytes were sent or
      received, it checks that at least one byte was sent or received.
      2) If there was a partial send/receive then we should return a negative
      error code but this code returns success.
      
      Fixes: a9fe713d ("drm/bridge: Add PTN3460 bridge driver")
      Cc: stable@vger.kernel.org
      Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
      Reviewed-by: Robert Foss <rfoss@kernel.org>
      Signed-off-by: Robert Foss <rfoss@kernel.org>
      Link: https://patchwork.freedesktop.org/patch/msgid/0cdc2dce-ca89-451a-9774-1482ab2f4762@moroto.mountain
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
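A minimal userspace sketch of the corrected check, for illustration only. `xfer_result()` is a hypothetical helper, not part of the driver or of the I2C API: it takes the value returned by an i2c_master_send()/i2c_master_recv()-style call plus the number of bytes requested. Mapping a short transfer to `-EIO` is this sketch's choice of error code.

```c
#include <errno.h>

/*
 * Hypothetical helper: turn the return value of an
 * i2c_master_send()/i2c_master_recv()-style call into 0 or a negative errno.
 *   ret: negative errno, or the number of bytes actually transferred
 *   len: number of bytes that were requested
 */
static int xfer_result(int ret, int len)
{
	if (ret < 0)
		return ret;   /* bus error: propagate it unchanged */
	if (ret != len)
		return -EIO;  /* partial transfer: report failure, not success */
	return 0;             /* every requested byte went through */
}
```

For example, a 1-byte result for a 4-byte request yields `-EIO` rather than success.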
    • drm: Don't unref the same fb many times by mistake due to deadlock handling · 62f2e79c
      Ville Syrjälä authored
      commit cb4daf27 upstream.
      
      If we get a deadlock after the fb lookup in drm_mode_page_flip_ioctl()
      we proceed to unref the fb and then retry the whole thing from the top.
      But we forget to reset the fb pointer back to NULL, and so if we then
      get another error during the retry, before the fb lookup, we proceed
      to unref the same fb again without having gotten another reference.
      The end result is that the fb will (eventually) end up being freed
      while it's still in use.
      
      Reset fb to NULL once we've unreffed it to avoid doing it again
      until we've done another fb lookup.
      
      This turned out to be pretty easy to hit on a DG2 when doing async
      flips (and CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y). The first symptom I
      saw was that drm_closefb() simply got stuck in a busy loop while walking
      the framebuffer list. Fortunately I was able to convince it to oops
      instead, and from there it was easier to track down the culprit.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20231211081625.25704-1-ville.syrjala@linux.intel.com
      Acked-by: Javier Martinez Canillas <javierm@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
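The retry pattern can be modeled in userspace as below. This is a simplified sketch under stated assumptions: `struct fake_fb`, `fb_lookup()`, `fb_put()` and `flip_with_retry()` are hypothetical stand-ins, and the error sequence (a deadlock after the lookup, then a different error before the lookup on the retry) is hard-coded to reproduce the scenario the commit describes.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted framebuffer. */
struct fake_fb {
	int refcount;
};

static struct fake_fb *fb_lookup(struct fake_fb *fb)
{
	fb->refcount++;          /* a successful lookup takes a reference */
	return fb;
}

static void fb_put(struct fake_fb *fb)
{
	fb->refcount--;          /* drop the reference taken by the lookup */
}

/*
 * Model of the ioctl's retry loop: attempt 1 hits -EDEADLK after the
 * lookup; attempt 2 fails with -EINVAL before reaching the lookup.
 */
static int flip_with_retry(struct fake_fb *shared)
{
	struct fake_fb *fb = NULL;
	int attempt = 0;
	int ret;

retry:
	if (attempt++ > 0) {
		ret = -EINVAL;   /* error on the retry, before any new lookup */
		goto out;
	}
	fb = fb_lookup(shared);
	ret = -EDEADLK;          /* simulated lock contention after the lookup */
out:
	if (fb) {
		fb_put(fb);
		fb = NULL;       /* the fix: forget the reference just dropped */
	}
	if (ret == -EDEADLK)
		goto retry;
	return ret;
}
```

Without the `fb = NULL` line, the second pass through `out:` would drop the same reference again and the count would fall below its starting value.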
    • cpufreq: intel_pstate: Refine computation of P-state for given frequency · 635e996e
      Rafael J. Wysocki authored
      commit 192cdb1c upstream.
      
      On systems using HWP, if a given frequency is equal to the maximum turbo
      frequency or the maximum non-turbo frequency, the HWP performance level
      corresponding to it is already known and can be used directly without
      any computation.
      
      Accordingly, adjust the code to use the known HWP performance levels in
      the cases mentioned above.
      
      This also helps to avoid limiting CPU capacity artificially in some
      cases when the BIOS produces the HWP_CAP numbers using a different
      E-core-to-P-core performance scaling factor than expected by the kernel.
      
      Fixes: f5c8cf2a ("cpufreq: intel_pstate: hybrid: Use known scaling factor for P-cores")
      Cc: 6.1+ <stable@vger.kernel.org> # 6.1+
      Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
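A sketch of the selection logic, assuming a hypothetical `struct hwp_caps` whose fields do not match the kernel's `struct cpudata`. The numbers in the test are made up to show the effect: if firmware derived `highest_perf` with a different scaling factor than the kernel assumes, computing the level from the frequency would understate it.

```c
/*
 * Hypothetical capability record; field names are illustrative only.
 */
struct hwp_caps {
	int max_turbo_khz;     /* frequency advertised for the highest level */
	int max_nonturbo_khz;  /* frequency advertised for the guaranteed level */
	int highest_perf;      /* HWP_CAP highest performance level */
	int guaranteed_perf;   /* HWP_CAP guaranteed performance level */
	int scaling_khz;       /* kHz per perf unit assumed by the kernel */
};

static int perf_for_freq(const struct hwp_caps *c, int freq_khz)
{
	/* Known endpoints: use the firmware-reported level directly. */
	if (freq_khz == c->max_turbo_khz)
		return c->highest_perf;
	if (freq_khz == c->max_nonturbo_khz)
		return c->guaranteed_perf;
	/* Anything else: derive the level from the scaling factor, rounding up. */
	return (freq_khz + c->scaling_khz - 1) / c->scaling_khz;
}
```

With `max_turbo_khz = 4800000`, `highest_perf = 60` and `scaling_khz = 100000`, division alone would give 48 for the turbo frequency; the known-endpoint branch returns 60.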
    • gpiolib: acpi: Ignore touchpad wakeup on GPD G1619-04 · 242996f5
      Mario Limonciello authored
      commit 805c74ea upstream.
      
      Spurious wakeups are reported on the GPD G1619-04. They can be
      avoided by programming the GPIO to ignore wakeups.
      
      Cc: stable@vger.kernel.org
      Reported-and-tested-by: George Melikov <mail@gmelikov.ru>
      Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3073
      Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
      Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • xfs: read only mounts with fsopen mount API are busted · 6c495c84
      Dave Chinner authored
      commit d8d222e0 upstream.
      
      Recently xfs/513 started failing on my test machines testing "-o
      ro,norecovery" mount options. This was being emitted in dmesg:
      
      [ 9906.932724] XFS (pmem0): no-recovery mounts must be read-only.
      
      Turns out, readonly mounts with the fsopen()/fsconfig() mount API
      have been busted since day zero. It's only taken 5 years for Debian
      unstable to start using this "new" mount API, and shortly after this
      I noticed xfs/513 had started to fail as per above.
      
      The syscall trace is:
      
      fsopen("xfs", FSOPEN_CLOEXEC)           = 3
      mount_setattr(-1, NULL, 0, NULL, 0)     = -1 EINVAL (Invalid argument)
      .....
      fsconfig(3, FSCONFIG_SET_STRING, "source", "/dev/pmem0", 0) = 0
      fsconfig(3, FSCONFIG_SET_FLAG, "ro", NULL, 0) = 0
      fsconfig(3, FSCONFIG_SET_FLAG, "norecovery", NULL, 0) = 0
      fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = -1 EINVAL (Invalid argument)
      close(3)                                = 0
      
      This shows that the actual mount instantiation (FSCONFIG_CMD_CREATE)
      is what returned the error.
      
      During mount instantiation, we call xfs_fs_validate_params() which
      does:
      
              /* No recovery flag requires a read-only mount */
              if (xfs_has_norecovery(mp) && !xfs_is_readonly(mp)) {
                      xfs_warn(mp, "no-recovery mounts must be read-only.");
                      return -EINVAL;
              }
      
      and xfs_is_readonly() checks internal mount flags for read only
      state. This state is set in xfs_init_fs_context() from the
      context superblock flag state:
      
              /*
               * Copy binary VFS mount flags we are interested in.
               */
              if (fc->sb_flags & SB_RDONLY)
                      set_bit(XFS_OPSTATE_READONLY, &mp->m_opstate);
      
      With the old mount API, all of the VFS specific superblock flags
      had already been parsed and set before xfs_init_fs_context() is
      called, so this all works fine.
      
      However, in the brave new fsopen/fsconfig world,
      xfs_init_fs_context() is called from fsopen() context, before any
      VFS superblock flags have been set or parsed. Hence if we use fsopen(),
      the internal XFS readonly state is *never set*. Hence anything that
      depends on xfs_is_readonly() actually returning true for read only
      mounts is broken if fsopen() has been used to mount the filesystem.
      
      Fix this by moving this internal state initialisation to
      xfs_fs_fill_super() before we attempt to validate the parameters
      that have been set prior to the FSCONFIG_CMD_CREATE call being made.
      
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Fixes: 73e5fff9 ("xfs: switch to use the new mount-api")
      cc: stable@vger.kernel.org
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
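The ordering problem can be reproduced with a small userspace model. Everything here is hypothetical: the structs only mimic the two pieces of state involved, `SB_RDONLY` is given an illustrative value, and `copy_flags_early` switches between copying the read-only flag at context creation (the old placement) and at fill-super time (the fix).

```c
#include <errno.h>
#include <stdbool.h>

#define SB_RDONLY 0x1u  /* illustrative value for this model */

/* Hypothetical mount context: fsconfig() calls fill these in one by one. */
struct fs_context_model {
	unsigned int sb_flags;   /* VFS flags; "ro" sets SB_RDONLY */
	bool norecovery;         /* filesystem option "norecovery" */
};

/* Hypothetical per-mount state that the validation code reads. */
struct xfs_mount_model {
	bool readonly;
	bool norecovery;
};

/* fsopen(): the context exists, but no option has been parsed yet. */
static void init_context(const struct fs_context_model *fc,
			 struct xfs_mount_model *mp, bool copy_flags_early)
{
	if (copy_flags_early)            /* old placement of the copy */
		mp->readonly = fc->sb_flags & SB_RDONLY;
}

/* FSCONFIG_CMD_CREATE: fill the superblock, then validate parameters. */
static int fill_super(const struct fs_context_model *fc,
		      struct xfs_mount_model *mp, bool copy_flags_early)
{
	if (!copy_flags_early)           /* fixed placement: after parsing */
		mp->readonly = fc->sb_flags & SB_RDONLY;
	mp->norecovery = fc->norecovery;
	if (mp->norecovery && !mp->readonly)
		return -EINVAL;          /* "no-recovery mounts must be read-only" */
	return 0;
}

/* Replays the fsopen()/fsconfig() order from the syscall trace. */
static int new_api_mount(bool copy_flags_early)
{
	struct fs_context_model fc = { 0 };
	struct xfs_mount_model mp = { 0 };

	init_context(&fc, &mp, copy_flags_early);       /* fsopen() */
	fc.sb_flags |= SB_RDONLY;                        /* "ro" */
	fc.norecovery = true;                            /* "norecovery" */
	return fill_super(&fc, &mp, copy_flags_early);   /* CMD_CREATE */
}
```

`new_api_mount(true)` replays the sequence against the old placement and fails with `-EINVAL`; `new_api_mount(false)` succeeds.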
    • firmware: arm_scmi: Check mailbox/SMT channel for consistency · 7f95f699
      Cristian Marussi authored
      commit 437a310b upstream.
      
      On reception of a completion interrupt the shared memory area is accessed
      to retrieve the message header at first and then, if the message sequence
      number identifies a transaction which is still pending, the related
      payload is fetched too.
      
      When an SCMI command times out the channel ownership remains with the
      platform until eventually a late reply is received and, as a consequence,
      any further transmission attempt remains pending, waiting for the channel
      to be relinquished by the platform.
      
      Once that late reply is received the channel ownership is given back
      to the agent and any pending request is then allowed to proceed and
      overwrite the SMT area of the just delivered late reply; then the wait
      for the reply to the new request starts.
      
      It has been observed that the spurious IRQ related to the late reply can
      be wrongly associated with the freshly enqueued request: when that happens
      the SCMI stack in-flight lookup procedure is fooled by the fact that the
      message header now present in the SMT area is related to the new pending
      transaction, even though the real reply has still to arrive.
      
      This race-condition on the A2P channel can be detected by looking at the
      channel status bits: a genuine reply from the platform will have set the
      channel free bit before triggering the completion IRQ.
      
      Add a consistency check to validate this condition in the A2P ISR.
      
      Reported-by: Xinglong Yang <xinglong.yang@cixtech.com>
      Closes: https://lore.kernel.org/all/PUZPR06MB54981E6FA00D82BFDBB864FBF08DA@PUZPR06MB5498.apcprd06.prod.outlook.com/
      Fixes: 5c8a47a5 ("firmware: arm_scmi: Make scmi core independent of the transport type")
      Cc: stable@vger.kernel.org # 5.15+
      Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
      Tested-by: Xinglong Yang <xinglong.yang@cixtech.com>
      Link: https://lore.kernel.org/r/20231220172112.763539-1-cristian.marussi@arm.com
      Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
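A sketch of the consistency check, under the assumption that bit 0 of the SMT `channel_status` word is the "channel free" bit. `struct smt_model` and `a2p_irq_is_genuine()` are hypothetical names, not the driver's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: bit 0 of channel_status means "channel free". */
#define CHAN_STAT_CHANNEL_FREE (1u << 0)

/* Hypothetical view of the shared-memory (SMT) area. */
struct smt_model {
	uint32_t channel_status;  /* set by the platform before it raises the IRQ */
	uint32_t msg_header;      /* header of whatever message is in the area */
};

/*
 * A2P completion check: treat the IRQ as a reply only if the platform has
 * handed the channel back. Otherwise the header in the area belongs to a
 * request the agent just wrote, and looking it up in the in-flight table
 * would complete that request without a real reply.
 */
static bool a2p_irq_is_genuine(const struct smt_model *smt)
{
	return smt->channel_status & CHAN_STAT_CHANNEL_FREE;
}
```

An IRQ that arrives while the status still shows the channel busy (the late-reply race above) is ignored instead of being matched against the freshly written request header.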
    • ksmbd: fix global oob in ksmbd_nl_policy · 2c939c74
      Lin Ma authored
      commit ebeae8ad upstream.
      
      Similar to a reported issue (see commit b33fb5b8 ("net: qualcomm:
      rmnet: fix global oob in rmnet_policy")), my local fuzzer found
      another global out-of-bounds read for policy ksmbd_nl_policy. See the
      bug trace below:
      
      ==================================================================
      BUG: KASAN: global-out-of-bounds in validate_nla lib/nlattr.c:386 [inline]
      BUG: KASAN: global-out-of-bounds in __nla_validate_parse+0x24af/0x2750 lib/nlattr.c:600
      Read of size 1 at addr ffffffff8f24b100 by task syz-executor.1/62810
      
      CPU: 0 PID: 62810 Comm: syz-executor.1 Tainted: G                 N 6.1.0 #3
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0x8b/0xb3 lib/dump_stack.c:106
       print_address_description mm/kasan/report.c:284 [inline]
       print_report+0x172/0x475 mm/kasan/report.c:395
       kasan_report+0xbb/0x1c0 mm/kasan/report.c:495
       validate_nla lib/nlattr.c:386 [inline]
       __nla_validate_parse+0x24af/0x2750 lib/nlattr.c:600
       __nla_parse+0x3e/0x50 lib/nlattr.c:697
       __nlmsg_parse include/net/netlink.h:748 [inline]
       genl_family_rcv_msg_attrs_parse.constprop.0+0x1b0/0x290 net/netlink/genetlink.c:565
       genl_family_rcv_msg_doit+0xda/0x330 net/netlink/genetlink.c:734
       genl_family_rcv_msg net/netlink/genetlink.c:833 [inline]
       genl_rcv_msg+0x441/0x780 net/netlink/genetlink.c:850
       netlink_rcv_skb+0x14f/0x410 net/netlink/af_netlink.c:2540
       genl_rcv+0x24/0x40 net/netlink/genetlink.c:861
       netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
       netlink_unicast+0x54e/0x800 net/netlink/af_netlink.c:1345
       netlink_sendmsg+0x930/0xe50 net/netlink/af_netlink.c:1921
       sock_sendmsg_nosec net/socket.c:714 [inline]
       sock_sendmsg+0x154/0x190 net/socket.c:734
       ____sys_sendmsg+0x6df/0x840 net/socket.c:2482
       ___sys_sendmsg+0x110/0x1b0 net/socket.c:2536
       __sys_sendmsg+0xf3/0x1c0 net/socket.c:2565
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      RIP: 0033:0x7fdd66a8f359
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fdd65e00168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00007fdd66bbcf80 RCX: 00007fdd66a8f359
      RDX: 0000000000000000 RSI: 0000000020000500 RDI: 0000000000000003
      RBP: 00007fdd66ada493 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 00007ffc84b81aff R14: 00007fdd65e00300 R15: 0000000000022000
       </TASK>
      
      The buggy address belongs to the variable:
       ksmbd_nl_policy+0x100/0xa80
      
      The buggy address belongs to the physical page:
      page:0000000034f47940 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1ccc4b
      flags: 0x200000000001000(reserved|node=0|zone=2)
      raw: 0200000000001000 ffffea00073312c8 ffffea00073312c8 0000000000000000
      raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffffffff8f24b000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       ffffffff8f24b080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      >ffffffff8f24b100: f9 f9 f9 f9 00 00 f9 f9 f9 f9 f9 f9 00 00 07 f9
                         ^
       ffffffff8f24b180: f9 f9 f9 f9 00 05 f9 f9 f9 f9 f9 f9 00 00 00 05
       ffffffff8f24b200: f9 f9 f9 f9 00 00 03 f9 f9 f9 f9 f9 00 00 04 f9
      ==================================================================
      
      To fix it, add a placeholder named __KSMBD_EVENT_MAX and define
      KSMBD_EVENT_MAX as __KSMBD_EVENT_MAX - 1, following what other
      netlink families do. Also change the two sites that refer to
      KSMBD_EVENT_MAX so that they use the correct value.
      
      Cc: stable@vger.kernel.org
      Fixes: 0626e664 ("cifsd: add server handler for central processing and tranport layers")
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Acked-by: Namjae Jeon <linkinjeon@kernel.org>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
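The enum convention the fix adopts can be sketched with a hypothetical event list (`demo_event` and `demo_nl_policy` are illustrative names; `struct demo_policy` stands in for `struct nla_policy`). The point is that the policy array has `MAX + 1` entries, so every attribute type from 0 to `MAX` that the parser may look up is in bounds.

```c
#include <stddef.h>

/* Hypothetical event list following the usual netlink enum convention. */
enum demo_event {
	DEMO_EVENT_UNSPEC = 0,
	DEMO_EVENT_HEARTBEAT,
	DEMO_EVENT_LOGIN,
	DEMO_EVENT_LOGOUT,

	__DEMO_EVENT_MAX,                      /* placeholder: number of entries */
	DEMO_EVENT_MAX = __DEMO_EVENT_MAX - 1, /* highest valid attribute type */
};

/* Stand-in for struct nla_policy. */
struct demo_policy {
	int len;
};

/*
 * One entry per attribute type 0..MAX. If the array were sized by the entry
 * count while the parser was told that same count as the highest valid
 * type, a lookup of that type would read one past the end of the array.
 */
static const struct demo_policy demo_nl_policy[DEMO_EVENT_MAX + 1] = {
	[DEMO_EVENT_HEARTBEAT] = { .len = 4 },
	[DEMO_EVENT_LOGIN]     = { .len = 64 },
	[DEMO_EVENT_LOGOUT]    = { .len = 64 },
};

static size_t demo_policy_entries(void)
{
	return sizeof(demo_nl_policy) / sizeof(demo_nl_policy[0]);
}
```

`DEMO_EVENT_MAX` evaluates to 3 and the array holds 4 entries, one per type from 0 to 3.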
    • platform/x86: p2sb: Allow p2sb_bar() calls during PCI device probe · 2841631a
      Shin'ichiro Kawasaki authored
      commit 5913320e upstream.
      
      p2sb_bar() unhides the P2SB device to get resources from the device. It
      guards the operation by locking pci_rescan_remove_lock so that parallel
      rescans do not find the P2SB device. However, this lock causes a
      deadlock when a PCI bus rescan is triggered by /sys/bus/pci/rescan. The
      rescan locks pci_rescan_remove_lock and probes PCI devices. When PCI
      devices call p2sb_bar() during probe, it locks pci_rescan_remove_lock
      again. Hence the deadlock.
      
      To avoid the deadlock, do not lock pci_rescan_remove_lock in p2sb_bar().
      Instead, take the lock at fs_initcall. Introduce p2sb_cache_resources()
      for fs_initcall, which gets and caches the P2SB resources. In
      p2sb_bar(), refer to the cache and return the cached resources to the
      caller.
      
      Before operating on the device at P2SB DEVFN to cache its resources,
      check that its device class is PCI_CLASS_MEMORY_OTHER (0x0580), as the
      PCH specifications define. This avoids unexpected operations on other
      devices at the same DEVFN.
      
      Link: https://lore.kernel.org/linux-pci/6xb24fjmptxxn5js2fjrrddjae6twex5bjaftwqsuawuqqqydx@7cl3uik5ef6j/
      Fixes: 9745fb07 ("platform/x86/intel: Add Primary to Sideband (P2SB) bridge support")
      Cc: stable@vger.kernel.org
      Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Link: https://lore.kernel.org/r/20240108062059.3583028-2-shinichiro.kawasaki@wdc.com
      Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
      Tested-by: Klara Modin <klarasmodin@gmail.com>
      Reviewed-by: Hans de Goede <hdegoede@redhat.com>
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
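A userspace sketch of the cache-at-init approach. The structs and the `_model` functions are hypothetical, the locking is omitted, and the `-ENODEV` return is this sketch's choice; only the class check uses the value named in the commit, comparing the upper 16 bits of the 24-bit class code against `PCI_CLASS_MEMORY_OTHER`.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_CLASS_MEMORY_OTHER 0x0580   /* base class 0x05, subclass 0x80 */

/* Hypothetical view of the device found at the P2SB devfn. */
struct pci_dev_model {
	uint32_t class;   /* 24-bit class code: base, subclass, prog-if */
	uint64_t bar0;
};

/* Resources cached once at init time. */
static struct {
	bool valid;
	uint64_t bar0;
} p2sb_cache;

/* Init-time step (the real code takes the rescan lock here). */
static void p2sb_cache_resources_model(const struct pci_dev_model *dev)
{
	/* Only trust the device if it has the class the PCH spec defines. */
	if ((dev->class >> 8) != PCI_CLASS_MEMORY_OTHER)
		return;
	p2sb_cache.bar0 = dev->bar0;
	p2sb_cache.valid = true;
}

/* Probe-time lookup: no locking, just read the cache. */
static int p2sb_bar_model(uint64_t *bar0)
{
	if (!p2sb_cache.valid)
		return -ENODEV;
	*bar0 = p2sb_cache.bar0;
	return 0;
}
```

A device with class 0x060400 (a PCI-to-PCI bridge) at the same devfn is skipped, so later lookups return an error instead of bogus resources.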
    • netfilter: nf_tables: reject QUEUE/DROP verdict parameters · 8e34430e
      Florian Westphal authored
      commit f342de4e upstream.
      
      This reverts commit e0abdadc.
      
      core.c:nf_hook_slow assumes that the upper 16 bits of NF_DROP
      verdicts contain a valid errno, i.e. -EPERM, -EHOSTUNREACH or similar,
      or 0.
      
      Due to the reverted commit, it's possible to provide a positive
      value, e.g. NF_ACCEPT (1), which results in use-after-free.

      It's not clear to me why this commit was made.
      
      NF_QUEUE is not used by nftables; "queue" rules in nftables
      will result in use of "nft_queue" expression.
      
      If we later need to allow specifying errno values from userspace
      (I do not know why), this has to call NF_DROP_GETERR and check that
      "err <= 0" holds true.
      
      Fixes: e0abdadc ("netfilter: nf_tables: accept QUEUE/DROP verdict parameters")
      Cc: stable@vger.kernel.org
      Reported-by: Notselwyn <notselwyn@pwning.tech>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
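A sketch of both rules, assuming the usual netfilter verdict layout (code in the low 8 bits, a signed errno in the upper 16 bits of an NF_DROP verdict). The constants are copied for illustration and the helper names are hypothetical. `verdict_code_ok()` covers only the three netfilter codes discussed here, not nftables-specific ones such as jump or goto.

```c
#include <stdbool.h>
#include <stdint.h>

/* Verdict codes and layout, copied for illustration. */
#define NF_DROP          0
#define NF_ACCEPT        1
#define NF_QUEUE         3
#define NF_VERDICT_MASK  0x000000ffu
#define NF_VERDICT_QBITS 16

/* After the revert: only bare codes, with no parameter bits, are accepted. */
static bool verdict_code_ok(uint32_t code)
{
	return code == NF_ACCEPT || code == NF_DROP || code == NF_QUEUE;
}

/* Decode the signed 16-bit field in the upper bits, then negate it. */
static int drop_geterr(uint32_t verdict)
{
	int upper = (int)(verdict >> NF_VERDICT_QBITS);

	if (upper >= 0x8000)          /* sign-extend the 16-bit field */
		upper -= 0x10000;
	return -upper;
}

/* The extra check the commit says an errno-carrying API would need. */
static bool drop_param_ok(uint32_t verdict)
{
	return (verdict & NF_VERDICT_MASK) == NF_DROP &&
	       drop_geterr(verdict) <= 0;
}
```

A verdict of `0xFFFF0000` decodes to an "errno" of +1, which is `NF_ACCEPT`; that positive value is exactly what the `err <= 0` check must reject.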
    • netfilter: nft_chain_filter: handle NETDEV_UNREGISTER for inet/ingress basechain · af149a46
      Pablo Neira Ayuso authored
      commit 01acb2e8 upstream.
      
      Remove the netdevice from the inet/ingress basechain when a
      NETDEV_UNREGISTER event is reported; otherwise a stale reference to
      the netdevice remains in the hook list.
      
      Fixes: 60a3815d ("netfilter: add inet ingress support")
      Cc: stable@vger.kernel.org
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • hv_netvsc: Calculate correct ring size when PAGE_SIZE is not 4 Kbytes · 5e7d8ddf
      Michael Kelley authored
      commit 6941f67a upstream.
      
      Current code in netvsc_drv_init() incorrectly assumes that PAGE_SIZE
      is 4 Kbytes, which is wrong on ARM64 with 16K or 64K page size. As a
      result, the default VMBus ring buffer size on ARM64 with 64K page size
      is 8 Mbytes instead of the expected 512 Kbytes. While this doesn't break
      anything, a typical VM with 8 vCPUs and 8 netvsc channels wastes 120
      Mbytes (8 channels * 2 ring buffers/channel * 7.5 Mbytes/ring buffer).
      
      Unfortunately, the module parameter specifying the ring buffer size
      is in units of 4 Kbyte pages. Ideally, it should be in units that
      are independent of PAGE_SIZE, but backwards compatibility prevents
      changing that now.
      
      Fix this by having netvsc_drv_init() hardcode 4096 instead of using
      PAGE_SIZE when calculating the ring buffer size in bytes. Also
      use the VMBUS_RING_SIZE macro to ensure proper alignment when running
      with page size larger than 4K.
      
      Cc: <stable@vger.kernel.org> # 5.15.x
      Fixes: 7aff79e2 ("Drivers: hv: Enable Hyper-V code to be built on ARM64")
      Signed-off-by: Michael Kelley <mhklinux@outlook.com>
      Link: https://lore.kernel.org/r/20240122162028.348885-1-mhklinux@outlook.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
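The arithmetic can be checked with a short sketch. The function and macro names are hypothetical, and the rounding in `ring_bytes_fixed()` is only a stand-in: the real VMBUS_RING_SIZE macro also accounts for the ring buffer header, which is omitted here. A `ring_size` of 128 gives the 512 Kbytes the commit expects.

```c
#include <stdint.h>

#define RING_SIZE_UNIT 4096ull  /* the unit the ring_size parameter is in */

/* Old calculation: the parameter was scaled by the guest's PAGE_SIZE. */
static uint64_t ring_bytes_buggy(uint64_t ring_size, uint64_t page_size)
{
	return ring_size * page_size;
}

/*
 * Fixed calculation: scale by 4096 regardless of PAGE_SIZE, then round
 * up to a whole guest page (header size ignored in this sketch).
 */
static uint64_t ring_bytes_fixed(uint64_t ring_size, uint64_t page_size)
{
	uint64_t bytes = ring_size * RING_SIZE_UNIT;

	return (bytes + page_size - 1) / page_size * page_size;
}
```

With 64K pages the old formula yields 8 Mbytes per ring; across 8 channels with 2 rings each, the excess is 16 x 7.5 Mbytes = 120 Mbytes.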
    • wifi: iwlwifi: fix a memory corruption · aa2cc936
      Emmanuel Grumbach authored
      commit cf4a0d84 upstream.
      
      iwl_fw_ini_trigger_tlv::data is a pointer to a __le32, which means that
      if we copy to iwl_fw_ini_trigger_tlv::data + offset while offset is in
      bytes, we'll write past the buffer.
      
      Cc: stable@vger.kernel.org
      Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218233
      Fixes: cf29c5b6 ("iwlwifi: dbg_ini: implement time point handling")
      Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
      Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
      Link: https://msgid.link/20240111150610.2d2b8b870194.I14ed76505a5cf87304e0c9cc05cc0ae85ed3bf91@changeid
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
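The underlying C rule is that pointer arithmetic scales by the size of the pointee. A minimal illustration with hypothetical helpers, not the driver's code:

```c
#include <stddef.h>
#include <stdint.h>

/* Adding n to a 32-bit-element pointer advances n * 4 bytes. */
static ptrdiff_t bytes_moved_u32(uint32_t *base, size_t offset)
{
	return (char *)(base + offset) - (char *)base;
}

/* Casting to a byte pointer first keeps the offset in bytes. */
static ptrdiff_t bytes_moved_u8(uint32_t *base, size_t offset)
{
	return ((uint8_t *)base + offset) - (uint8_t *)base;
}
```

Adding 4 to a `uint32_t *` moves 16 bytes, so a byte offset applied to a `__le32 *` lands four times too far; casting to a byte pointer first, as in `bytes_moved_u8()`, keeps the offset in bytes.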
    • exec: Fix error handling in begin_new_exec() · dcc54a54
      Bernd Edlinger authored
      commit 84c39ec5 upstream.
      
      If get_unused_fd_flags() fails, the error handling is incomplete because
      bprm->cred is already set to NULL, and therefore free_bprm will not
      unlock the cred_guard_mutex. Note there are two error conditions which
      end up here, one before and one after bprm->cred is cleared.
      
      Fixes: b8a61c9e ("exec: Generic execfd support")
      Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
      Acked-by: Eric W. Biederman <ebiederm@xmission.com>
      Link: https://lore.kernel.org/r/AS8P193MB128517ADB5EFF29E04389EDAE4752@AS8P193MB1285.EURP193.PROD.OUTLOOK.COM
      Cc: stable@vger.kernel.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • rbd: don't move requests to the running list on errors · 46464457
      Ilya Dryomov authored
      commit ded080c8 upstream.
      
      The running list is supposed to contain requests that are pinning the
      exclusive lock, i.e. those that must be flushed before exclusive lock
      is released.  When wake_lock_waiters() is called to handle an error,
      requests on the acquiring list are failed with that error and no
      flushing takes place.  Briefly moving them to the running list is not
      only pointless but also harmful: if exclusive lock gets acquired
      before all of their state machines are scheduled and go through
      rbd_lock_del_request(), we trigger
      
          rbd_assert(list_empty(&rbd_dev->running_list));
      
      in rbd_try_acquire_lock().
      
      Cc: stable@vger.kernel.org
      Fixes: 637cd060 ("rbd: new exclusive lock wait/wake code")
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: don't abort filesystem when attempting to snapshot deleted subvolume · 6e6bca99
      Omar Sandoval authored
      commit 7081929a upstream.
      
      If the source file descriptor to the snapshot ioctl refers to a deleted
      subvolume, we get the following abort:
      
        BTRFS: Transaction aborted (error -2)
        WARNING: CPU: 0 PID: 833 at fs/btrfs/transaction.c:1875 create_pending_snapshot+0x1040/0x1190 [btrfs]
        Modules linked in: pata_acpi btrfs ata_piix libata scsi_mod virtio_net blake2b_generic xor net_failover virtio_rng failover scsi_common rng_core raid6_pq libcrc32c
        CPU: 0 PID: 833 Comm: t_snapshot_dele Not tainted 6.7.0-rc6 #2
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-1.fc39 04/01/2014
        RIP: 0010:create_pending_snapshot+0x1040/0x1190 [btrfs]
        RSP: 0018:ffffa09c01337af8 EFLAGS: 00010282
        RAX: 0000000000000000 RBX: ffff9982053e7c78 RCX: 0000000000000027
        RDX: ffff99827dc20848 RSI: 0000000000000001 RDI: ffff99827dc20840
        RBP: ffffa09c01337c00 R08: 0000000000000000 R09: ffffa09c01337998
        R10: 0000000000000003 R11: ffffffffb96da248 R12: fffffffffffffffe
        R13: ffff99820535bb28 R14: ffff99820b7bd000 R15: ffff99820381ea80
        FS:  00007fe20aadabc0(0000) GS:ffff99827dc00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000559a120b502f CR3: 00000000055b6000 CR4: 00000000000006f0
        Call Trace:
         <TASK>
         ? create_pending_snapshot+0x1040/0x1190 [btrfs]
         ? __warn+0x81/0x130
         ? create_pending_snapshot+0x1040/0x1190 [btrfs]
         ? report_bug+0x171/0x1a0
         ? handle_bug+0x3a/0x70
         ? exc_invalid_op+0x17/0x70
         ? asm_exc_invalid_op+0x1a/0x20
         ? create_pending_snapshot+0x1040/0x1190 [btrfs]
         ? create_pending_snapshot+0x1040/0x1190 [btrfs]
         create_pending_snapshots+0x92/0xc0 [btrfs]
         btrfs_commit_transaction+0x66b/0xf40 [btrfs]
         btrfs_mksubvol+0x301/0x4d0 [btrfs]
         btrfs_mksnapshot+0x80/0xb0 [btrfs]
         __btrfs_ioctl_snap_create+0x1c2/0x1d0 [btrfs]
         btrfs_ioctl_snap_create_v2+0xc4/0x150 [btrfs]
         btrfs_ioctl+0x8a6/0x2650 [btrfs]
         ? kmem_cache_free+0x22/0x340
         ? do_sys_openat2+0x97/0xe0
         __x64_sys_ioctl+0x97/0xd0
         do_syscall_64+0x46/0xf0
         entry_SYSCALL_64_after_hwframe+0x6e/0x76
        RIP: 0033:0x7fe20abe83af
        RSP: 002b:00007ffe6eff1360 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
        RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fe20abe83af
        RDX: 00007ffe6eff23c0 RSI: 0000000050009417 RDI: 0000000000000003
        RBP: 0000000000000003 R08: 0000000000000000 R09: 00007fe20ad16cd0
        R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
        R13: 00007ffe6eff13c0 R14: 00007fe20ad45000 R15: 0000559a120b6d58
         </TASK>
        ---[ end trace 0000000000000000 ]---
        BTRFS: error (device vdc: state A) in create_pending_snapshot:1875: errno=-2 No such entry
        BTRFS info (device vdc: state EA): forced readonly
        BTRFS warning (device vdc: state EA): Skipping commit of aborted transaction.
        BTRFS: error (device vdc: state EA) in cleanup_transaction:2055: errno=-2 No such entry
      
      This happens because create_pending_snapshot() initializes the new root
      item as a copy of the source root item. This includes the refs field,
      which is 0 for a deleted subvolume. The call to btrfs_insert_root()
      therefore inserts a root with refs == 0. btrfs_get_new_fs_root() then
      finds the root and returns -ENOENT if refs == 0, which causes
      create_pending_snapshot() to abort.
      
      Fix it by checking the source root's refs before attempting the
      snapshot, but after locking subvol_sem to avoid racing with deletion.
      
      CC: stable@vger.kernel.org # 4.14+
      Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: defrag: reject unknown flags of btrfs_ioctl_defrag_range_args · 52e02f26
      Qu Wenruo authored
      commit 173431b2 upstream.
      
      Add an extra sanity check for btrfs_ioctl_defrag_range_args::flags.

      This is not really to enhance fuzzing tests, but as preparation for
      future expansion of btrfs_ioctl_defrag_range_args.

      In the future we're going to add new members, allowing finer tuning
      of btrfs defrag.  Without the -EOPNOTSUPP error, there would be no way
      to detect whether the kernel supports those new defrag features.
      
      CC: stable@vger.kernel.org # 4.14+
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
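A sketch of the flag check. The two flag values are assumed to match `BTRFS_DEFRAG_RANGE_COMPRESS` and `BTRFS_DEFRAG_RANGE_START_IO` in the btrfs uapi header; the macro and function names below are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed to mirror the BTRFS_DEFRAG_RANGE_* uapi flag values. */
#define DEFRAG_RANGE_COMPRESS   1u
#define DEFRAG_RANGE_START_IO   2u
#define DEFRAG_RANGE_FLAGS_SUPP (DEFRAG_RANGE_COMPRESS | DEFRAG_RANGE_START_IO)

static int check_defrag_flags(uint64_t flags)
{
	/* Any bit outside the known set means "not supported by this kernel". */
	if (flags & ~(uint64_t)DEFRAG_RANGE_FLAGS_SUPP)
		return -EOPNOTSUPP;
	return 0;
}
```

Userspace can then probe for a newer flag by passing it and treating `-EOPNOTSUPP` as "this kernel does not know it".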
    • btrfs: don't warn if discard range is not aligned to sector · 86aff7c5
      David Sterba authored
      commit a208b3f1 upstream.
      
      There's a warning in btrfs_issue_discard() when the range is not aligned
      to 512 bytes, originally added in 4d89d377 ("btrfs:
      btrfs_issue_discard ensure offset/length are aligned to sector
      boundaries"). We can't do sub-sector writes anyway so the adjustment is
      the only thing that we can do and the warning is unnecessary.
      
      CC: stable@vger.kernel.org # 4.19+
      Reported-by: <syzbot+4a4f1eba14eb5c3417d1@syzkaller.appspotmail.com>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
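The adjustment that remains once the warning is gone can be sketched as follows: round the start up and the end down to 512 bytes. `align_discard()` is a hypothetical helper, not btrfs code.

```c
#include <stdint.h>

#define DISCARD_ALIGN 512ull  /* 1 << SECTOR_SHIFT */

/*
 * Shrink [start, start + len) to the largest 512-byte-aligned range it
 * contains. Stores the new start in *out_start and returns the new length
 * (0 if no aligned sector fits inside the range).
 */
static uint64_t align_discard(uint64_t start, uint64_t len, uint64_t *out_start)
{
	uint64_t end = start + len;
	uint64_t a_start = (start + DISCARD_ALIGN - 1) / DISCARD_ALIGN * DISCARD_ALIGN;
	uint64_t a_end = end / DISCARD_ALIGN * DISCARD_ALIGN;

	*out_start = a_start;
	return a_end > a_start ? a_end - a_start : 0;
}
```

For start 1000 and length 5000, the aligned range starts at 1024 and is 4608 bytes long.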
    • btrfs: tree-checker: fix inline ref size in error messages · b60f748a
      Chung-Chiang Cheng authored
      commit f398e70d upstream.
      
      The error message should accurately reflect the size rather than the
      type.
      
      Fixes: f82d1c7c ("btrfs: tree-checker: Add EXTENT_ITEM and METADATA_ITEM check")
      CC: stable@vger.kernel.org # 5.4+
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Chung-Chiang Cheng <cccheng@synology.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: ref-verify: free ref cache before clearing mount opt · c91c247b
      Fedor Pchelkin authored
      commit f03e274a upstream.
      
      Since clearing the REF_VERIFY mount option indicates that there were
      errors in the ref-verify process, the ref cache is no longer relevant
      and should be freed.
      
      btrfs_free_ref_cache() requires the REF_VERIFY option to be set, so call
      it just before clearing the mount option.
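      The ordering matters because the free routine does nothing once the option is gone. The following is a hypothetical model of that ordering. struct fs_info, free_ref_cache() and disable_ref_verify() are illustrative stand-ins, not the btrfs API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a mount with an optional debugging cache. */
struct fs_info {
    bool ref_verify;   /* stands in for the REF_VERIFY mount option */
    int *ref_cache;    /* stands in for the ref-verify cache */
};

/* Like btrfs_free_ref_cache(): does nothing unless the option is set. */
static void free_ref_cache(struct fs_info *fs)
{
    if (!fs->ref_verify)
        return;
    free(fs->ref_cache);
    fs->ref_cache = NULL;
}

/* Error path: free the cache first, then clear the option. */
static void disable_ref_verify(struct fs_info *fs)
{
    free_ref_cache(fs);      /* must run while ref_verify is still true */
    fs->ref_verify = false;  /* swapping these two lines would leak ref_cache */
}
```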
      
      Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
      
      Reported-by: syzbot+be14ed7728594dc8bd42@syzkaller.appspotmail.com
      Fixes: fd708b81 ("Btrfs: add a extent ref verify tool")
      CC: stable@vger.kernel.org # 5.4+
      Closes: https://lore.kernel.org/lkml/000000000000e5a65c05ee832054@google.com/
      Reported-by: syzbot+c563a3c79927971f950f@syzkaller.appspotmail.com
      Closes: https://lore.kernel.org/lkml/0000000000007fe09705fdc6086c@google.com/
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: avoid copying BTRFS_ROOT_SUBVOL_DEAD flag to snapshot of subvolume being deleted · 9ebd514f
      Omar Sandoval authored
      commit 3324d054 upstream.
      
      Sweet Tea spotted a race between subvolume deletion and snapshotting
      that can result in the root item for the snapshot having the
      BTRFS_ROOT_SUBVOL_DEAD flag set. The race is:
      
      Thread 1                                      | Thread 2
      ----------------------------------------------|----------
      btrfs_delete_subvolume                        |
        btrfs_set_root_flags(BTRFS_ROOT_SUBVOL_DEAD)|
                                                    |btrfs_mksubvol
                                                    |  down_read(subvol_sem)
                                                    |  create_snapshot
                                                    |    ...
                                                    |    create_pending_snapshot
                                                    |      copy root item from source
        down_write(subvol_sem)                      |
      
      This flag is only checked in send and swap activate, so the race would
      cause those operations to fail mysteriously.
      
      create_snapshot() now checks the root refs to reject a deleted
      subvolume, so we can fix this by locking subvol_sem earlier so that the
      BTRFS_ROOT_SUBVOL_DEAD flag and the root refs are updated atomically.
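      The following simplified sketch shows the fixed ordering. It assumes a plain mutex in place of the subvol_sem read/write semaphore and uses hypothetical struct and function names. The DEAD flag and the refs change in one critical section, so a snapshot either sees a live source or rejects it. It can never copy DEAD from a source that still has refs:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

#define ROOT_SUBVOL_DEAD 0x1u  /* hypothetical stand-in for BTRFS_ROOT_SUBVOL_DEAD */

struct subvol {
    pthread_mutex_t subvol_sem;  /* simplification: a mutex, not a rw semaphore */
    uint32_t flags;
    uint32_t refs;               /* 0 means the subvolume has been deleted */
};

/* Deletion: take the lock *before* setting the flag, then drop the refs. */
static void delete_subvolume(struct subvol *sv)
{
    pthread_mutex_lock(&sv->subvol_sem);
    sv->flags |= ROOT_SUBVOL_DEAD;
    sv->refs = 0;
    pthread_mutex_unlock(&sv->subvol_sem);
}

/* Snapshot: reject a deleted source, otherwise copy its flags. */
static int create_snapshot(struct subvol *src, uint32_t *snap_flags)
{
    int ret = 0;

    pthread_mutex_lock(&src->subvol_sem);
    if (src->refs == 0)
        ret = -ENOENT;             /* hypothetical error code */
    else
        *snap_flags = src->flags;  /* DEAD cannot be set while refs > 0 */
    pthread_mutex_unlock(&src->subvol_sem);
    return ret;
}
```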
      
      CC: stable@vger.kernel.org # 4.14+
      Reported-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
      Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • nbd: always initialize struct msghdr completely · d9c54763
      Eric Dumazet authored
      commit 78fbb92a upstream.
      
      syzbot complains that the msg->msg_get_inq value can be uninitialized [1].
      
      struct msghdr has gained many new fields recently, so we should always
      make sure their values are zero by default.
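      Here is the same idiom in userspace C, as a sketch. A designated initializer zeroes every member of struct msghdr that it does not name. The userspace struct msghdr from <sys/socket.h> does not have the kernel-internal msg_get_inq field, and send_zeroed()/recv_zeroed() are hypothetical helpers:

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Designated initializers zero every member that is not named, so fields
 * such as msg_name, msg_control and msg_flags (and any field a future
 * header adds) never carry stack garbage into the kernel.
 */
static ssize_t send_zeroed(int fd, const void *buf, size_t len)
{
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };

    return sendmsg(fd, &msg, 0);
}

static ssize_t recv_zeroed(int fd, void *buf, size_t len)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };

    return recvmsg(fd, &msg, 0);
}
```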
      
      [1]
       BUG: KMSAN: uninit-value in tcp_recvmsg+0x686/0xac0 net/ipv4/tcp.c:2571
        tcp_recvmsg+0x686/0xac0 net/ipv4/tcp.c:2571
        inet_recvmsg+0x131/0x580 net/ipv4/af_inet.c:879
        sock_recvmsg_nosec net/socket.c:1044 [inline]
        sock_recvmsg+0x12b/0x1e0 net/socket.c:1066
        __sock_xmit+0x236/0x5c0 drivers/block/nbd.c:538
        nbd_read_reply drivers/block/nbd.c:732 [inline]
        recv_work+0x262/0x3100 drivers/block/nbd.c:863
        process_one_work kernel/workqueue.c:2627 [inline]
        process_scheduled_works+0x104e/0x1e70 kernel/workqueue.c:2700
        worker_thread+0xf45/0x1490 kernel/workqueue.c:2781
        kthread+0x3ed/0x540 kernel/kthread.c:388
        ret_from_fork+0x66/0x80 arch/x86/kernel/process.c:147
        ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
      
      Local variable msg created at:
        __sock_xmit+0x4c/0x5c0 drivers/block/nbd.c:513
        nbd_read_reply drivers/block/nbd.c:732 [inline]
        recv_work+0x262/0x3100 drivers/block/nbd.c:863
      
      CPU: 1 PID: 7465 Comm: kworker/u5:1 Not tainted 6.7.0-rc7-syzkaller-00041-gf016f7547aee #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
      Workqueue: nbd5-recv recv_work
      
      Fixes: f94fd25c ("tcp: pass back data left in socket after receive")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: stable@vger.kernel.org
      Cc: Josef Bacik <josef@toxicpanda.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-block@vger.kernel.org
      Cc: nbd@other.debian.org
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20240112132657.647112-1-edumazet@google.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: fec: fix the unhandled context fault from smmu · 0a5a083c
      Shenwei Wang authored
      [ Upstream commit 5e344807 ]
      
      When the interface link speed is changed repeatedly with the commands below:
      
      ethtool -s eth0 speed 100 duplex full
      ethtool -s eth0 speed 1000 duplex full
      
      the following errors may sometimes be reported by the ARM SMMU driver:
      
      [ 5395.035364] fec 5b040000.ethernet eth0: Link is Down
      [ 5395.039255] arm-smmu 51400000.iommu: Unhandled context fault: fsr=0x402, iova=0x00000000, fsynr=0x100001, cbfrsynra=0x852, cb=2
      [ 5398.108460] fec 5b040000.ethernet eth0: Link is Up - 100Mbps/Full - flow control off
      
      The FEC driver does not properly stop the TX queue during link speed
      transitions. This results in invalid virtual I/O address translations
      by the SMMU, which cause the context faults.
      
      Fixes: dbc64a8e ("net: fec: move calls to quiesce/resume packet processing out of fec_restart()")
      Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com>
      Link: https://lore.kernel.org/r/20240123165141.2008104-1-shenwei.wang@nxp.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • fjes: fix memleaks in fjes_hw_setup · 5b1086d2
      Zhipeng Lu authored
      [ Upstream commit f6cc4b6a ]
      
      fjes_hw_setup allocates several memory regions and defers their
      deallocation to fjes_hw_exit, called from fjes_probe, as the following
      call chain shows:
      
      fjes_probe
        |-> fjes_hw_init
              |-> fjes_hw_setup
        |-> fjes_hw_exit
      
      However, when fjes_hw_setup fails, fjes_hw_exit is not called, and thus
      all the resources allocated in fjes_hw_setup are leaked. This patch
      frees those resources in fjes_hw_setup itself to prevent such leaks.
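      The following sketch shows the usual goto-based unwinding that this kind of fix applies. The buffer names are hypothetical, and so is the fail_at test hook, which forces one allocation to fail. On error, the setup function releases what it has already allocated, because the caller's exit routine will not run:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical hardware context with three separately allocated buffers. */
struct hw {
    void *ep_shm_info;
    void *txrx_buf;
    void *hw_info_buf;
};

/* Test hook: make allocation number fail_at (1-based) fail; 0 disables it. */
static void *try_alloc(size_t size, int nth, int fail_at)
{
    return nth == fail_at ? NULL : malloc(size);
}

/*
 * On failure, free everything allocated so far, in reverse order, because
 * the caller's exit routine is not called when setup fails.
 */
static int hw_setup(struct hw *hw, int fail_at)
{
    assert(hw != NULL);

    hw->ep_shm_info = try_alloc(64, 1, fail_at);
    if (!hw->ep_shm_info)
        return -ENOMEM;

    hw->txrx_buf = try_alloc(4096, 2, fail_at);
    if (!hw->txrx_buf)
        goto free_ep_shm_info;

    hw->hw_info_buf = try_alloc(256, 3, fail_at);
    if (!hw->hw_info_buf)
        goto free_txrx_buf;

    return 0;

free_txrx_buf:
    free(hw->txrx_buf);
    hw->txrx_buf = NULL;
free_ep_shm_info:
    free(hw->ep_shm_info);
    hw->ep_shm_info = NULL;
    return -ENOMEM;
}

/* Normal teardown, only valid after hw_setup() returned 0. */
static void hw_exit(struct hw *hw)
{
    free(hw->hw_info_buf);
    free(hw->txrx_buf);
    free(hw->ep_shm_info);
}
```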
      
      Fixes: 2fcbca68 ("fjes: platform_driver's .probe and .remove routine")
      Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20240122172445.3841883-1-alexious@zju.edu.cn
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • selftests: netdevsim: fix the udp_tunnel_nic test · 4b4dcb3f
      Jakub Kicinski authored
      [ Upstream commit 0879020a ]
      
      This test is missing a whole bunch of checks for interface
      renaming and one ifup. Presumably it was only used on a system
      with renaming disabled and NetworkManager running.
      
      Fixes: 91f430b2 ("selftests: net: add a test for UDP tunnel info infra")
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20240123060529.1033912-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: mvpp2: clear BM pool before initialization · cec65f09
      Jenishkumar Maheshbhai Patel authored
      [ Upstream commit 9f538b41 ]
      
      Register values persist after booting a new kernel using kexec, which
      results in a kernel panic. Clear the BM pool registers before
      initialisation to fix the issue.
      
      Fixes: 3f518509 ("ethernet: Add new driver for Marvell Armada 375 network unit")
      Signed-off-by: Jenishkumar Maheshbhai Patel <jpatel2@marvell.com>
      Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Link: https://lore.kernel.org/r/20240119035914.25956650-1-jpatel2@marvell.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: stmmac: Wait a bit for the reset to take effect · acb6eaf2
      Bernd Edlinger authored
      [ Upstream commit a5f5eee2 ]
      
      Otherwise the synopsys_id value may be read out incorrectly, because the
      GMAC_VERSION register might still be in its reset state for at least
      1 us after the reset is de-asserted.
      
      Add a wait for 10 us before continuing to be on the safe side.
      
      > From what have you got that delay value?
      
      Just trial and error. With very old Linux versions and old gcc versions,
      the synopsys_id was read out correctly most of the time (but not always);
      with recent Linux versions and recent gcc versions, it was read out
      wrongly most of the time, but again not always.
      I don't have access to the VHDL code in question, so I cannot tell why it
      takes so long to get the correct values. I also do not have more than a
      few hardware samples, so I cannot tell how long this timeout must be in
      the worst case.
      Experimentally, I can tell that the register is read several times as
      zero immediately after the reset is de-asserted. Adding several no-ops
      is not enough, adding a printk is enough, and udelay(1) also seems to be
      enough, but I did not try that very often, and I do not have access to
      enough hardware samples to be 100% sure about the necessary delay.
      And since the udelay here is only executed once per device instance,
      it seems acceptable to delay the boot by 10 us.
      
      BTW: my hardware's synopsys id is 0x37.
      
      Fixes: c5e4ddbd ("net: stmmac: Add support for optional reset control")
      Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
      Link: https://lore.kernel.org/r/AS8P193MB1285A810BD78C111E7F6AA34E4752@AS8P193MB1285.EURP193.PROD.OUTLOOK.COM
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • netfilter: nf_tables: validate NFPROTO_* family · 67ee3736
      Pablo Neira Ayuso authored
      [ Upstream commit d0009eff ]
      
      Several expressions explicitly refer to NF_INET_* hook definitions
      from expr->ops->validate; however, the family is not validated.
      
      Bail out with EOPNOTSUPP if they are used from unsupported families.
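      The following is a hedged sketch of such a family check. It uses a hypothetical enum in place of the NFPROTO_* constants and assumes an expression that is only meaningful for the inet, ipv4 and ipv6 families. The exact set of accepted families differs per expression:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for NFPROTO_INET, NFPROTO_IPV4, ... */
enum family { FAM_INET, FAM_IPV4, FAM_IPV6, FAM_ARP, FAM_BRIDGE, FAM_NETDEV };

/*
 * Validate callback for an expression whose hook checks use NF_INET_*
 * hook numbers. Those numbers only mean something for IP families, so
 * any other family is rejected before the hook mask is looked at.
 */
static int expr_validate(enum family family)
{
    switch (family) {
    case FAM_INET:
    case FAM_IPV4:
    case FAM_IPV6:
        return 0;  /* family-specific hook checks would follow here */
    default:
        return -EOPNOTSUPP;
    }
}
```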
      
      Fixes: 0ca743a5 ("netfilter: nf_tables: add compatibility layer for x_tables")
      Fixes: a3c90f7a ("netfilter: nf_tables: flow offload expression")
      Fixes: 2fa84193 ("netfilter: nf_tables: introduce routing expression")
      Fixes: 554ced0a ("netfilter: nf_tables: add support for native socket matching")
      Fixes: ad49d86e ("netfilter: nf_tables: Add synproxy support")
      Fixes: 4ed8eb65 ("netfilter: nf_tables: Add native tproxy support")
      Fixes: 6c472602 ("netfilter: nf_tables: add xfrm expression")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • netfilter: nf_tables: restrict anonymous set and map names to 16 bytes · ed5b62bb
      Florian Westphal authored
      [ Upstream commit b462579b ]
      
      nftables has two types of sets/maps, one where userspace defines the
      name, and anonymous sets/maps, where userspace defines a template name.
      
      For the latter, the kernel requires the presence of exactly one "%d".
      nftables uses "__set%d" and "__map%d" for this. The kernel expands the
      format specifier, replacing it with the smallest unused number.
      
      As is, userspace could define a template name that pushes the set name
      past the 256-byte upper limit (post-expansion).
      
      I don't see how this could be a problem, but I would prefer if userspace
      cannot do this, so add a limit of 16 bytes for the '%d' template name.
      
      16 bytes is the old total upper limit for set names that existed when
      nf_tables was merged initially.
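      The following sketches the template validation described above. It assumes the 16-byte limit includes the terminating NUL, so templates may have at most 15 characters. SET_MAX_ANONLEN and expand_anon_name() are hypothetical names, not the nf_tables code:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical name: templates must fit in 16 bytes including the NUL. */
#define SET_MAX_ANONLEN 16

/*
 * Validate an anonymous set/map name template and expand its single "%d"
 * with n. Returns 0 on success or a negative errno value.
 */
static int expand_anon_name(const char *tmpl, unsigned int n,
                            char *out, size_t outlen)
{
    const char *p;
    int len;

    /* Reject long templates before any expansion takes place. */
    if (strlen(tmpl) >= SET_MAX_ANONLEN)
        return -EINVAL;

    /* Require exactly one '%', and it must be "%d". */
    p = strchr(tmpl, '%');
    if (!p || p[1] != 'd' || strchr(p + 2, '%'))
        return -EINVAL;

    /* Expand: prefix, the number, then whatever follows "%d". */
    len = snprintf(out, outlen, "%.*s%u%s", (int)(p - tmpl), tmpl, n, p + 2);
    if (len < 0 || (size_t)len >= outlen)
        return -ENAMETOOLONG;
    return 0;
}
```

      Because the template is split around "%d" and printed with a literal format string, user-supplied text is never interpreted as a format.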
      
      Fixes: 38745490 ("netfilter: nf_tables: Allow set names of up to 255 chars")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • btrfs: fix race between reading a directory and adding entries to it · c25d7922
      Filipe Manana authored
      commit 8e7f82de upstream.
      
      When opening a directory (opendir(3)) or rewinding it (rewinddir(3)), we
      are not holding the directory's inode locked, and this can result in later
      attempting to add two entries to the directory with the same index number,
      resulting in a transaction abort, with -EEXIST (-17), when inserting the
      second delayed dir index. This results in a trace like the following:
      
        Sep 11 22:34:59 myhostname kernel: BTRFS error (device dm-3): err add delayed dir index item(name: cockroach-stderr.log) into the insertion tree of the delayed node(root id: 5, inode id: 4539217, errno: -17)
        Sep 11 22:34:59 myhostname kernel: ------------[ cut here ]------------
        Sep 11 22:34:59 myhostname kernel: kernel BUG at fs/btrfs/delayed-inode.c:1504!
        Sep 11 22:34:59 myhostname kernel: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
        Sep 11 22:34:59 myhostname kernel: CPU: 0 PID: 7159 Comm: cockroach Not tainted 6.4.15-200.fc38.x86_64 #1
        Sep 11 22:34:59 myhostname kernel: Hardware name: ASUS ESC500 G3/P9D WS, BIOS 2402 06/27/2018
        Sep 11 22:34:59 myhostname kernel: RIP: 0010:btrfs_insert_delayed_dir_index+0x1da/0x260
        Sep 11 22:34:59 myhostname kernel: Code: eb dd 48 (...)
        Sep 11 22:34:59 myhostname kernel: RSP: 0000:ffffa9980e0fbb28 EFLAGS: 00010282
        Sep 11 22:34:59 myhostname kernel: RAX: 0000000000000000 RBX: ffff8b10b8f4a3c0 RCX: 0000000000000000
        Sep 11 22:34:59 myhostname kernel: RDX: 0000000000000000 RSI: ffff8b177ec21540 RDI: ffff8b177ec21540
        Sep 11 22:34:59 myhostname kernel: RBP: ffff8b110cf80888 R08: 0000000000000000 R09: ffffa9980e0fb938
        Sep 11 22:34:59 myhostname kernel: R10: 0000000000000003 R11: ffffffff86146508 R12: 0000000000000014
        Sep 11 22:34:59 myhostname kernel: R13: ffff8b1131ae5b40 R14: ffff8b10b8f4a418 R15: 00000000ffffffef
        Sep 11 22:34:59 myhostname kernel: FS:  00007fb14a7fe6c0(0000) GS:ffff8b177ec00000(0000) knlGS:0000000000000000
        Sep 11 22:34:59 myhostname kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        Sep 11 22:34:59 myhostname kernel: CR2: 000000c00143d000 CR3: 00000001b3b4e002 CR4: 00000000001706f0
        Sep 11 22:34:59 myhostname kernel: Call Trace:
        Sep 11 22:34:59 myhostname kernel:  <TASK>
        Sep 11 22:34:59 myhostname kernel:  ? die+0x36/0x90
        Sep 11 22:34:59 myhostname kernel:  ? do_trap+0xda/0x100
        Sep 11 22:34:59 myhostname kernel:  ? btrfs_insert_delayed_dir_index+0x1da/0x260
        Sep 11 22:34:59 myhostname kernel:  ? do_error_trap+0x6a/0x90
        Sep 11 22:34:59 myhostname kernel:  ? btrfs_insert_delayed_dir_index+0x1da/0x260
        Sep 11 22:34:59 myhostname kernel:  ? exc_invalid_op+0x50/0x70
        Sep 11 22:34:59 myhostname kernel:  ? btrfs_insert_delayed_dir_index+0x1da/0x260
        Sep 11 22:34:59 myhostname kernel:  ? asm_exc_invalid_op+0x1a/0x20
        Sep 11 22:34:59 myhostname kernel:  ? btrfs_insert_delayed_dir_index+0x1da/0x260
        Sep 11 22:34:59 myhostname kernel:  ? btrfs_insert_delayed_dir_index+0x1da/0x260
        Sep 11 22:34:59 myhostname kernel:  btrfs_insert_dir_item+0x200/0x280
        Sep 11 22:34:59 myhostname kernel:  btrfs_add_link+0xab/0x4f0
        Sep 11 22:34:59 myhostname kernel:  ? ktime_get_real_ts64+0x47/0xe0
        Sep 11 22:34:59 myhostname kernel:  btrfs_create_new_inode+0x7cd/0xa80
        Sep 11 22:34:59 myhostname kernel:  btrfs_symlink+0x190/0x4d0
        Sep 11 22:34:59 myhostname kernel:  ? schedule+0x5e/0xd0
        Sep 11 22:34:59 myhostname kernel:  ? __d_lookup+0x7e/0xc0
        Sep 11 22:34:59 myhostname kernel:  vfs_symlink+0x148/0x1e0
        Sep 11 22:34:59 myhostname kernel:  do_symlinkat+0x130/0x140
        Sep 11 22:34:59 myhostname kernel:  __x64_sys_symlinkat+0x3d/0x50
        Sep 11 22:34:59 myhostname kernel:  do_syscall_64+0x5d/0x90
        Sep 11 22:34:59 myhostname kernel:  ? syscall_exit_to_user_mode+0x2b/0x40
        Sep 11 22:34:59 myhostname kernel:  ? do_syscall_64+0x6c/0x90
        Sep 11 22:34:59 myhostname kernel:  entry_SYSCALL_64_after_hwframe+0x72/0xdc
      
      The race leading to the problem happens like this:
      
      1) Directory inode X is loaded into memory, its ->index_cnt field is
         initialized to (u64)-1 (at btrfs_alloc_inode());
      
      2) Task A is adding a new file to directory X, holding its vfs inode lock,
         and calls btrfs_set_inode_index() to get an index number for the entry.
      
         Because the inode's index_cnt field is set to (u64)-1 it calls
         btrfs_inode_delayed_dir_index_count() which fails because no dir index
         entries were added yet to the delayed inode and then it calls
         btrfs_set_inode_index_count(). This function finds the last dir index
         key and then sets index_cnt to that index value + 1. It found that the
         last index key has an offset of 100. However before it assigns a value
         of 101 to index_cnt...
      
      3) Task B calls opendir(3), ending up at btrfs_opendir(), where the VFS
         lock for inode X is not taken, so it calls btrfs_get_dir_last_index()
         and sees index_cnt still with a value of (u64)-1. Because of that it
         calls btrfs_inode_delayed_dir_index_count() which fails since no dir
         index entries were added to the delayed inode yet, and then it also
         calls btrfs_set_inode_index_count(). This also finds that the last
         index key has an offset of 100, and before it assigns the value 101
         to the index_cnt field of inode X...
      
      4) Task A assigns a value of 101 to index_cnt. And then the code flow
         goes to btrfs_set_inode_index() where it increments index_cnt from
         101 to 102. Task A then creates a delayed dir index entry with a
         sequence number of 101 and adds it to the delayed inode;
      
      5) Task B assigns 101 to the index_cnt field of inode X;
      
      6) At some later point when someone tries to add a new entry to the
         directory, btrfs_set_inode_index() will return 101 again and shortly
         after an attempt to add another delayed dir index key with index
         number 101 will fail with -EEXIST resulting in a transaction abort.
      
      Fix this by locking the inode at btrfs_get_dir_last_index(), which is
      only used when opening a directory or attempting to lseek on it.
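      The following is a simplified model of the fix. A pthread mutex stands in for the VFS inode lock, and all names are hypothetical. Both the entry-creation path and the opendir path initialize index_cnt under the same lock. As a result, the lazy transition from (u64)-1 to "last on-disk index + 1" happens exactly once, and a later increment cannot be overwritten. Returning index_cnt - 1 as the last index is an assumption of this model:

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define INDEX_UNSET UINT64_MAX  /* mirrors index_cnt == (u64)-1 */

/* Hypothetical directory inode. */
struct dir_inode {
    pthread_mutex_t lock;   /* stands in for the VFS inode lock */
    uint64_t index_cnt;     /* next index to hand out, or INDEX_UNSET */
    uint64_t last_on_disk;  /* highest dir index key found on disk */
};

/* Lazily initialize index_cnt; the caller must hold dir->lock. */
static void init_index_count_locked(struct dir_inode *dir)
{
    if (dir->index_cnt == INDEX_UNSET)
        dir->index_cnt = dir->last_on_disk + 1;
}

/* Adding an entry. In btrfs the VFS already holds the inode lock here. */
static uint64_t set_inode_index(struct dir_inode *dir)
{
    uint64_t idx;

    pthread_mutex_lock(&dir->lock);
    init_index_count_locked(dir);
    idx = dir->index_cnt++;
    pthread_mutex_unlock(&dir->lock);
    return idx;
}

/* opendir()/lseek(): the fix takes the same lock here as well. */
static uint64_t get_dir_last_index(struct dir_inode *dir)
{
    uint64_t last;

    pthread_mutex_lock(&dir->lock);
    init_index_count_locked(dir);
    last = dir->index_cnt - 1;
    pthread_mutex_unlock(&dir->lock);
    return last;
}
```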
      
      Reported-by: ken <ken@bllue.org>
      Link: https://lore.kernel.org/linux-btrfs/CAE6xmH+Lp=Q=E61bU+v9eWX8gYfLvu6jLYxjxjFpo3zHVPR0EQ@mail.gmail.com/
      Reported-by: syzbot+d13490c82ad5353c779d@syzkaller.appspotmail.com
      Link: https://lore.kernel.org/linux-btrfs/00000000000036e1290603e097e0@google.com/
      Fixes: 9b378f6a ("btrfs: fix infinite directory reads")
      CC: stable@vger.kernel.org # 6.5+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: refresh dir last index during a rewinddir(3) call · fd968e68
      Filipe Manana authored
      commit e60aa5da upstream.
      
      When opening a directory, we find the index of its last entry and then
      store it in the directory's file handle private data (struct
      btrfs_file_private::last_index), so that if new directory entries are
      added to the directory after an opendir(3) call, we don't end up in an
      infinite loop when calling readdir(3) (see commit 9b378f6a ("btrfs: fix
      infinite directory reads")).
      
      However, once rewinddir(3) is called, POSIX states [1] that any new
      directory entries added after the previous opendir(3) call must be
      returned by subsequent calls to readdir(3):
      
        "The rewinddir() function shall reset the position of the directory
         stream to which dirp refers to the beginning of the directory.
         It shall also cause the directory stream to refer to the current
         state of the corresponding directory, as a call to opendir() would
         have done."
      
      We currently don't refresh the last_index field of the struct
      btrfs_file_private associated with the directory, so after a rewinddir(3)
      we do not return any new entries added after the opendir(3) call.
      
      Fix this by finding the current last index of the directory when llseek
      is called against the directory.
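      The following hypothetical model shows the per-handle snapshot and its refresh on rewind. It assumes the snapshot is index_cnt - 1, as in the commit "btrfs: set last dir index to the current last index when opening dir". It also assumes the first usable index is supplied by the caller. On glibc, rewinddir(3) resets the position with an lseek to offset 0, which is why the refresh belongs in llseek:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a directory hands out increasing index numbers. */
struct dir {
    uint64_t index_cnt;  /* index the next new entry will get */
};

/* Per-open-handle state, like btrfs_file_private::last_index. */
struct dir_handle {
    struct dir *dir;
    uint64_t last_index; /* iteration stops after this index */
    uint64_t pos;        /* next index to return */
};

static void refresh_last_index(struct dir_handle *h)
{
    /* index_cnt is the *next* index, so the newest existing one is index_cnt - 1. */
    h->last_index = h->dir->index_cnt - 1;
}

static void dir_open(struct dir_handle *h, struct dir *d, uint64_t first_index)
{
    h->dir = d;
    h->pos = first_index;
    refresh_last_index(h);
}

/* Seeking back to the start re-snapshots the last index. */
static void dir_rewind(struct dir_handle *h, uint64_t first_index)
{
    h->pos = first_index;
    refresh_last_index(h);
}

/* Returns 1 and the next index, or 0 at the end of this snapshot. */
static int dir_next(struct dir_handle *h, uint64_t *index)
{
    if (h->pos > h->last_index)
        return 0;
    *index = h->pos++;
    return 1;
}
```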
      
      This can be reproduced by the following C program provided by Ian Johnson:
      
         #include <dirent.h>
         #include <stdio.h>
      
         int main(void) {
           DIR *dir = opendir("test");
      
           FILE *file;
           file = fopen("test/1", "w");
           fwrite("1", 1, 1, file);
           fclose(file);
      
           file = fopen("test/2", "w");
           fwrite("2", 1, 1, file);
           fclose(file);
      
           rewinddir(dir);
      
           struct dirent *entry;
           while ((entry = readdir(dir))) {
              printf("%s\n", entry->d_name);
           }
           closedir(dir);
           return 0;
         }
      
      Reported-by: Ian Johnson <ian@ianjohnson.dev>
      Link: https://lore.kernel.org/linux-btrfs/YR1P0S.NGASEG570GJ8@ianjohnson.dev/
      Fixes: 9b378f6a ("btrfs: fix infinite directory reads")
      CC: stable@vger.kernel.org # 6.5+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: set last dir index to the current last index when opening dir · a045b6b1
      Filipe Manana authored
      commit 35795036 upstream.
      
      When opening a directory for reading, we set the last index at which we
      stop iteration to the value in struct btrfs_inode::index_cnt. That value
      does not match the index of the most recently added directory entry;
      instead, it is the index number that will be assigned to the next
      directory entry.
      
      This means that if after the call to opendir(3) new directory entries are
      added, a readdir(3) call will return the first new directory entry. This
      is fine because POSIX says the following [1]:
      
        "If a file is removed from or added to the directory after the most
         recent call to opendir() or rewinddir(), whether a subsequent call to
         readdir() returns an entry for that file is unspecified."
      
      For example, for the test script from commit 9b378f6a ("btrfs: fix
      infinite directory reads"), where we have 2000 files in a directory, ext4
      doesn't return any new directory entry after opendir(3), while xfs returns
      the first 13 new directory entries added after the opendir(3) call.
      
      With a shorter example, where the directory is empty when opendir(3) is
      called and 2 files are added to it afterwards, readdir(3) on btrfs
      returns the first file, while ext4 and xfs return both files (but in a
      different order). A test program for this, reported by Ian Johnson, is
      the following:
      
         #include <dirent.h>
         #include <stdio.h>
      
         int main(void) {
           DIR *dir = opendir("test");
      
           FILE *file;
           file = fopen("test/1", "w");
           fwrite("1", 1, 1, file);
           fclose(file);
      
           file = fopen("test/2", "w");
           fwrite("2", 1, 1, file);
           fclose(file);
      
           struct dirent *entry;
           while ((entry = readdir(dir))) {
              printf("%s\n", entry->d_name);
           }
           closedir(dir);
           return 0;
         }
      
      To make this less odd, change the behaviour to never return new entries
      that were added after the opendir(3) call. This is done by setting the
      last_index field of the struct btrfs_file_private attached to the
      directory's file handle to btrfs_inode::index_cnt minus 1, since
      index_cnt always holds the index of the next new directory entry, not
      the index of the most recently added entry.
      
      [1] https://pubs.opengroup.org/onlinepubs/007904875/functions/readdir_r.html
      
      Link: https://lore.kernel.org/linux-btrfs/YR1P0S.NGASEG570GJ8@ianjohnson.dev/
      CC: stable@vger.kernel.org # 6.5+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: fix infinite directory reads · 2aa515b5
      Filipe Manana authored
      commit 9b378f6a upstream.
      
      The readdir implementation currently always processes up to the last
      index it finds. This, however, can result in an infinite loop if the
      directory has so many entries that they won't all fit in the buffer
      passed to the readdir callback, that is, dir_emit() returns a non-zero
      value. In that case readdir() will be called again, and if in the
      meantime new directory entries were added and we still can't put all
      the remaining entries in the buffer, we keep repeating this over and over.
      
      The following C program and test script reproduce the problem:
      
        $ cat /mnt/readdir_prog.c
        #include <sys/types.h>
        #include <dirent.h>
        #include <stdio.h>
      
        int main(int argc, char *argv[])
        {
          DIR *dir = opendir(".");
          struct dirent *dd;
      
          while ((dd = readdir(dir))) {
            printf("%s\n", dd->d_name);
            rename(dd->d_name, "TEMPFILE");
            rename("TEMPFILE", dd->d_name);
          }
          closedir(dir);
        }
      
        $ gcc -o /mnt/readdir_prog /mnt/readdir_prog.c
      
        $ cat test.sh
        #!/bin/bash
      
        DEV=/dev/sdi
        MNT=/mnt/sdi
      
        mkfs.btrfs -f $DEV &> /dev/null
        #mkfs.xfs -f $DEV &> /dev/null
        #mkfs.ext4 -F $DEV &> /dev/null
      
        mount $DEV $MNT
      
        mkdir $MNT/testdir
        for ((i = 1; i <= 2000; i++)); do
            echo -n > $MNT/testdir/file_$i
        done
      
        cd $MNT/testdir
        /mnt/readdir_prog
      
        cd /mnt
      
        umount $MNT
      
      This behaviour is surprising to applications and it's unlike ext4, xfs,
      tmpfs, vfat and other filesystems, which always finish. In this case where
      new entries were added due to renames, some file names may be reported
      more than once, but this varies according to each filesystem - for example
      ext4 never reported the same file more than once while xfs reports the
      first 13 file names twice.
      
      So change our readdir implementation to track the last index number when
      opendir() is called and then make readdir() never process beyond that
      index number. This gives the same behaviour as ext4.
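      The following toy model shows why snapshotting the last index at opendir() time guarantees termination. It assumes each rename frees the old index and allocates a new, higher one, which simplifies what the reproducer's two renames do. struct dir and iterate_and_rename() are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ENTRIES 64

/* Hypothetical directory: entries[i] != 0 means index i is in use. */
struct dir {
    unsigned char entries[MAX_ENTRIES];
    uint64_t index_cnt;  /* next index to hand out */
};

/* A rename within the directory removes the old index and adds a new one. */
static void rename_entry(struct dir *d, uint64_t old)
{
    assert(d->index_cnt < MAX_ENTRIES);
    d->entries[old] = 0;
    d->entries[d->index_cnt++] = 1;
}

/*
 * Visit and rename every entry, like readdir_prog. last_index is taken
 * once, as at opendir(), so entries created by the renames are never
 * visited and the loop terminates.
 */
static unsigned int iterate_and_rename(struct dir *d)
{
    uint64_t last_index = d->index_cnt - 1;
    unsigned int seen = 0;

    for (uint64_t i = 0; i <= last_index; i++) {
        if (!d->entries[i])
            continue;
        seen++;
        rename_entry(d, i);
    }
    return seen;
}
```

      If the loop bound were re-read from d->index_cnt on every iteration, each rename would extend the range being walked, and the loop would only stop when the table ran out.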
      
      Reported-by: Rob Landley <rob@landley.net>
      Link: https://lore.kernel.org/linux-btrfs/2c8c55ec-04c6-e0dc-9c5c-8c7924778c35@landley.net/
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=217681
      CC: stable@vger.kernel.org # 5.15
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • netfilter: nft_limit: reject configurations that cause integer overflow · bc6e242b
      Florian Westphal authored
      [ Upstream commit c9d9eb9c ]
      
      Reject bogus configurations where the internal token counter wraps
      around. This only occurs with very large requests, such as 17 GByte/s.
      
      It's better to reject these than to apply an incorrect rate limit.
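      The following sketch rejects a wrapping configuration using the GCC/Clang overflow builtins. The token formula here (nanoseconds per unit multiplied by rate plus burst) is an assumption for illustration, not the exact nft_limit arithmetic:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical token computation: tokens = nsecs_per_unit * (rate + burst).
 * Both steps are checked, and the configuration is rejected instead of
 * being allowed to wrap around silently.
 */
static int compute_tokens(uint64_t nsecs_per_unit, uint64_t rate,
                          uint64_t burst, uint64_t *tokens)
{
    uint64_t rate_with_burst;

    if (__builtin_add_overflow(rate, burst, &rate_with_burst))
        return -EOVERFLOW;
    if (__builtin_mul_overflow(nsecs_per_unit, rate_with_burst, tokens))
        return -EOVERFLOW;
    return 0;
}
```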
      
      Fixes: d2168e84 ("netfilter: nft_limit: add per-byte limiting")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • rcu: Defer RCU kthreads wakeup when CPU is dying · c817f5c0
      Frederic Weisbecker authored
      [ Upstream commit e787644c ]
      
      When the CPU goes idle for the last time during the CPU down hotplug
      process, RCU reports a final quiescent state for the current CPU. If
      this quiescent state propagates up to the top, some tasks may then be
      woken up to complete the grace period: the main grace period kthread
      and/or the expedited main workqueue (or kworker).
      
      If those kthreads have a SCHED_FIFO policy, the wakeup can indirectly
      arm the RT bandwidth timer on the local offline CPU. Since this happens
      after hrtimers have been migrated at the CPUHP_AP_HRTIMERS_DYING stage,
      the timer gets ignored. Therefore, if the RCU kthreads are waiting for
      RT bandwidth to become available, they may never actually be scheduled.
      
      This triggers TREE03 rcutorture hangs:
      
      	 rcu: INFO: rcu_preempt self-detected stall on CPU
      	 rcu:     4-...!: (1 GPs behind) idle=9874/1/0x4000000000000000 softirq=0/0 fqs=20 rcuc=21071 jiffies(starved)
      	 rcu:     (t=21035 jiffies g=938281 q=40787 ncpus=6)
      	 rcu: rcu_preempt kthread starved for 20964 jiffies! g938281 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
      	 rcu:     Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
      	 rcu: RCU grace-period kthread stack dump:
      	 task:rcu_preempt     state:R  running task     stack:14896 pid:14    tgid:14    ppid:2      flags:0x00004000
      	 Call Trace:
      	  <TASK>
      	  __schedule+0x2eb/0xa80
      	  schedule+0x1f/0x90
      	  schedule_timeout+0x163/0x270
      	  ? __pfx_process_timeout+0x10/0x10
      	  rcu_gp_fqs_loop+0x37c/0x5b0
      	  ? __pfx_rcu_gp_kthread+0x10/0x10
      	  rcu_gp_kthread+0x17c/0x200
      	  kthread+0xde/0x110
      	  ? __pfx_kthread+0x10/0x10
      	  ret_from_fork+0x2b/0x40
      	  ? __pfx_kthread+0x10/0x10
      	  ret_from_fork_asm+0x1b/0x30
      	  </TASK>
      
      The situation can't be solved just by unpinning the timer. The hrtimer
      infrastructure and the nohz heuristics involved in finding the best
      remote target for an unpinned timer would then also need to handle
      enqueues from an offline CPU in the most horrendous way.
      
      So fix this on the RCU side instead and defer the wake up to an online
      CPU if it's too late for the local one.
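      Choosing where the deferred wakeup should run can be modeled as
      follows. pick_wake_cpu() is a hypothetical helper, and the online[]
      array stands in for the kernel's online CPU mask. If the local CPU is
      still online it performs the wakeup itself. Otherwise the wakeup goes
      to the next online CPU in wrap-around order. The mechanism that
      actually hands the wakeup to that CPU is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: return the CPU that should issue the wakeup.
 * The local CPU is used while it is online; otherwise the next online
 * CPU (wrapping around) is chosen. Returns -1 if no CPU is online.
 */
static int pick_wake_cpu(int local, const bool online[], int ncpus)
{
    assert(local >= 0 && local < ncpus);
    if (online[local])
        return local;
    for (int i = 1; i < ncpus; i++) {
        int cpu = (local + i) % ncpus;
        if (online[cpu])
            return cpu;
    }
    return -1;
}
```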
      
      Reported-by: Paul E. McKenney <paulmck@kernel.org>
      Fixes: 5c0930cc ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net/mlx5e: fix a potential double-free in fs_any_create_groups · b2fa86b2
      Dinghao Liu authored
      [ Upstream commit aef855df ]
      
      When kcalloc() for ft->g succeeds but kvzalloc() for `in` fails,
      fs_any_create_groups() frees ft->g. However, its caller
      fs_any_create_table() frees ft->g again by calling
      mlx5e_destroy_flow_table(), which leads to a double-free.
      Fix this by setting ft->g to NULL in fs_any_create_groups().
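      The pattern can be reproduced in userspace. In this sketch calloc()
      stands in for kcalloc() and kvzalloc(). struct flow_table,
      create_groups() and destroy_flow_table() are simplified, hypothetical
      stand-ins for the mlx5e code, and fail_in forces the second allocation
      to fail. Because the error path clears ft->g after freeing it, the
      caller's teardown calls free(NULL), which is a no-op, instead of
      freeing the same pointer twice.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the flow table: only the groups array matters. */
struct flow_table {
    void **g;
};

/* Caller-side teardown, like the destroy path; free(NULL) is a no-op. */
static void destroy_flow_table(struct flow_table *ft)
{
    free(ft->g);
    ft->g = NULL;
}

/*
 * Allocate ft->g, then a scratch buffer `in`. If the second allocation
 * fails (forced here by fail_in), free ft->g and clear the pointer so the
 * caller's destroy_flow_table() cannot free it a second time.
 */
static int create_groups(struct flow_table *ft, int fail_in)
{
    void *in;

    assert(ft != NULL);
    ft->g = calloc(4, sizeof(*ft->g));
    if (!ft->g)
        return -ENOMEM;

    in = fail_in ? NULL : calloc(1, 64);
    if (!in) {
        free(ft->g);
        ft->g = NULL;   /* the fix: no dangling pointer left for the caller */
        return -ENOMEM;
    }

    /* ... the groups would be created from `in` here ... */
    free(in);
    return 0;
}
```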
      
      Fixes: 0f575c20 ("net/mlx5e: Introduce Flow Steering ANY API")
      Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net/mlx5e: fix a double-free in arfs_create_groups · 42876db0
      Zhipeng Lu authored
      [ Upstream commit 3c6d5189 ]
      
      When the kvzalloc() allocation of `in` fails, arfs_create_groups()
      frees ft->g and returns an error. However, arfs_create_table(), the
      only caller of arfs_create_groups(), handles this error by calling
      mlx5e_destroy_flow_table(), which frees ft->g again.
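      Another way to remove this kind of double free is to make a single
      place own the cleanup. This is a hypothetical sketch, not necessarily
      what the upstream fix does. The group-creation helper leaves ft->g
      alone on failure, and the only caller tears the table down exactly
      once. All names are simplified stand-ins. The frees counter exists
      only to make the single free observable.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct flow_table {
    void **g;
};

static int frees;   /* counts frees of ft->g, for demonstration only */

static void destroy_flow_table(struct flow_table *ft)
{
    if (ft->g) {
        free(ft->g);
        frees++;
    }
    ft->g = NULL;
}

/* On failure ft->g is deliberately *not* freed here: the caller owns it. */
static int create_groups(struct flow_table *ft, int fail_in)
{
    void *in;

    assert(ft != NULL);
    ft->g = calloc(4, sizeof(*ft->g));
    if (!ft->g)
        return -ENOMEM;

    in = fail_in ? NULL : calloc(1, 64);
    if (!in)
        return -ENOMEM;

    free(in);
    return 0;
}

/* The only caller: on any error it tears the whole table down once. */
static int create_table(struct flow_table *ft, int fail_in)
{
    int err = create_groups(ft, fail_in);

    if (err)
        destroy_flow_table(ft);
    return err;
}
```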
      
      Fixes: 1cabe6b0 ("net/mlx5e: Create aRFS flow tables")
      Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net/mlx5e: Allow software parsing when IPsec crypto is enabled · 890881d1
      Leon Romanovsky authored
      [ Upstream commit 20f5468a ]
      
      All ConnectX devices have the software parsing capability enabled, but
      it is more correct to set allow_swp only if the capability exists,
      which for IPsec means that crypto offload is supported.
      
      Fixes: 2451da08 ("net/mlx5: Unify device IPsec capabilities check")
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net/mlx5: Use mlx5 device constant for selecting CQ period mode for ASO · 62ce1600
      Rahul Rameshbabu authored
      [ Upstream commit 20cbf8cb ]
      
      mlx5 devices have specific constants for choosing the CQ period mode. These
      constants do not have to match the constants used by the kernel software
      API for DIM period mode selection.
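      Since the two sets of constants are not required to match, the safe
      approach is an explicit translation instead of passing the software
      value straight to the device. The enums and to_hw_cq_period_mode()
      below are hypothetical. Their numeric values are invented and made to
      differ on purpose, to show what a pass-through would get wrong.

```c
#include <assert.h>

/* Hypothetical software-side (DIM-style) period modes. */
enum sw_cq_period_mode {
    SW_CQ_PERIOD_START_FROM_EQE,
    SW_CQ_PERIOD_START_FROM_CQE,
};

/* Hypothetical device encoding, deliberately numbered differently. */
enum hw_cq_period_mode {
    HW_CQ_PERIOD_START_FROM_CQE = 0x1,
    HW_CQ_PERIOD_START_FROM_EQE = 0x2,
};

/* Translate explicitly; returns -1 for a mode the device cannot encode. */
static int to_hw_cq_period_mode(enum sw_cq_period_mode mode)
{
    switch (mode) {
    case SW_CQ_PERIOD_START_FROM_EQE:
        return HW_CQ_PERIOD_START_FROM_EQE;
    case SW_CQ_PERIOD_START_FROM_CQE:
        return HW_CQ_PERIOD_START_FROM_CQE;
    }
    return -1;
}
```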
      
      Fixes: cdd04f4d ("net/mlx5: Add support to create SQ and CQ for ASO")
      Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
      Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net/mlx5: DR, Can't go to uplink vport on RX rule · 75d9ed49
      Yevgeny Kliteynik authored
      [ Upstream commit 5b2a2523 ]
      
      A Go-To-Vport action on RX is not allowed when the vport is the uplink.
      In such a case, the packet should be dropped.
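      The rule reduces to one extra condition when the action is built. This
      is a hypothetical sketch. UPLINK_VPORT, the enums and
      make_vport_action() are illustrative names, not the mlx5 DR API. On
      an RX domain, a request to go to the uplink vport becomes a drop. On
      TX, and for any other vport, the go-to-vport action is kept.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values; only the decision logic is illustrated. */
#define UPLINK_VPORT 0xffffu

enum domain_dir { DOMAIN_RX, DOMAIN_TX };
enum action_type { ACTION_GO_TO_VPORT, ACTION_DROP };

struct action {
    enum action_type type;
    uint16_t vport;
};

/* Going to the uplink vport is not allowed on RX, so drop the packet. */
static struct action make_vport_action(enum domain_dir dir, uint16_t vport)
{
    struct action a = { .type = ACTION_GO_TO_VPORT, .vport = vport };

    if (dir == DOMAIN_RX && vport == UPLINK_VPORT)
        a.type = ACTION_DROP;
    return a;
}
```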
      
      Fixes: 9db810ed ("net/mlx5: DR, Expose steering action functionality")
      Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
      Reviewed-by: Erez Shitrit <erezsh@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net/mlx5: DR, Use the right GVMI number for drop action · e54aedd4
      Yevgeny Kliteynik authored
      [ Upstream commit 56659542 ]
      
      When FW provides ICM addresses for drop RX/TX, the provided capability
      is a 64-bit value that contains its GVMI as well as the ICM address
      itself. In the case of TX drop, this GVMI differs from the GVMI that
      the domain is operating on.
      
      This patch fixes the action to use these GVMI IDs, as provided by FW.
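      Using the GVMI as provided by FW means decoding it from the capability
      itself, not reusing the domain's GVMI. The bit layout in this sketch
      is an ASSUMPTION made only for illustration: GVMI in bits 63..48 and
      the ICM address in bits 47..0. The real layout is defined by the
      device and may differ. decode_fw_drop_cap() and struct drop_target
      are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED layout: GVMI in bits 63..48, ICM address in bits 47..0. */
#define GVMI_SHIFT 48
#define ICM_ADDR_MASK ((UINT64_C(1) << GVMI_SHIFT) - 1)

struct drop_target {
    uint16_t gvmi;       /* from FW; for TX drop it may differ from the domain's */
    uint64_t icm_addr;
};

/* Split the 64-bit FW capability into its GVMI and ICM address parts. */
static struct drop_target decode_fw_drop_cap(uint64_t cap)
{
    struct drop_target t = {
        .gvmi = (uint16_t)(cap >> GVMI_SHIFT),
        .icm_addr = cap & ICM_ADDR_MASK,
    };
    return t;
}
```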
      
      Fixes: 9db810ed ("net/mlx5: DR, Expose steering action functionality")
      Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>