  1. Oct 05, 2022
  2. Sep 28, 2022
    • Linux 5.15.71 · 90c7e9b4
      Greg Kroah-Hartman authored
      
      
      Link: https://lore.kernel.org/r/20220926100756.074519146@linuxfoundation.org
      Tested-by: Shuah Khan <skhan@linuxfoundation.org>
      Link: https://lore.kernel.org/r/20220926163551.791017156@linuxfoundation.org
      Tested-by: Florian Fainelli <f.fainelli@gmail.com>
      Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
      Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
      Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
      Tested-by: Ron Economos <re@w6rz.net>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Tested-by: Kelsey Steele <kelseysteele@linux.microsoft.com>
      Tested-by: Jon Hunter <jonathanh@nvidia.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      v5.15.71
    • ext4: use locality group preallocation for small closed files · 21419461
      Jan Kara authored
      commit a9f2a293 upstream.
      
      Currently we don't use any preallocation when a file is already closed
      when allocating blocks (from writeback code when converting delayed
      allocation). However for small files, using locality group preallocation
      is actually desirable as that is not specific to a particular file.
      Rather it is a method to pack small files together to reduce
      fragmentation, and for that the fact that the file is closed is an even
      stronger hint that the file would benefit from packing. So change the logic
      to allow locality group preallocation in this case.
      
      Fixes: 196e402a ("ext4: improve cr 0 / cr 1 group scanning")
      CC: stable@kernel.org
      Reported-and-tested-by: Stefan Wahren <stefan.wahren@i2se.com>
      Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
      Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/all/0d81a7c2-46b7-6010-62a4-3e6cfc1628d6@i2se.com/
      Link: https://lore.kernel.org/r/20220908092136.11770-4-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: avoid unnecessary spreading of allocations among groups · 8a1ac416
      Jan Kara authored
      commit 1940265e upstream.
      
      mb_set_largest_free_order() updates the lists containing groups with the
      largest chunk of free space of a given order. The way it updates them
      always moves the group to the tail of its list. Thus allocations
      looking for free space of a given order effectively end up cycling through
      all groups (and, due to initialization, in last-to-first order). This
      spreads allocations among block groups, which reduces performance on
      rotating disks or low-end flash media. Change
      mb_set_largest_free_order() to only update the lists if the order of the
      largest free chunk in the group changed.
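
      The effect of the change can be illustrated with a small userspace
      sketch. This is not the kernel code: the struct layout, NR_ORDERS and
      all function names below are hypothetical stand-ins, and the per-order
      lists are modelled as plain circular lists. The point is only that a
      group keeps its list position when its largest free order is unchanged.

```c
#include <assert.h>
#include <stddef.h>

#define NR_ORDERS 16 /* hypothetical number of free-chunk orders */

struct group {
    int largest_free_order;    /* -1: not on any list */
    struct group *prev, *next; /* links within one per-order list */
};

/* One circular list per order; heads[i] is a sentinel node. */
struct order_lists {
    struct group heads[NR_ORDERS];
};

static void lists_init(struct order_lists *l)
{
    for (int i = 0; i < NR_ORDERS; i++) {
        l->heads[i].prev = l->heads[i].next = &l->heads[i];
        l->heads[i].largest_free_order = i;
    }
}

static void group_init(struct group *g)
{
    g->largest_free_order = -1;
    g->prev = g->next = g;
}

static void list_unlink(struct group *g)
{
    g->prev->next = g->next;
    g->next->prev = g->prev;
    g->prev = g->next = g;
}

static void list_append(struct group *head, struct group *g)
{
    g->prev = head->prev;
    g->next = head;
    head->prev->next = g;
    head->prev = g;
}

/* Returns 1 if the group was moved, 0 if the lists were left untouched. */
static int set_largest_free_order(struct order_lists *l, struct group *g,
                                  int new_order)
{
    assert(new_order < NR_ORDERS);
    if (new_order == g->largest_free_order)
        return 0; /* order unchanged: keep the current list position */
    if (g->largest_free_order >= 0)
        list_unlink(g);
    g->largest_free_order = new_order;
    if (new_order >= 0)
        list_append(&l->heads[new_order], g);
    return 1;
}
```

      Without the early return, every update would unlink the group and
      re-append it at the tail, which is the cycling behaviour described
      above.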
      
      Fixes: 196e402a
      
       ("ext4: improve cr 0 / cr 1 group scanning")
      CC: stable@kernel.org
      Reported-and-tested-by: default avatarStefan Wahren <stefan.wahren@i2se.com>
      Tested-by: default avatarOjaswin Mujoo <ojaswin@linux.ibm.com>
      Reviewed-by: default avatarRitesh Harjani (IBM) <ritesh.list@gmail.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/all/0d81a7c2-46b7-6010-62a4-3e6cfc1628d6@i2se.com/
      Link: https://lore.kernel.org/r/20220908092136.11770-2-jack@suse.cz
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8a1ac416
    • ext4: make mballoc try target group first even with mb_optimize_scan · fd8b8291
      Jan Kara authored
      commit 4fca50d4 upstream.
      
      One of the side-effects of mb_optimize_scan was that the optimized
      functions to select the next group to try were called even before we tried
      the goal group. As a result we no longer allocate files close to their
      corresponding inodes, and we don't try to expand the currently
      allocated extent in the same group. This results in a reaim regression
      of up to 8% with the workfile.disk workload with many clients on my test
      machine:
      
                           baseline               mb_optimize_scan
      Hmean     disk-1       2114.16 (   0.00%)     2099.37 (  -0.70%)
      Hmean     disk-41     87794.43 (   0.00%)    83787.47 *  -4.56%*
      Hmean     disk-81    148170.73 (   0.00%)   135527.05 *  -8.53%*
      Hmean     disk-121   177506.11 (   0.00%)   166284.93 *  -6.32%*
      Hmean     disk-161   220951.51 (   0.00%)   207563.39 *  -6.06%*
      Hmean     disk-201   208722.74 (   0.00%)   203235.59 (  -2.63%)
      Hmean     disk-241   222051.60 (   0.00%)   217705.51 (  -1.96%)
      Hmean     disk-281   252244.17 (   0.00%)   241132.72 *  -4.41%*
      Hmean     disk-321   255844.84 (   0.00%)   245412.84 *  -4.08%*
      
      Also this is causing huge regression (time increased by a factor of 5 or
      so) when untarring archive with lots of small files on some eMMC storage
      cards.
      
      Fix the problem by making sure we try goal group first.
      
      Fixes: 196e402a ("ext4: improve cr 0 / cr 1 group scanning")
      CC: stable@kernel.org
      Reported-and-tested-by: Stefan Wahren <stefan.wahren@i2se.com>
      Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
      Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
      Link: https://lore.kernel.org/all/20220727105123.ckwrhbilzrxqpt24@quack3/
      Link: https://lore.kernel.org/all/0d81a7c2-46b7-6010-62a4-3e6cfc1628d6@i2se.com/
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20220908092136.11770-1-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: limit the number of retries after discarding preallocations blocks · 21dada4c
      Theodore Ts'o authored
      commit 80fa46d6 upstream.
      
      This patch avoids threads live-locking for hours when a large number of
      threads are competing over the last few free extents as blocks keep
      getting added to and removed from preallocation pools.  From our bug
      reporter:
      
         A reliable way for triggering this has multiple writers
         continuously write() to files when the filesystem is full, while
         small amounts of space are freed (e.g. by truncating a large file
         -1MiB at a time). In the local filesystem, this can be done by
         simply not checking the return code of write (0) and/or the error
         (ENOSPC) that is set. Over NFS with an async mount, even clients
         with proper error checking will behave this way since the linux NFS
         client implementation will not propagate the server errors [the
         write syscalls immediately return success] until the file handle is
         closed. This leads to a situation where NFS clients send a
         continuous stream of WRITE rpcs which result in ENOSPC -- but
         since the client isn't seeing this, the stream of writes continues
         at maximum network speed.
      
         When some space does appear, multiple writers will all attempt to
         claim it for their current write. For NFS, we may see dozens to
         hundreds of threads that do this.
      
         The real-world scenario of this is database backup tooling (in
         particular, github.com/mdkent/percona-xtrabackup) which may write
         large files (>1TiB) to NFS for safe keeping. Some temporary files
         are written, rewound, and read back -- all before closing the file
         handle (the temp file is actually unlinked, to trigger automatic
         deletion on close/crash.) An application like this operating on an
         async NFS mount will not see an error code until TiB have been
         written/read.
      
         The lockup was observed when running this database backup on large
         filesystems (64 TiB in this case) with a high number of block
         groups and no free space. Fragmentation is generally not a factor
         in this filesystem (~thousands of large files, mostly contiguous
         except for the parts written while the filesystem is at capacity.)
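
      The shape of a bounded retry loop can be sketched in userspace as
      follows. This is an illustration under assumptions, not the ext4
      implementation: MAX_DISCARD_RETRIES, struct fake_fs and every function
      name are hypothetical, and the limit of 3 is picked only for the
      example. The two fake callbacks model the worst case from the report,
      where allocation keeps failing while discarding always frees something.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DISCARD_RETRIES 3 /* hypothetical limit, chosen for the sketch */

struct fake_fs {
    int attempts; /* allocation attempts made */
    int discards; /* preallocation discards performed */
};

/* Worst case: the allocation never succeeds... */
static bool fake_try_alloc(struct fake_fs *fs)
{
    fs->attempts++;
    return false;
}

/* ...although discarding preallocations always frees something. */
static bool fake_discard_preallocations(struct fake_fs *fs)
{
    fs->discards++;
    return true;
}

/* Retry after a discard, but give up after a fixed number of rounds. */
static int alloc_with_bounded_retries(struct fake_fs *fs)
{
    assert(fs != NULL);
    for (int retries = 0; ; retries++) {
        if (fake_try_alloc(fs))
            return 0;
        if (retries >= MAX_DISCARD_RETRIES)
            break; /* stop instead of spinning forever */
        if (!fake_discard_preallocations(fs))
            break; /* nothing was freed, retrying cannot help */
    }
    return -ENOSPC;
}
```

      With an unbounded loop, many threads in this situation can keep
      stealing the freed blocks from each other; with the bound, each caller
      eventually reports -ENOSPC.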
      
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: fix bug in extents parsing when eh_entries == 0 and eh_depth > 0 · be4df018
      Luís Henriques authored
      commit 29a5b8a1 upstream.
      
      When walking through an inode's extents, the ext4_ext_binsearch_idx() function
      assumes that the extent header has been previously validated.  However, there
      are no checks that verify that the number of entries (eh->eh_entries) is
      non-zero when depth is > 0.  And this will lead to problems because the
      EXT_FIRST_INDEX() and EXT_LAST_INDEX() will return garbage and result in this:
      
      [  135.245946] ------------[ cut here ]------------
      [  135.247579] kernel BUG at fs/ext4/extents.c:2258!
      [  135.249045] invalid opcode: 0000 [#1] PREEMPT SMP
      [  135.250320] CPU: 2 PID: 238 Comm: tmp118 Not tainted 5.19.0-rc8+ #4
      [  135.252067] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
      [  135.255065] RIP: 0010:ext4_ext_map_blocks+0xc20/0xcb0
      [  135.256475] Code:
      [  135.261433] RSP: 0018:ffffc900005939f8 EFLAGS: 00010246
      [  135.262847] RAX: 0000000000000024 RBX: ffffc90000593b70 RCX: 0000000000000023
      [  135.264765] RDX: ffff8880038e5f10 RSI: 0000000000000003 RDI: ffff8880046e922c
      [  135.266670] RBP: ffff8880046e9348 R08: 0000000000000001 R09: ffff888002ca580c
      [  135.268576] R10: 0000000000002602 R11: 0000000000000000 R12: 0000000000000024
      [  135.270477] R13: 0000000000000000 R14: 0000000000000024 R15: 0000000000000000
      [  135.272394] FS:  00007fdabdc56740(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
      [  135.274510] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  135.276075] CR2: 00007ffc26bd4f00 CR3: 0000000006261004 CR4: 0000000000170ea0
      [  135.277952] Call Trace:
      [  135.278635]  <TASK>
      [  135.279247]  ? preempt_count_add+0x6d/0xa0
      [  135.280358]  ? percpu_counter_add_batch+0x55/0xb0
      [  135.281612]  ? _raw_read_unlock+0x18/0x30
      [  135.282704]  ext4_map_blocks+0x294/0x5a0
      [  135.283745]  ? xa_load+0x6f/0xa0
      [  135.284562]  ext4_mpage_readpages+0x3d6/0x770
      [  135.285646]  read_pages+0x67/0x1d0
      [  135.286492]  ? folio_add_lru+0x51/0x80
      [  135.287441]  page_cache_ra_unbounded+0x124/0x170
      [  135.288510]  filemap_get_pages+0x23d/0x5a0
      [  135.289457]  ? path_openat+0xa72/0xdd0
      [  135.290332]  filemap_read+0xbf/0x300
      [  135.291158]  ? _raw_spin_lock_irqsave+0x17/0x40
      [  135.292192]  new_sync_read+0x103/0x170
      [  135.293014]  vfs_read+0x15d/0x180
      [  135.293745]  ksys_read+0xa1/0xe0
      [  135.294461]  do_syscall_64+0x3c/0x80
      [  135.295284]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      This patch simply adds an extra check in __ext4_ext_check(), verifying that
      eh_entries is not 0 when eh_depth is > 0.
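
      A simplified userspace sketch of such a header check is shown below.
      It is not the body of __ext4_ext_check(): the struct is a host-endian
      stand-in for the on-disk extent header, and sketch_header_ok() and the
      other checks around the new condition are assumptions made for the
      example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_EXT_MAGIC 0xF30Au /* extent header magic, host-endian here */

/* Simplified, host-endian stand-in for the on-disk extent header. */
struct sketch_extent_header {
    uint16_t eh_magic;
    uint16_t eh_entries; /* number of valid entries */
    uint16_t eh_max;     /* capacity of the node */
    uint16_t eh_depth;   /* 0 for a leaf, > 0 for an index node */
};

/* Returns 1 if the header looks sane, 0 otherwise. */
static int sketch_header_ok(const struct sketch_extent_header *eh,
                            uint16_t expected_depth)
{
    assert(eh != NULL);
    if (eh->eh_magic != SKETCH_EXT_MAGIC)
        return 0;
    if (eh->eh_depth != expected_depth)
        return 0;
    if (eh->eh_max == 0 || eh->eh_entries > eh->eh_max)
        return 0;
    /* The extra check: an index node must reference at least one child. */
    if (eh->eh_entries == 0 && eh->eh_depth > 0)
        return 0;
    return 1;
}
```

      An empty leaf (depth 0, no entries) still passes, since an inode with
      no extents is legitimate; only an empty index node is rejected.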
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=215941
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=216283
      Cc: Baokun Li <libaokun1@huawei.com>
      Cc: stable@kernel.org
      Signed-off-by: Luís Henriques <lhenriques@suse.de>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Baokun Li <libaokun1@huawei.com>
      Link: https://lore.kernel.org/r/20220822094235.2690-1-lhenriques@suse.de
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: make directory inode spreading reflect flexbg size · 90bc7b63
      Jan Kara authored
      commit 613c5a85 upstream.
      
      Currently the Orlov inode allocator searches for free inodes for a
      directory only in flex block groups with at most inodes_per_group/16
      more directory inodes than average per flex block group. However with
      growing size of flex block group this becomes unnecessarily strict.
      Scale allowed difference from average directory count per flex block
      group with flex block group size as we do with other metrics.
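
      As a worked example of the arithmetic, the sketch below compares the
      fixed allowance with one scaled by the flex group size. The exact
      scaling (multiplying the allowance by the number of groups per flex
      group) is an assumption made for illustration, as are the function
      names and the sample numbers.

```c
#include <assert.h>

/* Old limit: average directory count plus a fixed inodes_per_group/16. */
static unsigned long max_dirs_old(unsigned long avg_dirs,
                                  unsigned long inodes_per_group)
{
    return avg_dirs + inodes_per_group / 16;
}

/* Assumed new limit: the allowance grows with the flex group size. */
static unsigned long max_dirs_scaled(unsigned long avg_dirs,
                                     unsigned long inodes_per_group,
                                     unsigned long flex_size)
{
    assert(flex_size >= 1);
    return avg_dirs + inodes_per_group * flex_size / 16;
}
```

      For example, with 8192 inodes per group, 16 groups per flex group and
      an average of 100 directories, the old limit is 100 + 512 = 612 while
      the scaled limit is 100 + 8192 = 8292. With a flex size of 1 the two
      agree.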
      
      Tested-by: Stefan Wahren <stefan.wahren@i2se.com>
      Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
      Cc: stable@kernel.org
      Link: https://lore.kernel.org/all/0d81a7c2-46b7-6010-62a4-3e6cfc1628d6@i2se.com/
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20220908092136.11770-3-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • devdax: Fix soft-reservation memory description · 95d714d8
      Dan Williams authored
      commit 67feaba4 upstream.
      
      The "hmem" platform-devices that are created to represent the
      platform-advertised "Soft Reserved" memory ranges end up inserting a
      resource that causes the iomem_resource tree to look like this:
      
      340000000-43fffffff : hmem.0
        340000000-43fffffff : Soft Reserved
          340000000-43fffffff : dax0.0
      
      This is because insert_resource() reparents ranges when they completely
      intersect an existing range.
      
      This matters because code that uses region_intersects() to scan for a
      given IORES_DESC will only check that top-level 'hmem.0' resource and
      not the 'Soft Reserved' descendant.
      
      So, to support EINJ (via einj_error_inject()) to inject errors into
      memory hosted by a dax-device, be sure to describe the memory as
      IORES_DESC_SOFT_RESERVED. This is a follow-on to:
      
      commit b13a3e5f ("ACPI: APEI: Fix _EINJ vs EFI_MEMORY_SP")
      
      ...that fixed EINJ support for "Soft Reserved" ranges in the first
      instance.
      
      Fixes: 262b45ae ("x86/efi: EFI soft reservation to E820 enumeration")
      Reported-by: Ricardo Sandoval Torres <ricardo.sandoval.torres@intel.com>
      Tested-by: Ricardo Sandoval Torres <ricardo.sandoval.torres@intel.com>
      Cc: <stable@vger.kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Omar Avelar <omar.avelar@intel.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Mark Gross <markgross@kernel.org>
      Link: https://lore.kernel.org/r/166397075670.389916.7435722208896316387.stgit@dwillia2-xfh.jf.intel.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • NFSv4: Fixes for nfs4_inode_return_delegation() · 27bf7a5d
      Trond Myklebust authored
      commit 6e176d47 upstream.
      
      We mustn't call nfs_wb_all() on anything other than a regular file.
      Furthermore, we can exit early when we don't hold a delegation.
      
      Reported-by: David Wysochanski <dwysocha@redhat.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Cc: Thorsten Leemhuis <regressions@leemhuis.info>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/amdgpu: don't register a dirty callback for non-atomic · 21b0301f
      Alex Deucher authored
      [ Upstream commit abbc7a3d ]
      
      Some asics still support non-atomic code paths.
      
      Fixes: 66f99628 ("drm/amdgpu: use dirty framebuffer helper")
      Reported-by: Arthur Marsh <arthur.marsh@internode.on.net>
      Reviewed-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • i2c: mlxbf: Fix frequency calculation · 6eb08245
      Asmaa Mnebhi authored
      [ Upstream commit 37f071ec ]
      
      The i2c-mlxbf.c driver is currently broken because there is a bug
      in the calculation of the frequency. core_f, core_r and core_od
      are components read from hardware registers and are used to
      compute the frequency used to compute different timing parameters.
      The shifting mechanism used to get core_f, core_r and core_od is
      wrong. Use FIELD_GET to mask and shift the bitfields properly.
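
      The mask-and-shift idea behind FIELD_GET can be sketched in userspace
      as below. field_get() is a stand-in written for this example, not the
      kernel macro, and the three EXAMPLE_* masks are hypothetical bit
      layouts, not the real BlueField register definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bitfield layout, for illustration only. */
#define EXAMPLE_CORE_OD_MASK 0x0000000fu /* bits  3:0  */
#define EXAMPLE_CORE_F_MASK  0x0000fff0u /* bits 15:4  */
#define EXAMPLE_CORE_R_MASK  0x000f0000u /* bits 19:16 */

/*
 * Userspace stand-in for FIELD_GET(): keep only the masked bits, then
 * shift them down by the position of the mask's lowest set bit.
 */
static uint32_t field_get(uint32_t mask, uint32_t reg)
{
    assert(mask != 0);
    /* (mask & -mask) isolates the lowest set bit; dividing by it shifts. */
    return (reg & mask) / (mask & (0u - mask));
}
```

      Deriving the shift from the mask keeps the two from drifting apart,
      which is the class of mistake a hand-written shift invites.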
      
      Fixes: b5b5b320 ("i2c: mlxbf: I2C SMBus driver for Mellanox BlueField SoC")
      Reviewed-by: Khalil Blaiech <kblaiech@nvidia.com>
      Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
      Signed-off-by: Wolfram Sang <wsa@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • i2c: mlxbf: prevent stack overflow in mlxbf_i2c_smbus_start_transaction() · dc2a0c58
      Asmaa Mnebhi authored
      [ Upstream commit de24aceb ]
      
      memcpy() is called in a loop while 'operation->length' upper bound
      is not checked and 'data_idx' also increments.
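
      A minimal userspace sketch of a bounded gather loop follows. It is
      not the driver code: struct op, gather_ops() and the choice of
      -EINVAL as the error are assumptions for the example. What it shows
      is checking each length against the space remaining before memcpy()
      advances the index.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one chunk to copy. */
struct op {
    const unsigned char *buf;
    size_t length;
};

/*
 * Copy all chunks into dst back to back. Returns 0 and stores the total
 * length in *out_len, or -EINVAL if the chunks would not fit.
 */
static int gather_ops(unsigned char *dst, size_t dst_len,
                      const struct op *ops, size_t n_ops, size_t *out_len)
{
    size_t data_idx = 0;

    assert(dst != NULL && ops != NULL && out_len != NULL);
    for (size_t i = 0; i < n_ops; i++) {
        /* data_idx <= dst_len always holds, so this cannot underflow. */
        if (ops[i].length > dst_len - data_idx)
            return -EINVAL;
        memcpy(dst + data_idx, ops[i].buf, ops[i].length);
        data_idx += ops[i].length;
    }
    *out_len = data_idx;
    return 0;
}
```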
      
      Fixes: b5b5b320 ("i2c: mlxbf: I2C SMBus driver for Mellanox BlueField SoC")
      Reviewed-by: Khalil Blaiech <kblaiech@nvidia.com>
      Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
      Signed-off-by: Wolfram Sang <wsa@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • i2c: mlxbf: incorrect base address passed during io write · 621c6ab0
      Asmaa Mnebhi authored
      [ Upstream commit 2a5be6d1 ]
      
      Correct the base address used during io write.
      This bug had no impact on the overall functionality of the read and write
      transactions. MLXBF_I2C_CAUSE_OR_CLEAR=0x18, so writing to (smbus->io + 0x18)
      instead of (mst_cause->io + 0x18) actually writes to the sc_low_timeout
      register, which just sets the timeout value before a read/write aborts.
      
      Fixes: b5b5b320 ("i2c: mlxbf: I2C SMBus driver for Mellanox BlueField SoC")
      Reviewed-by: Khalil Blaiech <kblaiech@nvidia.com>
      Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
      Signed-off-by: Wolfram Sang <wsa@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • i2c: imx: If pm_runtime_get_sync() returned 1 device access is possible · c242dbf2
      Uwe Kleine-König authored
      [ Upstream commit 085aacaa ]
      
      pm_runtime_get_sync() returning 1 also means the device is powered. So
      resetting the chip registers in .remove() is possible and should be
      done.
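
      The return-value handling can be sketched with a userspace model.
      Everything here is hypothetical (struct fake_dev and both fake_*
      functions); it only models the convention that a negative value is an
      error, while 0 and 1 both mean the device is powered.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical device model for the sketch. */
struct fake_dev {
    int pm_ret;     /* value the fake runtime-PM call will return */
    int regs_reset; /* set to 1 once the registers have been reset */
};

/* Stand-in for pm_runtime_get_sync(): <0 error, 0 resumed, 1 already active. */
static int fake_pm_runtime_get_sync(struct fake_dev *dev)
{
    return dev->pm_ret;
}

static void fake_remove(struct fake_dev *dev)
{
    int ret;

    assert(dev != NULL);
    ret = fake_pm_runtime_get_sync(dev);
    /* Testing "ret == 0" would wrongly skip the already-active case. */
    if (ret >= 0)
        dev->regs_reset = 1;
}
```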
      
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Fixes: d98bdd3a ("i2c: imx: Make sure to unregister adapter on remove()")
      Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Signed-off-by: Wolfram Sang <wsa@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • workqueue: don't skip lockdep work dependency in cancel_work_sync() · c71ec39b
      Tetsuo Handa authored
      [ Upstream commit c0feea59 ]
      
      As Hillf Danton mentioned in [1]:

        syzbot should have been able to catch cancel_work_sync() in work context
        by checking lockdep_map in __flush_work() for both flush and cancel.

      Being unable to report the obvious deadlock scenario shown below is a
      bug. From a locking dependency perspective, the sync version of a cancel
      request should behave like a flush request, because it waits for completion
      of the work if that work has already started execution.
      
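        ----------
        #include <linux/module.h>
        #include <linux/mutex.h>
        #include <linux/sched.h>
        #include <linux/workqueue.h>
        static DEFINE_MUTEX(mutex);
        /* The work item sleeps, then needs the mutex that test_init() holds. */
        static void work_fn(struct work_struct *work)
        {
          schedule_timeout_uninterruptible(HZ / 5);
          mutex_lock(&mutex);
          mutex_unlock(&mutex);
        }
        static DECLARE_WORK(work, work_fn);
        static int __init test_init(void)
        {
          schedule_work(&work);
          /* Give work_fn() time to start running before taking the mutex. */
          schedule_timeout_uninterruptible(HZ / 10);
          mutex_lock(&mutex);
          /* Deadlock: waits for work_fn(), which is blocked on the mutex. */
          cancel_work_sync(&work);
          mutex_unlock(&mutex);
          return -EINVAL; /* fail the load so the module never stays loaded */
        }
        module_init(test_init);
        MODULE_LICENSE("GPL");
        ----------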
      
      The check this patch restores was added by commit 0976dfc1
      ("workqueue: Catch more locking problems with flush_work()").
      
      Then, lockdep's crossrelease feature was added by commit b09be676
      ("locking/lockdep: Implement the 'crossrelease' feature"). As a result,
      this check was once removed by commit fd1a5b04 ("workqueue: Remove
      now redundant lock acquisitions wrt. workqueue flushes").
      
      But lockdep's crossrelease feature was removed by commit e966eaee
      ("locking/lockdep: Remove the cross-release locking checks"). At this
      point, this check should have been restored.
      
      Then, commit d6e89786 ("workqueue: skip lockdep wq dependency in
      cancel_work_sync()") introduced a boolean flag in order to distinguish
      flush_work() and cancel_work_sync(), for checking "struct workqueue_struct"
      dependency when called from cancel_work_sync() was causing false positives.
      
      Then, commit 87915adc ("workqueue: re-add lockdep dependencies for
      flushing") tried to restore the "struct work_struct" dependency check, but
      by mistake made it depend on this boolean flag. As the example above shows,
      the "struct work_struct" dependency needs to be checked for both
      flush_work() and cancel_work_sync().
      
      Link: https://lkml.kernel.org/r/20220504044800.4966-1-hdanton@sina.com [1]
      Reported-by: Hillf Danton <hdanton@sina.com>
      Suggested-by: Lai Jiangshan <jiangshanlai@gmail.com>
      Fixes: 87915adc ("workqueue: re-add lockdep dependencies for flushing")
      Cc: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • fsdax: Fix infinite loop in dax_iomap_rw() · 929ef155
      Li Jinlin authored
      [ Upstream commit 17d9c15c ]
      
      I got an infinite loop and a WARNING report when executing a tail command
      in virtiofs.
      
        WARNING: CPU: 10 PID: 964 at fs/iomap/iter.c:34 iomap_iter+0x3a2/0x3d0
        Modules linked in:
        CPU: 10 PID: 964 Comm: tail Not tainted 5.19.0-rc7
        Call Trace:
        <TASK>
        dax_iomap_rw+0xea/0x620
        ? __this_cpu_preempt_check+0x13/0x20
        fuse_dax_read_iter+0x47/0x80
        fuse_file_read_iter+0xae/0xd0
        new_sync_read+0xfe/0x180
        ? 0xffffffff81000000
        vfs_read+0x14d/0x1a0
        ksys_read+0x6d/0xf0
        __x64_sys_read+0x1a/0x20
        do_syscall_64+0x3b/0x90
        entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      The tail command will call read() with a count of 0. In this case,
      iomap_iter() will report this WARNING and always return 1, which causes
      the infinite loop in dax_iomap_rw().

      Fix this by checking whether count is 0 in dax_iomap_rw().
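
      The guard can be sketched with a userspace model of the loop. Nothing
      here is the real iomap or DAX code: struct fake_iter, fake_iter_next()
      and guarded_rw() are hypothetical, and the iterator is written to
      mimic the behaviour described above, returning 1 forever when it is
      handed a zero-length range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical iterator state for the sketch. */
struct fake_iter {
    size_t pos, len;  /* remaining range */
    size_t processed; /* bytes handled since the previous call */
    int calls;        /* how often the iterator was invoked */
};

/* Returns 1 while work remains, 0 when the range is exhausted. */
static int fake_iter_next(struct fake_iter *it)
{
    it->calls++;
    if (it->len == 0)
        return 1; /* modelled misbehaviour for an empty range */
    it->pos += it->processed;
    it->len -= it->processed;
    it->processed = 0;
    return it->len > 0;
}

/* Processes count bytes in chunks; returns the number of bytes handled. */
static long guarded_rw(size_t count, size_t chunk, int *iter_calls)
{
    struct fake_iter it = { 0, count, 0, 0 };
    long done = 0;

    assert(chunk > 0 && iter_calls != NULL);
    if (count == 0) { /* the guard: never enter the loop with no work */
        *iter_calls = 0;
        return 0;
    }
    while (fake_iter_next(&it) > 0) {
        it.processed = it.len < chunk ? it.len : chunk;
        done += (long)it.processed;
    }
    *iter_calls = it.calls;
    return done;
}
```

      Without the early return, a count of 0 would enter the while loop and
      never leave it, because the modelled iterator keeps returning 1.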
      
      Fixes: ca289e0b ("fsdax: switch dax_iomap_rw to use iomap_iter")
      Signed-off-by: Li Jinlin <lijinlin3@huawei.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Link: https://lore.kernel.org/r/20220725032050.3873372-1-lijinlin3@huawei.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>