  1. Jun 21, 2023
  2. Jun 20, 2023
    • reiserfs: fix blkdev_put() warning from release_journal_dev() · c576c4bf
      Yu Kuai authored
      
      
      In journal_init_dev(), if the super bdev is used as 'j_dev_bd',
      blkdev_get_by_dev() is called with a NULL holder; otherwise, the holder
      is the journal. However, release_journal_dev() later calls blkdev_put()
      with the journal as holder unconditionally, causing the following warning:
      
      WARNING: CPU: 1 PID: 5034 at block/bdev.c:617 bd_end_claim block/bdev.c:617 [inline]
      WARNING: CPU: 1 PID: 5034 at block/bdev.c:617 blkdev_put+0x562/0x8a0 block/bdev.c:901
      RIP: 0010:blkdev_put+0x562/0x8a0 block/bdev.c:901
      Call Trace:
       <TASK>
       release_journal_dev fs/reiserfs/journal.c:2592 [inline]
       free_journal_ram+0x421/0x5c0 fs/reiserfs/journal.c:1896
       do_journal_release fs/reiserfs/journal.c:1960 [inline]
       journal_release+0x276/0x630 fs/reiserfs/journal.c:1971
       reiserfs_put_super+0xe4/0x5c0 fs/reiserfs/super.c:616
       generic_shutdown_super+0x158/0x480 fs/super.c:499
       kill_block_super+0x64/0xb0 fs/super.c:1422
       deactivate_locked_super+0x98/0x160 fs/super.c:330
       deactivate_super+0xb1/0xd0 fs/super.c:361
       cleanup_mnt+0x2ae/0x3d0 fs/namespace.c:1247
       task_work_run+0x16f/0x270 kernel/task_work.c:179
       exit_task_work include/linux/task_work.h:38 [inline]
       do_exit+0xadc/0x2a30 kernel/exit.c:874
       do_group_exit+0xd4/0x2a0 kernel/exit.c:1024
       __do_sys_exit_group kernel/exit.c:1035 [inline]
       __se_sys_exit_group kernel/exit.c:1033 [inline]
       __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:1033
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Fix this problem by passing a NULL holder to blkdev_put() in this case.
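      The holder mismatch can be illustrated with a small user-space model. This is a sketch only: fake_bdev, fake_get(), fake_put() and journal_holder() are hypothetical stand-ins, not kernel APIs, and fake_put() returning false marks the point where bd_end_claim() would warn. It assumes, as described above, that a NULL holder means a non-exclusive open that is never claimed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a block device's claim state. */
struct fake_bdev {
	void *holder;   /* set only while claimed by an exclusive opener */
};

/* Open: a non-NULL holder claims the device, NULL opens it shared. */
static void fake_get(struct fake_bdev *bdev, void *holder)
{
	if (holder)
		bdev->holder = holder;
}

/* Put: a non-NULL holder must match the claimant; false models the WARN. */
static bool fake_put(struct fake_bdev *bdev, void *holder)
{
	if (holder) {
		if (bdev->holder != holder)
			return false;
		bdev->holder = NULL;
	}
	return true;
}

/* The fix: use a NULL holder when the journal shares the super bdev. */
static void *journal_holder(bool on_super_bdev, void *journal)
{
	return on_super_bdev ? NULL : journal;
}
```

      Opening and releasing through the same journal_holder() choice keeps the get and put symmetric, which is the property the fix restores.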
      
      Reported-by: syzbot+04625c80899f4555de39@syzkaller.appspotmail.com
      Link: https://syzkaller.appspot.com/bug?extid=04625c80899f4555de39
      Fixes: 2736e8ee ("block: use the holder as indication for exclusive opens")
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Reviewed-by: Christian Brauner <brauner@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20230620111322.1014775-1-yukuai1@huaweicloud.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: fix wrong mode for blkdev_get_by_dev() from disk_scan_partitions() · 985958b8
      Yu Kuai authored
      After commit 2736e8ee ("block: use the holder as indication for
      exclusive opens"), blkdev_get_by_dev() warns if the holder is NULL and
      the mode contains 'FMODE_EXCL'.
      
      The holder passed to blkdev_get_by_dev() by disk_scan_partitions() is
      always NULL, so it should not use 'FMODE_EXCL', which that commit broke.
      As a consequence, WARN_ON_ONCE() is triggered in blkdev_get_by_dev() if
      a user scans partitions while the device is opened exclusively.
      
      Fix this problem by removing 'FMODE_EXCL' from disk_scan_partitions(),
      restoring the previous behavior.
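      The rule that the warning enforces, and the fix, can be sketched in user-space C. MODE_READ, MODE_EXCL and both helpers are hypothetical names with arbitrary bit values, not the kernel's FMODE_* or BLK_OPEN_* definitions; the sketch only shows that clearing the exclusive bit keeps a NULL-holder open from tripping the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits; not the kernel's real values. */
#define MODE_READ 0x1u
#define MODE_EXCL 0x80u

/* The sanity check: an exclusive open must name a holder. */
static bool open_would_warn(unsigned int mode, const void *holder)
{
	return (mode & MODE_EXCL) && holder == NULL;
}

/* The fix: the partition scan opens with a NULL holder, so drop EXCL. */
static unsigned int scan_partitions_mode(unsigned int mode)
{
	return mode & ~MODE_EXCL;
}
```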
      
      Reported-by: syzbot+00cd27751f78817f167b@syzkaller.appspotmail.com
      Link: https://syzkaller.appspot.com/bug?extid=00cd27751f78817f167b
      Fixes: 2736e8ee ("block: use the holder as indication for exclusive opens")
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Reviewed-by: Christian Brauner <brauner@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20230618140402.7556-1-yukuai1@huaweicloud.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: document the holder argument to blkdev_get_by_path · e89e001f
      Christoph Hellwig authored
      
      
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20230620043536.707249-1-hch@lst.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: increment diskseq on all media change events · b90ecc03
      Demi Marie Obenour authored
      
      
      Currently, associating a loop device with a different file descriptor
      does not increment its diskseq.  This allows the following race
      condition:
      
      1. Program X opens a loop device.
      2. Program X gets the diskseq of the loop device.
      3. Program X associates a file with the loop device.
      4. Program X passes the loop device major, minor, and diskseq to
         another program (the opener).
      5. Program X exits.
      6. Program Y detaches the file from the loop device.
      7. Program Y attaches a different file to the loop device.
      8. The opener finally gets around to opening the loop device and checks
         that the diskseq is what it expects it to be.  Even though the
         diskseq is the expected value, the result is that the opener is
         accessing the wrong file.
      
      From discussions with Christoph Hellwig, it appears that
      disk_force_media_change() was supposed to call inc_diskseq(), but in
      fact it does not, hence the Fixes: tag.  Christoph is credited with
      Reported-by because his statement that disk_force_media_change()
      calls inc_diskseq() is what led me to discover that it should, but
      does not.
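      The race, and why bumping diskseq on every attach and detach closes it, can be modelled in a few lines of C. The struct and helpers are hypothetical, and the model assumes the program handing out the device reads diskseq after attaching its file. With bump_seq false the opener accepts a device now backed by a different file; with bump_seq true it rejects it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a loop device; not the kernel's struct. */
struct fake_loop {
	uint64_t diskseq;      /* sequence number seen by openers */
	int backing_file;      /* 0 means no file attached */
};

/* Global, monotonically increasing source of sequence numbers. */
static uint64_t next_seq = 1;

static void media_change(struct fake_loop *lo, bool bump_seq)
{
	if (bump_seq)
		lo->diskseq = next_seq++;
}

static void loop_attach(struct fake_loop *lo, int file, bool bump_seq)
{
	lo->backing_file = file;
	media_change(lo, bump_seq);
}

static void loop_detach(struct fake_loop *lo, bool bump_seq)
{
	lo->backing_file = 0;
	media_change(lo, bump_seq);
}

/* The opener's check: is this still the device it was told about? */
static bool opener_accepts(const struct fake_loop *lo, uint64_t expected)
{
	return lo->diskseq == expected;
}
```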
      
      Reported-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
      Fixes: e6138dc1 ("block: add a helper to raise a media changed event")
      Cc: stable@vger.kernel.org # 5.15+
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20230607170837.1559-1-demi@invisiblethingslab.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • swim: fix a missing FMODE_ -> BLK_OPEN_ conversion in floppy_open · 9a7933f3
      Christoph Hellwig authored
      Fix a missing conversion to the new BLK_OPEN constant in swim.
      
      Fixes: 05bdb996 ("block: replace fmode_t with a block-specific type for block open flags")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20230620043051.707196-1-hch@lst.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. Jun 17, 2023
  4. Jun 16, 2023
    • Merge tag 'nvme-6.5-2023-06-16' of git://git.infradead.org/nvme into for-6.5/block · 236f2552
      Jens Axboe authored
      Pull NVMe updates from Keith:
      
      "nvme updates for Linux 6.5
      
       - Various cleanups all around (Irvin, Chaitanya, Christophe)
       - Better struct packing (Christophe JAILLET)
       - Reduce controller error logs for optional commands (Keith)
       - Support for >=64KiB block sizes (Daniel Gomez)
       - Fabrics fixes and code organization (Max, Chaitanya, Daniel Wagner)"
      
      * tag 'nvme-6.5-2023-06-16' of git://git.infradead.org/nvme: (27 commits)
        nvme: forward port sysfs delete fix
        nvme: skip optional id ctrl csi if it failed
        nvme-core: use nvme_ns_head_multipath instead of ns->head->disk
        nvmet-fcloop: Do not wait on completion when unregister fails
        nvme-fabrics: open code __nvmf_host_find()
        nvme-fabrics: error out to unlock the mutex
        nvme: Increase block size variable size to 32-bit
        nvme-fcloop: no need to return from void function
        nvmet-auth: remove unnecessary break after goto
        nvmet-auth: remove some dead code
        nvme-core: remove redundant check from nvme_init_ns_head
        nvme: move sysfs code to a dedicated sysfs.c file
        nvme-fabrics: prevent overriding of existing host
        nvme-fabrics: check hostid using uuid_equal
        nvme-fabrics: unify common code in admin and io queue connect
        nvmet: reorder fields in 'struct nvmefc_fcp_req'
        nvmet: reorder fields in 'struct nvme_dhchap_queue_context'
        nvmet: reorder fields in 'struct nvmf_ctrl_options'
        nvme: reorder fields in 'struct nvme_ctrl'
        nvmet: reorder fields in 'struct nvmet_sq'
        ...
    • nvme: forward port sysfs delete fix · 1c606f7f
      Keith Busch authored
      We had a late fix that modified nvme_sysfs_delete() after the staging
      branch for the next merge window relocated the function to a new file.
      Port commit 2eb94dd5 ("nvme: do not let the user delete a ctrl
      before a complete") to the latest code to avoid a potentially confusing
      merge conflict.
      
      Cc: Maurizio Lombardi <mlombard@redhat.com>
      Cc: Max Gurtovoy <mgurtovoy@nvidia.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
  5. Jun 15, 2023
  6. Jun 14, 2023
    • Merge tag 'md-next-20230613' of... · 60701311
      Jens Axboe authored
      Merge tag 'md-next-20230613' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.5/block
      
      Pull MD updates from Song:
      
      "The major changes are:
      
       1. Protect md_thread with rcu, by Yu Kuai;
       2. Various non-urgent raid5 and raid1/10 fixes, by Yu Kuai;
       3. Non-urgent raid10 fixes, by Li Nan."
      
      * tag 'md-next-20230613' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: (29 commits)
        md/raid1-10: limit the number of plugged bio
        md/raid1-10: don't handle pluged bio by daemon thread
        md/md-bitmap: add a new helper to unplug bitmap asynchrously
        md/raid1-10: submit write io directly if bitmap is not enabled
        md/raid1-10: factor out a helper to submit normal write
        md/raid1-10: factor out a helper to add bio to plug
        md/raid10: prevent soft lockup while flush writes
        md/raid10: fix io loss while replacement replace rdev
        md/raid10: Do not add spare disk when recovery fails
        md/raid10: clean up md_add_new_disk()
        md/raid10: prioritize adding disk to 'removed' mirror
        md/raid10: improve code of mrdev in raid10_sync_request
        md/raid10: fix null-ptr-deref of mreplace in raid10_sync_request
        md/raid5: don't start reshape when recovery or replace is in progress
        md: protect md_thread with rcu
        md/bitmap: factor out a helper to set timeout
        md/bitmap: always wake up md_thread in timeout_store
        dm-raid: remove useless checking in raid_message()
        md: factor out a helper to wake up md_thread directly
        md: fix duplicate filename for rdev
        ...
    • block: Fix dio_cleanup() to advance the head index · d44c4042
      David Howells authored
      Fix dio_cleanup() to advance the head index into the list of pages past
      the pages it has released, as __blockdev_direct_IO() will call it twice
      if do_direct_IO() fails.
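      The double-call hazard can be sketched with a simplified page queue. fake_sdio, fake_sdio_init() and fake_dio_cleanup() are hypothetical stand-ins for the page queue in struct dio_submit and for dio_cleanup(), and a pin count dropping below zero stands in for releasing the same pages twice. Advancing head to tail turns the second cleanup call into a no-op.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8

/* Hypothetical stand-in for the page queue in struct dio_submit. */
struct fake_sdio {
	int pins[NPAGES];   /* references held on each page */
	size_t head, tail;  /* pages [head, tail) are still held */
};

static void fake_sdio_init(struct fake_sdio *s, size_t head, size_t tail)
{
	for (size_t i = 0; i < NPAGES; i++)
		s->pins[i] = 1;
	s->head = head;
	s->tail = tail;
}

/* Release the pages still held; advancing head makes a repeat call a no-op. */
static void fake_dio_cleanup(struct fake_sdio *s, bool advance_head)
{
	for (size_t i = s->head; i < s->tail; i++)
		s->pins[i]--;   /* a negative count models a double release */
	if (advance_head)
		s->head = s->tail;
}
```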
      
      The issue was causing:
      
              WARNING: CPU: 6 PID: 2220 at mm/gup.c:76 try_get_folio
      
      This can be triggered by setting up a clean pair of UDF filesystems on
      loopback devices and running the generic/451 xfstest with them as the
      scratch and test partitions.  Something like the following:
      
          fallocate /mnt2/udf_scratch -l 1G
          fallocate /mnt2/udf_test -l 1G
          mknod /dev/lo0 b 7 0
          mknod /dev/lo1 b 7 1
          losetup lo0 /mnt2/udf_scratch
          losetup lo1 /mnt2/udf_test
          mkfs -t udf /dev/lo0
          mkfs -t udf /dev/lo1
          cd xfstests
          ./check generic/451
      
      with xfstests configured by putting the following into local.config:
      
          export FSTYP=udf
          export DISABLE_UDF_TEST=1
          export TEST_DEV=/dev/lo1
          export TEST_DIR=/xfstest.test
          export SCRATCH_DEV=/dev/lo0
          export SCRATCH_MNT=/xfstest.scratch
      
      Fixes: 1ccf164e ("block: Use iov_iter_extract_pages() and page pinning in direct-io.c")
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Closes: https://lore.kernel.org/oe-lkp/202306120931.a9606b88-oliver.sang@intel.com
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Christoph Hellwig <hch@infradead.org>
      cc: David Hildenbrand <david@redhat.com>
      cc: Andrew Morton <akpm@linux-foundation.org>
      cc: Jens Axboe <axboe@kernel.dk>
      cc: Al Viro <viro@zeniv.linux.org.uk>
      cc: Matthew Wilcox <willy@infradead.org>
      cc: Jan Kara <jack@suse.cz>
      cc: Jeff Layton <jlayton@kernel.org>
      cc: Jason Gunthorpe <jgg@nvidia.com>
      cc: Logan Gunthorpe <logang@deltatee.com>
      cc: Hillf Danton <hdanton@sina.com>
      cc: Christian Brauner <brauner@kernel.org>
      cc: Linus Torvalds <torvalds@linux-foundation.org>
      cc: linux-fsdevel@vger.kernel.org
      cc: linux-block@vger.kernel.org
      cc: linux-kernel@vger.kernel.org
      cc: linux-mm@kvack.org
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/1193485.1686693279@warthog.procyon.org.uk
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • md/raid1-10: limit the number of plugged bio · 460af1f9
      Yu Kuai authored
      
      
      Bios can be added to the plug without limit, and the following writeback
      test can trigger a huge number of plugged bios:
      
      Test script:
      modprobe brd rd_nr=4 rd_size=10485760
      mdadm -CR /dev/md0 -l10 -n4 /dev/ram[0123] --assume-clean --bitmap=internal
      echo 0 > /proc/sys/vm/dirty_background_ratio
      fio -filename=/dev/md0 -ioengine=libaio -rw=write -bs=4k -numjobs=1 -iodepth=128 -name=test
      
      Test result:
      Monitoring /sys/block/md0/inflight shows that inflight keeps increasing
      until fio finishes writing. After running for about 2 minutes:
      
      [root@fedora ~]# cat /sys/block/md0/inflight
             0  4474191
      
      Fix the problem by limiting the number of plugged bios based on the
      number of copies of the original bio.
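      A minimal sketch of such a cap, assuming a fixed per-copy budget: PLUG_BIOS_PER_COPY and its value of 32 are illustrative rather than taken from the patch, and fake_plug only counts bios instead of holding them. Scaling the budget by the number of copies reflects that each original write adds one bio per copy to the plug.

```c
#include <assert.h>

/* Illustrative per-copy budget; the value is an assumption. */
#define PLUG_BIOS_PER_COPY 32

/* Hypothetical plug that only counts the bios it would be holding. */
struct fake_plug {
	int count;     /* bios currently held */
	int flushes;   /* early flushes forced by the cap */
};

/* Add one bio; once the plug holds the budget for all copies, flush it. */
static void plug_add(struct fake_plug *p, int copies)
{
	if (++p->count >= copies * PLUG_BIOS_PER_COPY) {
		p->count = 0;   /* stands in for submitting the pending list */
		p->flushes++;
	}
}
```

      With two copies, the plug never holds more than 63 bios at once, so inflight stays bounded instead of growing for the whole run.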
      
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Song Liu <song@kernel.org>
      Link: https://lore.kernel.org/r/20230529131106.2123367-8-yukuai1@huaweicloud.com
    • md/raid1-10: don't handle pluged bio by daemon thread · 9efcc2c3
      Yu Kuai authored
      current->bio_list is set in submit_bio() context. In this case, bitmap io
      is added to the list and waits for the current io submission to finish,
      while the current io submission must wait for the bitmap io to be done.
      Commit 874807a8 ("md/raid1{,0}: fix deadlock in bitmap_unplug.") fixed
      the deadlock by handling plugged bios in the daemon thread.
      
      On the one hand, the deadlock no longer exists after commit a214b949
      ("blk-mq: only flush requests from the plug in blk_mq_submit_bio"). On
      the other hand, the current solution makes it impossible to flush plugged
      bios in raid1/10_make_request(), because doing so would send all writes
      to the daemon thread.
      
      In order to limit the number of plugged bios, commit 874807a8
      ("md/raid1{,0}: fix deadlock in bitmap_unplug.") is reverted, and the
      deadlock is fixed by handling bitmap io asynchronously.
      
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Song Liu <song@kernel.org>
      Link: https://lore.kernel.org/r/20230529131106.2123367-7-yukuai1@huaweicloud.com
    • md/md-bitmap: add a new helper to unplug bitmap asynchrously · a022325a
      Yu Kuai authored
      
      
      If the bitmap is enabled, it must be updated before write io is
      submitted. This is why the unplug callback must move these ios to
      'conf->pending_io_list' if 'current->bio_list' is not empty, which
      causes performance degradation.
      
      A new helper md_bitmap_unplug_async() is introduced to submit bitmap io
      in a kworker, so that submitting bitmap io in raid10_unplug() no longer
      requires 'current->bio_list' to be empty.
      
      This patch prepares for limiting the number of plugged bios.
      
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Song Liu <song@kernel.org>
      Link: https://lore.kernel.org/r/20230529131106.2123367-6-yukuai1@huaweicloud.com
    • md/raid1-10: submit write io directly if bitmap is not enabled · 7db922ba
      Yu Kuai authored
      Commit 6cce3b23 ("[PATCH] md: write intent bitmap support for raid10")
      added bitmap support and changed write io to be submitted through the
      daemon thread, because the bitmap needs to be updated before write io.
      Later, plugging was used to fix a performance regression: all write io
      went to the daemon thread, which meant io could not be issued
      concurrently.
      
      However, if the bitmap is not enabled, write io should not go to the
      daemon thread in the first place, and plugging is not needed either.
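      The resulting decision is simple enough to sketch. The enum and function below are hypothetical and only capture the ordering constraint described above: the bitmap must be updated before the data write, so only bitmap-enabled arrays need the deferred path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the two ways a raid1/raid10 write can be issued. */
enum write_path {
	WRITE_SUBMIT_DIRECT,   /* issue the bio from the caller's context */
	WRITE_DEFER,           /* plug it, or hand it to the daemon thread */
};

/* Deferral only exists so the bitmap can be updated before the data is
 * written; without a bitmap there is nothing to wait for. */
static enum write_path choose_write_path(bool bitmap_enabled)
{
	return bitmap_enabled ? WRITE_DEFER : WRITE_SUBMIT_DIRECT;
}
```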
      
      Fixes: 6cce3b23 ("[PATCH] md: write intent bitmap support for raid10")
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Song Liu <song@kernel.org>
      Link: https://lore.kernel.org/r/20230529131106.2123367-5-yukuai1@huaweicloud.com
    • md/raid1-10: factor out a helper to submit normal write · 8295efbe
      Yu Kuai authored
      
      
      Multiple places do the same thing; factor out a helper to remove the
      redundant code. The helper will also be used in the following patch.
      
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Song Liu <song@kernel.org>
      Link: https://lore.kernel.org/r/20230529131106.2123367-4-yukuai1@huaweicloud.com
    • md/raid1-10: factor out a helper to add bio to plug · 5ec6ca14
      Yu Kuai authored
      
      
      The code in raid1 and raid10 is identical; factor it out in preparation
      for limiting the number of plugged bios.
      
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Song Liu <song@kernel.org>
      Link: https://lore.kernel.org/r/20230529131106.2123367-3-yukuai1@huaweicloud.com
    • md/raid10: prevent soft lockup while flush writes · 01044462
      Yu Kuai authored
      
      
      Currently, there is no limit on raid1/raid10 plugged bios. While
      flushing writes, raid1 calls cond_resched() but raid10 doesn't, so too
      many writes can cause a soft lockup.
      
      The following soft lockup can be easily triggered by a writeback test
      for raid10 on ramdisks:
      
      watchdog: BUG: soft lockup - CPU#10 stuck for 27s! [md0_raid10:1293]
      Call Trace:
       <TASK>
       call_rcu+0x16/0x20
       put_object+0x41/0x80
       __delete_object+0x50/0x90
       delete_object_full+0x2b/0x40
       kmemleak_free+0x46/0xa0
       slab_free_freelist_hook.constprop.0+0xed/0x1a0
       kmem_cache_free+0xfd/0x300
       mempool_free_slab+0x1f/0x30
       mempool_free+0x3a/0x100
       bio_free+0x59/0x80
       bio_put+0xcf/0x2c0
       free_r10bio+0xbf/0xf0
       raid_end_bio_io+0x78/0xb0
       one_write_done+0x8a/0xa0
       raid10_end_write_request+0x1b4/0x430
       bio_endio+0x175/0x320
       brd_submit_bio+0x3b9/0x9b7 [brd]
       __submit_bio+0x69/0xe0
       submit_bio_noacct_nocheck+0x1e6/0x5a0
       submit_bio_noacct+0x38c/0x7e0
       flush_pending_writes+0xf0/0x240
       raid10d+0xac/0x1ed0
      
      Fix the problem by adding cond_resched() to raid10, as raid1 does.
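      The pattern of the fix can be sketched in user space, with sched_yield() standing in for cond_resched(); fake_bio and flush_pending() are hypothetical. Note that cond_resched() only yields when a reschedule is actually pending, whereas sched_yield() always enters the scheduler, so this models where the yield point sits rather than its cost.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical singly linked list of pending writes. */
struct fake_bio {
	struct fake_bio *next;
};

/* Submit every pending bio, offering the CPU back after each one,
 * roughly as raid1's flush loop does with cond_resched(). */
static size_t flush_pending(struct fake_bio *bio, size_t *yields)
{
	size_t submitted = 0;

	*yields = 0;
	while (bio) {
		struct fake_bio *next = bio->next;

		/* the real code would submit bio here */
		submitted++;
		bio = next;
		sched_yield();   /* user-space stand-in for cond_resched() */
		(*yields)++;
	}
	return submitted;
}
```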
      
      Note that unlimited plugged bios still need to be optimized: for
      example, when lots of dirty pages are written back, they take lots of
      memory and io spends a long time in the plug, so io latency is bad.
      
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Song Liu <song@kernel.org>
      Link: https://lore.kernel.org/r/20230529131106.2123367-2-yukuai1@huaweicloud.com