  1. Jan 17, 2024
    • nbd: always initialize struct msghdr completely · 78fbb92a
      Eric Dumazet authored
      syzbot complains that the msg->msg_get_inq value can be uninitialized [1].

      struct msghdr has gained many new fields recently; we should always make
      sure their values are zero by default.
      
      [1]
       BUG: KMSAN: uninit-value in tcp_recvmsg+0x686/0xac0 net/ipv4/tcp.c:2571
        tcp_recvmsg+0x686/0xac0 net/ipv4/tcp.c:2571
        inet_recvmsg+0x131/0x580 net/ipv4/af_inet.c:879
        sock_recvmsg_nosec net/socket.c:1044 [inline]
        sock_recvmsg+0x12b/0x1e0 net/socket.c:1066
        __sock_xmit+0x236/0x5c0 drivers/block/nbd.c:538
        nbd_read_reply drivers/block/nbd.c:732 [inline]
        recv_work+0x262/0x3100 drivers/block/nbd.c:863
        process_one_work kernel/workqueue.c:2627 [inline]
        process_scheduled_works+0x104e/0x1e70 kernel/workqueue.c:2700
        worker_thread+0xf45/0x1490 kernel/workqueue.c:2781
        kthread+0x3ed/0x540 kernel/kthread.c:388
        ret_from_fork+0x66/0x80 arch/x86/kernel/process.c:147
        ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
      
      Local variable msg created at:
        __sock_xmit+0x4c/0x5c0 drivers/block/nbd.c:513
        nbd_read_reply drivers/block/nbd.c:732 [inline]
        recv_work+0x262/0x3100 drivers/block/nbd.c:863
      
      CPU: 1 PID: 7465 Comm: kworker/u5:1 Not tainted 6.7.0-rc7-syzkaller-00041-gf016f7547aee #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
      Workqueue: nbd5-recv recv_work
      
      Fixes: f94fd25c ("tcp: pass back data left in socket after receive")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: stable@vger.kernel.org
      Cc: Josef Bacik <josef@toxicpanda.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-block@vger.kernel.org
      Cc: nbd@other.debian.org
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20240112132657.647112-1-edumazet@google.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
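A minimal userspace C sketch of the pattern (not the nbd code itself): `struct demo_msghdr` is a hypothetical stand-in for the kernel's `struct msghdr`, and the only assumption is that the struct can gain fields after a caller is written. A designated initializer (or `= {0}`) zero-initializes every member the caller does not name, so a later field such as `msg_get_inq` starts out as `false`; declaring the variable and then assigning fields one by one would leave it indeterminate, which is the kind of read KMSAN flagged in the report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical stand-in for a kernel struct that keeps gaining fields. */
struct demo_msghdr {
	void *msg_name;
	int msg_namelen;
	struct iovec *msg_iov;
	size_t msg_iovlen;
	unsigned int msg_flags;
	bool msg_get_inq;   /* a "newer" field this caller never mentions */
	int msg_inq;
};

/* Designated initializers zero every member that is not named, so fields
 * added to the struct later start out as 0/false/NULL automatically. */
static struct demo_msghdr demo_make_recv_msg(struct iovec *iov, size_t nr,
					     unsigned int flags)
{
	assert(iov != NULL || nr == 0);

	struct demo_msghdr msg = {
		.msg_iov = iov,
		.msg_iovlen = nr,
		.msg_flags = flags,
	};
	return msg;
}
```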
    • block: Fix iterating over an empty bio with bio_for_each_folio_all · 7bed6f3d
      Matthew Wilcox (Oracle) authored
      
      If the bio contains no data, bio_first_folio() calls page_folio() on a
      NULL pointer and oopses.  Move the test that we've reached the end of
      the bio from bio_next_folio() to bio_first_folio().
      
      Reported-by: syzbot+8b23309d5788a79d3eea@syzkaller.appspotmail.com
      Reported-by: syzbot+004c1e0fced2b4bc3dcc@syzkaller.appspotmail.com
      Fixes: 640d1930 ("block: Add bio_for_each_folio_all()")
      Cc: stable@vger.kernel.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Link: https://lore.kernel.org/r/20240116212959.3413014-1-willy@infradead.org
      [axboe: add unlikely() to error case]
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
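The fix moves the end-of-bio test to the start of the iteration. A minimal sketch with hypothetical types (`demo_bio`, `demo_iter`, `demo_first()`, `demo_next()`) that ignores folios, offsets, and lengths: because the bounds check lives in the "first" step and "next" simply calls it with the following index, an empty bio yields no element instead of dereferencing a missing one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: a "bio" is just an array of page pointers. */
struct demo_page { int id; };
struct demo_bio  { struct demo_page **pages; size_t vcnt; };
struct demo_iter { struct demo_page *page; size_t idx; };

/* The end test lives here, before any element is looked up, so an
 * empty bio (vcnt == 0) yields page == NULL instead of a bad access. */
static void demo_first(struct demo_iter *it, const struct demo_bio *bio, size_t idx)
{
	assert(bio != NULL);
	it->idx = idx;
	it->page = idx < bio->vcnt ? bio->pages[idx] : NULL;
}

/* Advancing just restarts the lookup at the next index. */
static void demo_next(struct demo_iter *it, const struct demo_bio *bio)
{
	demo_first(it, bio, it->idx + 1);
}

#define demo_for_each_page(it, bio) \
	for (demo_first(&(it), (bio), 0); (it).page; demo_next(&(it), (bio)))

static size_t demo_count_pages(const struct demo_bio *bio)
{
	struct demo_iter it;
	size_t n = 0;

	demo_for_each_page(it, bio)
		n++;
	return n;
}
```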
    • block: bio-integrity: fix kcalloc() arguments order · be50df31
      Dmitry Antipov authored
      When compiling with gcc version 14.0.1 20240116 (experimental)
      and W=1, I've noticed the following warning:
      
      block/bio-integrity.c: In function 'bio_integrity_map_user':
      block/bio-integrity.c:339:38: warning: 'kcalloc' sizes specified with 'sizeof'
      in the earlier argument and not in the later argument [-Wcalloc-transposed-args]
        339 |                 bvec = kcalloc(sizeof(*bvec), nr_vecs, GFP_KERNEL);
            |                                      ^
      block/bio-integrity.c:339:38: note: earlier argument should specify number of
      elements, later size of each element
      
      Since the 'n' and 'size' arguments of 'kcalloc()' are multiplied to
      calculate the final size, their order doesn't affect the result, so
      this is not a bug. But it's still worth fixing.
      
      Fixes: 492c5d45 ("block: bio-integrity: directly map user buffers")
      Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Link: https://lore.kernel.org/r/20240116143437.89060-1-dmantipov@yandex.ru
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
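`kcalloc(n, size, flags)` follows the same argument order as C's `calloc(nmemb, size)`: element count first, element size second. A userspace sketch using `calloc()` with a hypothetical `demo_bvec` element type shows the order GCC 14's `-Wcalloc-transposed-args` expects; as the commit notes, the product, and therefore the allocation, is the same either way.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical element type standing in for struct bio_vec. */
struct demo_bvec {
	void *page;
	unsigned int len;
	unsigned int offset;
};

/* Count first, element size second, matching calloc(nmemb, size).
 * calloc() also zero-fills the array. */
static struct demo_bvec *demo_alloc_bvecs(size_t nr_vecs)
{
	assert(nr_vecs > 0);

	struct demo_bvec *bvec = calloc(nr_vecs, sizeof(*bvec));
	return bvec;
}
```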
  2. Jan 16, 2024
  3. Jan 15, 2024
  4. Jan 14, 2024
  5. Jan 13, 2024
  6. Jan 12, 2024
    • blk-mq: fix IO hang from sbitmap wakeup race · 5266caaf
      Ming Lei authored
      
      In blk_mq_mark_tag_wait(), __add_wait_queue() may be reordered
      with the following blk_mq_get_driver_tag() when getting a driver
      tag fails.

      Then in __sbitmap_queue_wake_up(), waitqueue_active() may not observe
      the waiter added in blk_mq_mark_tag_wait() and wake up nothing, while
      blk_mq_mark_tag_wait() still fails to get a driver tag.

      This issue can be reproduced by running the following test in a loop;
      a fio hang can be observed in under 30 minutes when running it on my
      test VM on a laptop.
      
      	modprobe -r scsi_debug
      	modprobe scsi_debug delay=0 dev_size_mb=4096 max_queue=1 host_max_queue=1 submit_queues=4
      	dev=$(ls -d /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block/* | head -1 | xargs basename)
      	fio --filename=/dev/"$dev" --direct=1 --rw=randrw --bs=4k --iodepth=1 \
      		--runtime=100 --numjobs=40 --time_based --name=test \
      		--ioengine=libaio
      
      Fix the issue by adding one explicit barrier in blk_mq_mark_tag_wait(),
      which is fine since that path is only taken when we run out of tags.
      
      Cc: Jan Kara <jack@suse.cz>
      Cc: Kemeng Shi <shikemeng@huaweicloud.com>
      Reported-by: Changhui Zhong <czhong@redhat.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Link: https://lore.kernel.org/r/20240112122626.4181044-1-ming.lei@redhat.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
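The race is an instance of the store-buffering pattern: each side writes one location and then reads the other's. The sketch below models it with C11 atomics; `waiter_queued` and `free_tags` are hypothetical stand-ins for the wait queue and the sbitmap, and for simplicity both sides get a full `atomic_thread_fence(memory_order_seq_cst)` (the sketch does not reproduce where the kernel's existing barriers sit on the wake-up side). With a full fence on both sides, at least one of them must see the other's write: either the waiter gets the freed tag or the waker sees the queued waiter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins: "a waiter is on the wait queue" and the number
 * of free driver tags. */
static atomic_bool waiter_queued;
static atomic_int free_tags;

static void demo_reset(int tags)
{
	assert(tags >= 0);
	atomic_store(&waiter_queued, false);
	atomic_store(&free_tags, tags);
}

static bool demo_try_get_tag(void)
{
	int n = atomic_load_explicit(&free_tags, memory_order_relaxed);

	while (n > 0) {
		/* On failure, n is reloaded with the current value. */
		if (atomic_compare_exchange_weak_explicit(&free_tags, &n, n - 1,
				memory_order_acquire, memory_order_relaxed))
			return true;
	}
	return false;
}

/* Waiter side (cf. blk_mq_mark_tag_wait()): queue up, full barrier, retry. */
static bool demo_waiter(void)
{
	atomic_store_explicit(&waiter_queued, true, memory_order_relaxed);
	/* Without this fence the tag check below may be reordered before the
	 * store above: the waiter sees no free tag while the waker sees no
	 * queued waiter, and nobody is woken. */
	atomic_thread_fence(memory_order_seq_cst);
	return demo_try_get_tag();
}

/* Waker side (cf. __sbitmap_queue_wake_up()): free a tag, full barrier,
 * then check for waiters. Returns true if it would issue a wakeup. */
static bool demo_waker(void)
{
	atomic_fetch_add_explicit(&free_tags, 1, memory_order_release);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&waiter_queued, memory_order_relaxed);
}
```

A single-threaded run only exercises the logic; the ordering guarantee matters when `demo_waiter()` and `demo_waker()` run concurrently on different CPUs.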
  7. Jan 11, 2024
    • Merge tag 'nvme-6.8-2024-1-10' of git://git.infradead.org/nvme into for-6.8/block · b2da1975
      Jens Axboe authored
      Pull NVMe changes from Keith:
      
      "nvme follow-up updates for Linux 6.8
      
       - tcp, fc, and rdma target fixes (Maurizio, Daniel, Hannes, Christoph)
       - discard fixes and improvements (Christoph)
       - timeout debug improvements (Keith, Max)
       - various cleanups (Daniel, Max, Guixin)
       - trace event string fixes (Arnd)
       - shadow doorbell setup on reset fix (William)
       - a write zeroes quirk for SK Hynix (Jim)"
      
      * tag 'nvme-6.8-2024-1-10' of git://git.infradead.org/nvme: (25 commits)
        nvmet-rdma: avoid circular locking dependency on install_queue()
        nvmet-tcp: avoid circular locking dependency on install_queue()
        nvme-pci: set doorbell config before unquiescing
        nvmet-tcp: Fix the H2C expected PDU len calculation
        nvme-tcp: enhance timeout kernel log
        nvme-rdma: enhance timeout kernel log
        nvme-pci: enhance timeout kernel log
        nvme: trace: avoid memcpy overflow warning
        nvmet: re-fix tracing strncpy() warning
        nvme: introduce nvme_disk_is_ns_head helper
        nvme-pci: disable write zeroes for SK Hynix BC901
        nvmet-fcloop: Remove remote port from list when unlinking
        nvmet-trace: avoid dereferencing pointer too early
        nvmet-fc: remove unnecessary bracket
        nvme: simplify the max_discard_segments calculation
        nvme: fix max_discard_sectors calculation
        nvme: also skip discard granularity updates in nvme_config_discard
        nvme: update the explanation for not updating the limits in nvme_config_discard
        nvmet-tcp: fix a missing endianess conversion in nvmet_tcp_try_peek_pdu
        nvme-common: mark nvme_tls_psk_prio static
        ...
    • nvmet-rdma: avoid circular locking dependency on install_queue() · 31deaeb1
      Hannes Reinecke authored
      
      nvmet_rdma_install_queue() is driven from the ->io_work workqueue
      function, but it calls flush_workqueue(), which might trigger
      ->release_work(), which in turn calls flush_work() on ->io_work.

      To avoid that, check for pending queues in disconnecting state,
      and return 'controller busy' once a certain threshold is reached.
      
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
    • nvmet-tcp: avoid circular locking dependency on install_queue() · 07a29b13
      Hannes Reinecke authored
      
      nvmet_tcp_install_queue() is driven from the ->io_work workqueue
      function, but it calls flush_workqueue(), which might trigger
      ->release_work(), which in turn calls flush_work() on ->io_work.

      To avoid that, check for pending queues in disconnecting state,
      and return 'controller busy' once a certain threshold is reached.
      
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
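Both install_queue() changes replace a blocking flush with a bounded check. A minimal sketch of that idea, with hypothetical names (`demo_queue`, `demo_install_queue()`, `DEMO_BACKLOG`, `DEMO_CTRL_BUSY`) and no locking: the function, which runs from the work item, only counts queues that are still disconnecting and reports busy past a threshold, so it never waits on work that may in turn be waiting on it.

```c
#include <assert.h>
#include <stddef.h>

enum demo_q_state { DEMO_Q_LIVE, DEMO_Q_DISCONNECTING };

/* Hypothetical queue list node; the real drivers keep their own lists. */
struct demo_queue {
	enum demo_q_state state;
	struct demo_queue *next;
};

#define DEMO_BACKLOG   4   /* hypothetical threshold */
#define DEMO_OK        0
#define DEMO_CTRL_BUSY 1   /* stands in for a "controller busy" status */

/* Runs in the work item: it only inspects state and never flushes or
 * waits on other work, so it cannot deadlock against release work. */
static int demo_install_queue(const struct demo_queue *head)
{
	size_t pending = 0;

	for (const struct demo_queue *q = head; q; q = q->next)
		if (q->state == DEMO_Q_DISCONNECTING)
			pending++;

	return pending > DEMO_BACKLOG ? DEMO_CTRL_BUSY : DEMO_OK;
}
```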
    • nvme-pci: set doorbell config before unquiescing · 06c59d42
      William Butler authored
      
      During resets, if queues are unquiesced first, the host can submit
      I/Os to the controller using the shadow doorbell logic while the
      controller is not yet aware of it. This can prevent necessary MMIO
      doorbell writes from being issued, causing requests to be delayed
      and to time out.
      
      Signed-off-by: William Butler <wab@google.com>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
    • block: fix partial zone append completion handling in req_bio_endio() · 748dc0b6
      Damien Le Moal authored
      Partial completions of zone append requests are not allowed, but if a
      zone append completion indicates a number of completed bytes different
      from the original BIO size, only the BIO status is set to error. As a
      result, bio_advance() does not set the BIO size to 0, and bio_endio()
      is not called at the end of req_bio_endio().
      
      Make sure a partially completed zone append is failed and completed
      immediately by forcing the completed number of bytes (nbytes) to be
      equal to the BIO size, thus ensuring that bio_endio() is called.
      
      Fixes: 297db731 ("block: fix req_bio_endio append error handling")
      Cc: stable@kernel.vger.org
      Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Link: https://lore.kernel.org/r/20240110092942.442334-1-dlemoal@kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
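A simplified model of the completion logic described in this commit, using a hypothetical `demo_bio` that carries only a remaining size, a zone-append flag, a status, and an `ended` marker; `-EIO` stands in for the kernel's block-layer error status. Forcing `nbytes` up to the remaining size for a partial zone append makes the size reach zero, so the end-of-I/O callback always runs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for a bio. */
struct demo_bio {
	unsigned int size;   /* bytes not yet completed */
	bool zone_append;
	int status;          /* 0 = success, negative errno = failure */
	bool ended;          /* set once demo_bio_endio() has run */
};

static void demo_bio_endio(struct demo_bio *bio)
{
	bio->ended = true;
}

static void demo_req_bio_endio(struct demo_bio *bio, unsigned int nbytes, int error)
{
	if (error)
		bio->status = error;

	if (bio->zone_append && nbytes != bio->size) {
		/* A partial zone append is not allowed: fail it and account the
		 * whole bio as done so that it completes right away. */
		bio->status = -EIO;
		nbytes = bio->size;
	}

	assert(nbytes <= bio->size);
	bio->size -= nbytes;          /* stands in for bio_advance() */
	if (bio->size == 0)
		demo_bio_endio(bio);
}
```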
  8. Jan 10, 2024
    • block/iocost: silence warning on 'last_period' potentially being unused · 742e324a
      Jens Axboe authored
      
      If CONFIG_TRACEPOINTS isn't enabled, we assign this variable but then
      never use it. This can cause the compiler to complain about that:
      
      block/blk-iocost.c:1264:6: warning: variable 'last_period' set but not used [-Wunused-but-set-variable]
       1264 |         u64 last_period, cur_period;
            |             ^
      
      Rather than add ifdefs to guard this, just mark it __maybe_unused.
      
      Reported-by: kernel test robot <lkp@intel.com>
      Closes: https://lore.kernel.org/oe-kbuild-all/202401102335.GiWdeIo9-lkp@intel.com/
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
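A userspace sketch of the same situation: `DEMO_TRACEPOINTS` is a hypothetical stand-in for `CONFIG_TRACEPOINTS`, and `__maybe_unused` is defined here as `__attribute__((__unused__))` (the definition the kernel uses in `compiler_attributes.h`) if the environment doesn't already provide it. When tracing is compiled out, the trace macro drops its arguments, `last_period` is set but never read, and the attribute keeps `-Wunused-but-set-variable` quiet without an `#ifdef` around the declaration.

```c
#include <assert.h>
#include <stdint.h>

#ifndef __maybe_unused
#define __maybe_unused __attribute__((__unused__))
#endif

/* DEMO_TRACEPOINTS is a hypothetical stand-in for CONFIG_TRACEPOINTS. */
#ifdef DEMO_TRACEPOINTS
#include <stdio.h>
#define demo_trace_period(last, cur) \
	printf("period %llu -> %llu\n", (unsigned long long)(last), \
	       (unsigned long long)(cur))
#else
#define demo_trace_period(last, cur) do { } while (0)  /* arguments dropped */
#endif

static uint64_t demo_cur_period;

static uint64_t demo_advance_period(void)
{
	/* Set below but only read by the trace macro; the attribute silences
	 * -Wunused-but-set-variable when tracing is compiled out. */
	uint64_t last_period __maybe_unused, cur_period;

	last_period = demo_cur_period;
	cur_period = ++demo_cur_period;
	demo_trace_period(last_period, cur_period);
	return cur_period;
}
```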
    • Merge tag 'md-6.8-20240109' of... · c8300953
      Jens Axboe authored
      Merge tag 'md-6.8-20240109' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.8/block
      
      Pull MD fixes from Song:
      
      "1. Sparse warning since v6.0, by Bart;
       2. /proc/mdstat regression since v6.7, by Yu Kuai."
      
      * tag 'md-6.8-20240109' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
        md/raid1: Use blk_opf_t for read and write operations
        md: Fix md_seq_ops() regressions
    • md/raid1: Use blk_opf_t for read and write operations · 7dab2455
      Bart Van Assche authored
      Use the type blk_opf_t for read and write operations instead of int. This
      patch does not affect the generated code but fixes the following sparse
      warning:
      
      drivers/md/raid1.c:1993:60: sparse: sparse: incorrect type in argument 5 (different base types)
           expected restricted blk_opf_t [usertype] opf
           got int rw
      
      Cc: Song Liu <song@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Fixes: 3c5e514d ("md/raid1: Use the new blk_opf_t type")
      Cc: stable@vger.kernel.org # v6.0+
      Reported-by: kernel test robot <lkp@intel.com>
      Closes: https://lore.kernel.org/oe-kbuild-all/202401080657.UjFnvQgX-lkp@intel.com/
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Song Liu <song@kernel.org>
      Link: https://lore.kernel.org/r/20240108001223.23835-1-bvanassche@acm.org
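A sketch of how a sparse-checked type like `blk_opf_t` is built, with hypothetical names (`demo_opf_t`, `DEMO_OP_READ`, `DEMO_OP_WRITE`, `demo_submit()`). Sparse defines `__CHECKER__` and understands the `bitwise` and `force` attributes; an ordinary compiler sees a plain `unsigned int`, which is why a change like this one leaves the generated code untouched. Declaring the local as `demo_opf_t` instead of `int` is what avoids the "different base types" complaint.

```c
#include <assert.h>

/* Sparse defines __CHECKER__; a normal compiler sees plain integers. */
#ifdef __CHECKER__
#define __bitwise __attribute__((bitwise))
#define __force   __attribute__((force))
#else
#define __bitwise
#define __force
#endif

/* Hypothetical type and constants shaped like blk_opf_t / REQ_OP_*. */
typedef unsigned int __bitwise demo_opf_t;
#define DEMO_OP_READ  ((__force demo_opf_t)0)
#define DEMO_OP_WRITE ((__force demo_opf_t)1)

static unsigned int demo_submit(demo_opf_t opf)
{
	return (__force unsigned int)opf;
}

static unsigned int demo_sync_io(int write)
{
	/* Passing a plain int to demo_submit() instead would make sparse
	 * warn about different base types. */
	demo_opf_t rw = write ? DEMO_OP_WRITE : DEMO_OP_READ;

	return demo_submit(rw);
}
```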
    • md: Fix md_seq_ops() regressions · f9cfe7e7
      Yu Kuai authored
      Commit cf1b6d44 ("md: simplify md_seq_ops") introduced the following
      regressions:

      1) If the all_mddevs list is empty, personalities and unused devices
         are no longer shown to the user.
      2) If the seq_file buffer overflows in md_seq_show(), md_seq_start()
         is called again, so personalities are shown to the user again.
      3) If the seq_file buffer overflows in md_seq_stop(), seq_read_iter()
         doesn't handle this, so unused devices are not shown to the user.

      Fix the above problems by printing personalities and unused devices in
      md_seq_show().
      
      Fixes: cf1b6d44 ("md: simplify md_seq_ops")
      Cc: stable@vger.kernel.org # v6.7+
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Song Liu <song@kernel.org>
      Link: https://lore.kernel.org/r/20240109133957.2975272-1-yukuai1@huaweicloud.com
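A simplified model of emitting the header and footer from the show step rather than from start/stop. The names `demo_seq`, `demo_show()`, and `demo_read_all()` are hypothetical, and the strings are placeholders rather than the real /proc/mdstat format. Position 0 stands for the list head and prints the header (plus the footer when there are no devices); each device position prints that device, and the last one also prints the footer. Because every piece of output belongs to exactly one position, re-running show for a position after a buffer overflow re-emits only that position's text, and an empty list still produces both lines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a seq_file output buffer. */
struct demo_seq {
	char buf[256];
	size_t len;
};

static void demo_puts(struct demo_seq *s, const char *str)
{
	size_t n = strlen(str);

	assert(s->len + n < sizeof(s->buf));
	memcpy(s->buf + s->len, str, n + 1);   /* keeps the buffer NUL-terminated */
	s->len += n;
}

/* Position 0 is the list head: header, plus footer if there are no devices.
 * Position i (1..ndevs) prints device i-1; the last one adds the footer. */
static void demo_show(struct demo_seq *s, const char *const *devs,
		      size_t ndevs, size_t pos)
{
	if (pos == 0) {
		demo_puts(s, "Personalities\n");
		if (ndevs == 0)
			demo_puts(s, "unused devices\n");
		return;
	}
	demo_puts(s, devs[pos - 1]);
	demo_puts(s, "\n");
	if (pos == ndevs)
		demo_puts(s, "unused devices\n");
}

static void demo_read_all(struct demo_seq *s, const char *const *devs, size_t ndevs)
{
	s->len = 0;
	s->buf[0] = '\0';
	for (size_t pos = 0; pos <= ndevs; pos++)
		demo_show(s, devs, ndevs, pos);
}
```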
  9. Jan 09, 2024
  10. Jan 08, 2024
  11. Jan 06, 2024
    • nvme: trace: avoid memcpy overflow warning · a7de1dea
      Arnd Bergmann authored
      A previous patch introduced a struct_group() in nvme_common_command to help
      stringop fortification figure out the length of the fields, but one function
      is not currently using them:
      
      In file included from drivers/nvme/target/core.c:7:
      In file included from include/linux/string.h:254:
      include/linux/fortify-string.h:592:4: error: call to '__read_overflow2_field' declared with 'warning' attribute: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror,-Wattribute-warning]
                              __read_overflow2_field(q_size_field, size);
                              ^
      
      Change this one to use the correct field name to avoid the warning.
      
      Fixes: 5c629dc9 ("nvme: use struct group for generic command dwords")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
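A simplified userspace version of `struct_group()`: the macro below follows the idea of the kernel's helper (the same members wrapped in an anonymous union, once as an anonymous struct and once as a named one), but `demo_struct_group()` is hypothetical and `struct demo_cmd` is not the real `nvme_common_command`. Copying `sizeof(cmd->cdws)` bytes from `&cmd->cdws` tells a fortified `memcpy()` the true extent of the region, whereas starting at `&cmd->cdw10` with the same length reads past that one field.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified take on struct_group(): the members are usable individually
 * and also as one named struct with a well-defined size. */
#define demo_struct_group(NAME, ...)            \
	union {                                  \
		struct { __VA_ARGS__ };          \
		struct { __VA_ARGS__ } NAME;     \
	}

/* Hypothetical command layout, not the real nvme_common_command. */
struct demo_cmd {
	uint8_t opcode;
	uint32_t nsid;
	demo_struct_group(cdws,
		uint32_t cdw10;
		uint32_t cdw11;
		uint32_t cdw12;
	);
};

/* Copy the whole group: the size comes from the group itself, so a
 * fortified memcpy() can check it against the object it points into. */
static void demo_copy_cdws(uint8_t *dst, const struct demo_cmd *cmd)
{
	memcpy(dst, &cmd->cdws, sizeof(cmd->cdws));
}
```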
    • nvmet: re-fix tracing strncpy() warning · 4ee7ffeb
      Arnd Bergmann authored
      An earlier patch had tried to address a warning about a string copy with
      missing zero termination:
      
      drivers/nvme/target/trace.h:52:3: warning: ‘strncpy’ specified bound 32 equals destination size [-Wstringop-truncation]
      
      The new version causes a different warning with some compiler versions, notably
      gcc-9 and gcc-10, and also misses the zero padding that was apparently done
      intentionally in the original code:
      
      drivers/nvme/target/trace.h:56:2: error: 'strncpy' specified bound depends on the length of the source argument [-Werror=stringop-overflow=]
      
      Change it to use strscpy_pad() with the original length, which will give
      a properly padded and zero-terminated string as well as avoiding the warning.
      
      Fixes: d86481e9 ("nvmet: use min of device_path and disk len")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
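Userspace has no `strscpy_pad()`, so the sketch below is a hypothetical reimplementation (`demo_strscpy_pad()`) of the semantics the commit relies on: copy at most `count - 1` bytes, always NUL-terminate, zero-fill the rest of the buffer, and return the copied length or `-E2BIG` on truncation. Unlike `strncpy()` with a bound equal to the destination size, the result is always a terminated string.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical userspace reimplementation of strscpy_pad() semantics. */
static ssize_t demo_strscpy_pad(char *dest, const char *src, size_t count)
{
	size_t len = 0;

	if (count == 0)
		return -E2BIG;

	/* Measure src, but never look at more than count bytes of it. */
	while (len < count && src[len] != '\0')
		len++;

	if (len == count) {
		/* Does not fit: keep count - 1 bytes and terminate. */
		memcpy(dest, src, count - 1);
		dest[count - 1] = '\0';
		return -E2BIG;
	}

	memcpy(dest, src, len);
	memset(dest + len, 0, count - len);   /* terminator plus zero padding */
	return (ssize_t)len;
}
```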
    • nvme: introduce nvme_disk_is_ns_head helper · bafd5909
      Guixin Liu authored
      
      We currently rely on gendisk's file operations (fops) to distinguish
      between a namespace head (ns_head) and a regular namespace. To enhance
      code readability, introduce a helper function.
      Additionally, we must ensure that the device is not an ns_head before
      calling nvme_get_ns_from_dev(). To enforce this, add a WARN_ON check
      within nvme_get_ns_from_dev().
      
      Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Liu Song <liusong@linux.alibaba.com>
      [include fix: https://lore.kernel.org/oe-kbuild-all/202401031943.0N72Tkji-lkp@intel.com/]
      Signed-off-by: Keith Busch <kbusch@kernel.org>
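A minimal sketch of the helper pattern with hypothetical types: a disk is identified as a namespace head by comparing its operations pointer against one known table, and the lookup refuses such disks. In the kernel the check is a `WARN_ON`; this sketch's choice to return `NULL` for a namespace-head disk is an assumption made to keep it testable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for block_device_operations and gendisk. */
struct demo_fops { const char *name; };
struct demo_disk {
	const struct demo_fops *fops;
	void *private_data;
};

static const struct demo_fops demo_ns_head_ops = { "ns_head" };
static const struct demo_fops demo_ns_ops = { "ns" };

/* One named helper instead of open-coded fops comparisons. */
static bool demo_disk_is_ns_head(const struct demo_disk *disk)
{
	return disk->fops == &demo_ns_head_ops;
}

/* Callers must not pass a namespace-head disk. The kernel uses WARN_ON
 * here; returning NULL is this sketch's own choice. */
static void *demo_get_ns_from_disk(const struct demo_disk *disk)
{
	if (demo_disk_is_ns_head(disk))
		return NULL;
	return disk->private_data;
}
```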
    • nvme-pci: disable write zeroes for SK Hynix BC901 · bd029a02
      Jim.Lin authored
      
      On the SK Hynix BC901 drive, Write Zeroes causes a Chromebook to take
      more than 20 minutes to switch to developer mode. Disabling write
      zeroes fixes this issue, and this has been verified by SK Hynix.
      
      Signed-off-by: Jim.Lin <jim.lin@siliconmotion.com>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
    • nvmet-fcloop: Remove remote port from list when unlinking · f644d21b
      Daniel Wagner authored
      
      The remote port is removed from the fcloop_nports list too late.
      Remove it when the port is unregistered.

      This prevents a busy loop in fcloop_exit: otherwise the remote port
      can still be found in the list, and the loop never makes progress.
      
      The kernel log will be spammed with
      
        nvme_fcloop: fcloop_exit: Failed deleting remote port
        nvme_fcloop: fcloop_exit: Failed deleting target port
      
      Signed-off-by: Daniel Wagner <dwagner@suse.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
  12. Jan 05, 2024
  13. Jan 04, 2024