  1. Jan 26, 2021
    • Jan Höppner's avatar
      s390/dasd: Fix inconsistent kobject removal · ac55ad2b
      Jan Höppner authored
      Our intention was to only remove path kobjects whenever a device is
      being set offline. However, one corner case was missing.
      
      If a device is disabled and enabled (using the IOCTLs BIODASDDISABLE and
      BIODASDENABLE respectively), the enabling process will call
      dasd_eckd_reload_device() which itself calls dasd_eckd_read_conf() in
      order to update path information. During that update,
      dasd_eckd_clear_conf_data() clears all old data and also removes all
      kobjects. This will leave us with an inconsistent state of path kobjects
      and a subsequent path verification leads to a failing kobject creation.
      
      Fix this by removing kobjects only in the context of offlining a device
      as initially intended.
      
      Fixes: 19508b20 ("s390/dasd: Display FC Endpoint Security information via sysfs")
      Reported-by: Stefan Haberland <sth@linux.ibm.com>
      Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
      Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
      Reviewed-by: Cornelia Huck <cohuck@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      ac55ad2b
  2. Jan 25, 2021
  3. Jan 21, 2021
    • Pan Bian's avatar
      lightnvm: fix memory leak when submit fails · 97784481
      Pan Bian authored
      The allocated page is not released if an error occurs in
      nvm_submit_io_sync_raw(). Move __free_page() earlier to avoid a
      possible memory leak.
      
      Fixes: aff3fb18 ("lightnvm: move bad block and chunk state logic to core")
      Signed-off-by: Pan Bian <bianpan2016@163.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      97784481
    • Jens Axboe's avatar
      Merge tag 'nvme-5.11-2020-01-21' of git://git.infradead.org/nvme into block-5.11 · 1df35bf0
      Jens Axboe authored
      Pull NVMe fixes from Christoph:
      
      "nvme fixes for 5.11:
      
       - fix a status code in nvmet (Chaitanya Kulkarni)
       - avoid double completions in nvme-rdma/nvme-tcp (Chao Leng)
       - fix the CMB support to cope with NVMe 1.4 controllers (Klaus Jensen)
       - fix PRINFO handling in the passthrough ioctl (Revanth Rajashekar)
       - fix a double DMA unmap in nvme-pci"
      
      * tag 'nvme-5.11-2020-01-21' of git://git.infradead.org/nvme:
        nvme-pci: fix error unwind in nvme_map_data
        nvme-pci: refactor nvme_unmap_data
        nvmet: set right status on error in id-ns handler
        nvme-pci: allow use of cmb on v1.4 controllers
        nvme-tcp: avoid request double completion for concurrent nvme_tcp_timeout
        nvme-rdma: avoid request double completion for concurrent nvme_rdma_timeout
        nvme: check the PRINFO bit before deciding the host buffer length
      1df35bf0
    • Christoph Hellwig's avatar
      nvme-pci: fix error unwind in nvme_map_data · fa073216
      Christoph Hellwig authored
      Properly unwind step by step using refactored helpers from nvme_unmap_data
      to avoid a potential double dma_unmap on a mapping failure.
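
The step-by-step unwind described above can be sketched in userspace C. This is only an illustration of the pattern, not the nvme-pci code: `struct io_state`, `map_sg()`, `alloc_list()`, `setup_cmd()` and the `fail_at` failure-injection parameter are all hypothetical names. Each error label releases exactly what was acquired before the failing step, so nothing is released twice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical resource state; the counters make a double release visible. */
struct io_state {
	int sg_mapped;     /* +1 on map, -1 on unmap */
	int list_alloced;  /* +1 on alloc, -1 on free */
};

/* Each helper fails when fail_at selects its step (hypothetical injection). */
static bool map_sg(struct io_state *s, int fail_at)
{
	if (fail_at == 1)
		return false;
	s->sg_mapped++;
	return true;
}

static bool alloc_list(struct io_state *s, int fail_at)
{
	if (fail_at == 2)
		return false;
	s->list_alloced++;
	return true;
}

static bool setup_cmd(struct io_state *s, int fail_at)
{
	(void)s;
	return fail_at != 3;
}

static void unmap_sg(struct io_state *s)  { s->sg_mapped--; }
static void free_list(struct io_state *s) { s->list_alloced--; }

/* Returns 0 on success, -1 on failure; fail_at == 0 means no step fails. */
int map_data(struct io_state *s, int fail_at)
{
	assert(s != NULL);
	if (!map_sg(s, fail_at))
		goto out;
	if (!alloc_list(s, fail_at))
		goto out_unmap_sg;
	if (!setup_cmd(s, fail_at))
		goto out_free_list;
	return 0;

out_free_list:
	free_list(s);
out_unmap_sg:
	unmap_sg(s);
out:
	return -1;
}
```

Whichever step fails, both counters end at zero, which is the property a single catch-all cleanup helper cannot easily guarantee.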
      
      Fixes: 7fe07d14 ("nvme-pci: merge nvme_free_iod into nvme_unmap_data")
      Reported-by: Marc Orr <marcorr@google.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Reviewed-by: Marc Orr <marcorr@google.com>
      fa073216
    • Christoph Hellwig's avatar
      nvme-pci: refactor nvme_unmap_data · 9275c206
      Christoph Hellwig authored
      
      
      Split out three helpers from nvme_unmap_data that will allow finer grained
      unwinding from nvme_map_data.
      
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Reviewed-by: Marc Orr <marcorr@google.com>
      9275c206
    • Jens Axboe's avatar
      Merge branch 'md-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-5.11 · 8dfe1168
      Jens Axboe authored
      Pull MD fix from Song.
      
      * 'md-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
        md: Set prev_flush_start and flush_bio in an atomic way
      8dfe1168
    • Xiao Ni's avatar
      md: Set prev_flush_start and flush_bio in an atomic way · dc5d17a3
      Xiao Ni authored
      
      
      One customer reported a crash caused by a flush request. A warning is
      triggered before the crash.
      
              /* new request after previous flush is completed */
              if (ktime_after(req_start, mddev->prev_flush_start)) {
                      WARN_ON(mddev->flush_bio);
                      mddev->flush_bio = bio;
                      bio = NULL;
              }
      
      The WARN_ON is triggered. md_flush_request() takes a spin lock to protect
      prev_flush_start and flush_bio, but md_submit_flush_data() updates them
      without the lock. Because the compiler may reorder the two writes,
      flush_bio can be set to NULL before prev_flush_start is updated.
      
      For example, flush bio1 sets flush_bio to NULL first in
      md_submit_flush_data(). An interrupt, or VMware causing an extended stall,
      happens between the updates of flush_bio and prev_flush_start. Because
      flush_bio is NULL, flush bio2 can take the lock and be submitted to the
      underlying disks. Flush bio1 then updates prev_flush_start after the
      interrupt or extended stall.
      
      Then flush bio3 enters md_flush_request(). The start time req_start is
      behind prev_flush_start, and flush_bio is not NULL (flush bio2 hasn't
      finished), so the WARN_ON can trigger. INIT_WORK() is then called again.
      INIT_WORK() re-initializes the list pointers in the work_struct, which can
      corrupt the work list and queue the work_struct a second time. With the
      work list corrupted, invalid work items can be used, causing a crash in
      process_one_work().
      
      We need to make sure that only one flush bio is handled at a time. So
      take the spin lock in md_submit_flush_data() so that prev_flush_start and
      flush_bio are updated atomically.
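
As a rough userspace illustration of the locking idea above (not the md code itself), the sketch below updates both fields inside one critical section, so a submitter can never observe one field updated without the other. `struct flush_state`, `finish_flush()` and `try_start_flush()` are hypothetical names, a pthread mutex stands in for the kernel spin lock, and the submission path is simplified to a yes/no result instead of waiting.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the two fields discussed above. */
struct flush_state {
	pthread_mutex_t lock;
	long prev_flush_start;  /* start time of the last completed flush */
	void *flush_bio;        /* non-NULL while a flush is in flight */
};

/* Completion side: publish both fields inside one critical section. */
void finish_flush(struct flush_state *st, long start_time)
{
	assert(st != NULL);
	pthread_mutex_lock(&st->lock);
	st->prev_flush_start = start_time;
	st->flush_bio = NULL;
	pthread_mutex_unlock(&st->lock);
}

/*
 * Submission side (simplified): returns 1 if the caller became the
 * in-flight flush, 0 if a flush is already in flight or the request
 * started no later than the last completed flush.
 */
int try_start_flush(struct flush_state *st, void *bio, long req_start)
{
	int started = 0;

	assert(st != NULL);
	pthread_mutex_lock(&st->lock);
	if (st->flush_bio == NULL && req_start > st->prev_flush_start) {
		st->flush_bio = bio;
		started = 1;
	}
	pthread_mutex_unlock(&st->lock);
	return started;
}
```

Because both writes happen under the lock, the order in which the compiler emits them no longer matters to any reader that also takes the lock.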
      
      Reviewed-by: David Jeffery <djeffery@redhat.com>
      Signed-off-by: Xiao Ni <xni@redhat.com>
      Signed-off-by: Song Liu <songliubraving@fb.com>
      dc5d17a3
  4. Jan 19, 2021
    • Chaitanya Kulkarni's avatar
      nvmet: set right status on error in id-ns handler · bffcd507
      Chaitanya Kulkarni authored
      
      
      The function nvmet_execute_identify_ns() doesn't set the status if the
      call to nvmet_find_namespace() fails. In that case the status of the
      request is set to the value returned by nvmet_copy_sgl().
      
      Set the status to NVME_SC_INVALID_NS and adjust the code so that the
      request has the right status when nvmet_find_namespace() fails.
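
The intended behaviour can be sketched as follows. This is a simplified userspace model, not the nvmet handler: `struct ns`, `find_namespace()` and `identify_ns()` are hypothetical, and the value 0xb for the invalid-namespace status is taken from the status line quoted further down in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NVME_SC_SUCCESS     0x0
#define NVME_SC_INVALID_NS  0xb  /* the 0xb shown in the status output below */

/* Hypothetical, heavily reduced namespace record. */
struct ns {
	uint32_t nsid;
	uint64_t nsze;
};

/* Hypothetical lookup: returns NULL when nsid is not present. */
static const struct ns *find_namespace(const struct ns *tbl, size_t n,
				       uint32_t nsid)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].nsid == nsid)
			return &tbl[i];
	return NULL;
}

/*
 * Fills *out and returns a status. A failed lookup reports INVALID_NS
 * instead of handing back a zeroed structure with a success status.
 */
uint16_t identify_ns(const struct ns *tbl, size_t n, uint32_t nsid,
		     struct ns *out)
{
	const struct ns *found;

	assert(out != NULL);
	found = find_namespace(tbl, n, nsid);
	if (!found)
		return NVME_SC_INVALID_NS;
	*out = *found;
	return NVME_SC_SUCCESS;
}
```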
      
      Without this patch :-
      NVME Identify Namespace 3:
      nsze    : 0
      ncap    : 0
      nuse    : 0
      nsfeat  : 0
      nlbaf   : 0
      flbas   : 0
      mc      : 0
      dpc     : 0
      dps     : 0
      nmic    : 0
      rescap  : 0
      fpi     : 0
      dlfeat  : 0
      nawun   : 0
      nawupf  : 0
      nacwu   : 0
      nabsn   : 0
      nabo    : 0
      nabspf  : 0
      noiob   : 0
      nvmcap  : 0
      mssrl   : 0
      mcl     : 0
      msrc    : 0
      nsattr	: 0
      nvmsetid: 0
      anagrpid: 0
      endgid  : 0
      nguid   : 00000000000000000000000000000000
      eui64   : 0000000000000000
      lbaf  0 : ms:0   lbads:0  rp:0 (in use)
      
      With this patch-series :-
      feb3b88b501e (HEAD -> nvme-5.11) nvmet: remove extra variable in identify ns
      6302aa67210a nvmet: remove extra variable in id-desclist
      ed57951da453 nvmet: remove extra variable in smart log nsid
      be384b8c24dc nvmet: set right status on error in id-ns handler
      
      NVMe status: INVALID_NS: The namespace or the format of that namespace is invalid(0xb)
      
      Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      bffcd507
    • Klaus Jensen's avatar
      nvme-pci: allow use of cmb on v1.4 controllers · 20d3bb92
      Klaus Jensen authored
      
      
      Since NVMe v1.4 the Controller Memory Buffer must be explicitly enabled
      by the host.
      
      Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
      [hch: avoid a local variable and add a comment]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      20d3bb92
    • Chao Leng's avatar
      nvme-tcp: avoid request double completion for concurrent nvme_tcp_timeout · 9ebbfe49
      Chao Leng authored
      
      
      Each namespace has its own request queue. If requests take a long time to
      complete, several request queues may have timed-out requests at the same
      time, and nvme_tcp_timeout() will then execute concurrently. Requests from
      different request queues may be queued on the same tcp queue, so multiple
      nvme_tcp_timeout() instances may call nvme_tcp_stop_queue() at the same
      time. The first nvme_tcp_stop_queue() clears NVME_TCP_Q_LIVE and continues
      stopping the tcp queue (cancelling io_work), but the others see that
      NVME_TCP_Q_LIVE is already cleared and complete their requests directly.
      Completing a request before io_work is completely cancelled may lead to a
      use-after-free condition.
      
      Add a mutex to serialize nvme_tcp_stop_queue().
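
A minimal userspace sketch of this serialization, under stated assumptions: `struct queue`, `teardown()` and the `teardown_runs` counter are hypothetical, a boolean stands in for the live flag, and a pthread mutex stands in for the kernel mutex. The point is that a second caller cannot return, and so cannot go on to complete requests, until the first caller has finished stopping the queue.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct queue {
	pthread_mutex_t stop_lock;  /* serializes stop_queue() callers */
	bool live;                  /* stands in for the queue-live flag */
	int teardown_runs;          /* how many times teardown actually ran */
};

/* Hypothetical teardown, e.g. cancelling outstanding I/O work. */
static void teardown(struct queue *q)
{
	q->teardown_runs++;
}

/*
 * The first caller clears the flag and tears the queue down while
 * holding the lock. Later callers block on the lock until that is
 * done, then see live == false and return.
 */
void stop_queue(struct queue *q)
{
	assert(q != NULL);
	pthread_mutex_lock(&q->stop_lock);
	if (q->live) {
		q->live = false;
		teardown(q);
	}
	pthread_mutex_unlock(&q->stop_lock);
}
```

Without the lock, a second caller could observe the cleared flag while teardown is still running and return early, which is the window the commit closes.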
      
      Signed-off-by: Chao Leng <lengchao@huawei.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      9ebbfe49
    • Chao Leng's avatar
      nvme-rdma: avoid request double completion for concurrent nvme_rdma_timeout · 7674073b
      Chao Leng authored
      
      
      A crash happens when a long request completion delay (nearly 30s) is
      injected. Each namespace has its own request queue. When completion is
      delayed that long, several request queues may have timed-out requests at
      the same time, and nvme_rdma_timeout() will then execute concurrently.
      Requests from different request queues may be queued on the same rdma
      queue, so multiple nvme_rdma_timeout() instances may call
      nvme_rdma_stop_queue() at the same time. The first nvme_rdma_timeout()
      clears NVME_RDMA_Q_LIVE and continues stopping the rdma queue (draining
      the qp), but the others see that NVME_RDMA_Q_LIVE is already cleared and
      complete their requests directly. Completing a request before the qp is
      fully drained may lead to a use-after-free condition.
      
      Add a mutex to serialize nvme_rdma_stop_queue().
      
      Signed-off-by: Chao Leng <lengchao@huawei.com>
      Tested-by: Israel Rukshin <israelr@nvidia.com>
      Reviewed-by: Israel Rukshin <israelr@nvidia.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      7674073b
    • Revanth Rajashekar's avatar
      nvme: check the PRINFO bit before deciding the host buffer length · 4d6b1c95
      Revanth Rajashekar authored
      
      
      According to the NVMe spec v1.4, section 8.3.1, the PRINFO bit and
      the metadata size play a vital role in determining the host buffer size.
      
      If the PRINFO bit is set and MS==8, the host doesn't add the metadata
      buffer; instead the controller adds it.
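
The rule above can be written as a small helper. This is a sketch under assumptions: it treats "the PRINFO bit" as the protection-information action (PRACT) bit and assumes that bit sits at position 13 of the 16-bit control field; `host_meta_len()` and both macros are hypothetical names, not the driver's.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed position of the PRACT bit within the 16-bit control field. */
#define PRINFO_PRACT   (1u << 13)
/* Metadata size at which the controller handles the metadata itself. */
#define PI_TUPLE_SIZE  8u

/* Bytes of metadata the host must supply for nblocks logical blocks. */
uint32_t host_meta_len(uint16_t control, uint16_t ms, uint32_t nblocks)
{
	if ((control & PRINFO_PRACT) && ms == PI_TUPLE_SIZE)
		return 0;  /* the controller adds the metadata, not the host */
	return (uint32_t)ms * nblocks;
}
```

For example, with 4 blocks and MS==8 the host supplies 32 bytes when the bit is clear and none when it is set; with MS==16 the bit makes no difference to the host buffer length.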
      
      Signed-off-by: Revanth Rajashekar <revanth.rajashekar@intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      4d6b1c95
  5. Jan 15, 2021
  6. Jan 10, 2021
    • Coly Li's avatar
      bcache: set bcache device into read-only mode for BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET · 5342fd42
      Coly Li authored
      If BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET is set in the incompat feature
      set, the cache device was created with the obsolete layout that uses
      obso_bucket_size_hi. bcache no longer supports this feature bit; a new
      BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE incompat feature bit was added
      for a better layout that supports large bucket sizes.
      
      For legacy compatibility, if a cache device was created with the
      obsolete BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET feature bit, all bcache
      devices attached to this cache set should be set to read-only. The
      dirty data can then be written back to the backing device before the
      cache device is re-created with the
      BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE feature bit by the latest
      bcache-tools.
      
      This patch checks the BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET feature bit
      when running a cache set and when attaching a bcache device to the cache
      set. If this bit is set,
      - When running a cache set, print an error kernel message to indicate
        that all subsequently attached bcache devices will be read-only.
      - When attaching a bcache device, print an error kernel message to
        indicate that the attached bcache device will be read-only, and ask
        users to update to the latest bcache-tools.
      
      This change only affects cache devices whose bucket size is >= 32MB. Such
      sizes are meant for zoned SSDs, and almost nobody uses such a large bucket
      size at the moment. If you don't explicitly set a large bucket size for a
      zoned SSD, this change is totally transparent to your bcache device.
      
      Fixes: ffa47032 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      5342fd42
    • Coly Li's avatar
      bcache: introduce BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE for large bucket · b16671e8
      Coly Li authored
      When the large bucket feature was added, BCH_FEATURE_INCOMPAT_LARGE_BUCKET
      was introduced into the incompat feature set. It used bucket_size_hi
      (which was added at the tail of struct cache_sb_disk) to extend the
      current 16-bit bucket size to 32 bits together with the existing
      bucket_size in struct cache_sb_disk.
      
      This was not a good idea; there are two obvious problems:
      - Bucket size is always a power of 2. If log2(bucket size) is stored in
        the existing bucket_size of struct cache_sb_disk, it is unnecessary to
        add bucket_size_hi.
      - Macro csum_set() assumes d[SB_JOURNAL_BUCKETS] is the last member in
        struct cache_sb_disk. bucket_size_hi was added after d[], which makes
        csum_set() calculate an unexpected super block checksum.
      
      To fix the above problems, this patch introduces a new incompat feature
      bit, BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE. When this bit is set,
      bucket_size in struct cache_sb_disk stores the order of the power-of-2
      bucket size value. When the user specifies a bucket size larger than
      32768 sectors, BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE is set in the
      incompat feature set, and bucket_size stores log2(bucket size) rather
      than the real bucket size value.
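
The encoding described above can be sketched like this. It is an illustration only: the function names, the `MAX_PLAIN_BUCKET_SECTORS` macro and the numeric value chosen for the feature bit are hypothetical, and the real on-disk handling lives in bcache and bcache-tools.

```c
#include <assert.h>
#include <stdint.h>

#define FEATURE_LOG_LARGE_BUCKET_SIZE  (1u << 1)  /* hypothetical bit value */
#define MAX_PLAIN_BUCKET_SECTORS       32768u     /* largest size stored as-is */

/* Encode a power-of-2 bucket size (in sectors) into the 16-bit field. */
uint16_t encode_bucket_size(uint32_t sectors, uint32_t *incompat)
{
	uint16_t order = 0;

	assert(sectors != 0 && (sectors & (sectors - 1)) == 0);
	if (sectors <= MAX_PLAIN_BUCKET_SECTORS)
		return (uint16_t)sectors;  /* fits in 16 bits unchanged */

	/* Too large for 16 bits: flag it and store log2(sectors) instead. */
	*incompat |= FEATURE_LOG_LARGE_BUCKET_SIZE;
	while ((1u << order) < sectors)
		order++;
	return order;
}

/* Recover the bucket size in sectors from the stored field. */
uint32_t decode_bucket_size(uint16_t stored, uint32_t incompat)
{
	if (incompat & FEATURE_LOG_LARGE_BUCKET_SIZE)
		return 1u << stored;
	return stored;
}
```

A 65536-sector bucket is stored as 16 with the feature bit set, while a 1024-sector bucket is stored unchanged, so no extra field after d[] is needed.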
      
      The obsolete BCH_FEATURE_INCOMPAT_LARGE_BUCKET won't be used anymore.
      It is renamed to BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET and is still
      recognized only by the kernel driver, for legacy compatibility. The
      previous bucket_size_hi is renamed to obso_bucket_size_hi in struct
      cache_sb_disk and is no longer used in bcache-tools.
      
      For cache devices created with the BCH_FEATURE_INCOMPAT_LARGE_BUCKET
      feature, bcache-tools and the kernel driver still recognize the feature
      string and display it as "obso_large_bucket".
      
      With this change, the unnecessary extension of the bcache on-disk
      super block can be avoided, and csum_set() can generate the expected
      checksum as well.
      
      Fixes: ffa47032 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
      Signed-off-by: Coly Li <colyli@suse.de>
      Cc: stable@vger.kernel.org # 5.9+
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      b16671e8
    • Coly Li's avatar
      bcache: check unsupported feature sets for bcache register · 1dfc0686
      Coly Li authored
      This patch adds a check for features that are incompatible with the
      currently supported feature sets.
      
      Now if a bcache device created by bcache-tools has features that the
      current kernel doesn't support, read_super() fails with an error
      message. E.g. if an unsupported incompatible feature is detected,
      bcache register fails with dmesg "bcache: register_bcache() error :
      Unsupported incompatible feature found".
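
A check of this kind usually masks each feature set against the bits the running code understands. The sketch below assumes that shape; `struct sb_features`, `check_features()` and the three `*_SUPP` mask values are hypothetical, and only the "incompatible" error string is taken from the message above (the other two strings are assumed by analogy).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical masks of the feature bits this build understands. */
#define COMPAT_SUPP     0x0u
#define RO_COMPAT_SUPP  0x0u
#define INCOMPAT_SUPP   0x3u

struct sb_features {
	uint64_t compat;
	uint64_t ro_compat;
	uint64_t incompat;
};

/* Returns NULL if every set bit is supported, otherwise an error string. */
const char *check_features(const struct sb_features *f)
{
	assert(f != NULL);
	if (f->compat & ~(uint64_t)COMPAT_SUPP)
		return "Unsupported compatible feature found";
	if (f->ro_compat & ~(uint64_t)RO_COMPAT_SUPP)
		return "Unsupported read-only compatible feature found";
	if (f->incompat & ~(uint64_t)INCOMPAT_SUPP)
		return "Unsupported incompatible feature found";
	return NULL;
}
```

Rejecting unknown bits at registration time means an older kernel refuses a newer on-disk layout instead of misreading it.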
      
      Fixes: d721a43f ("bcache: increase super block version for cache device and backing device")
      Fixes: ffa47032 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
      Signed-off-by: Coly Li <colyli@suse.de>
      Cc: stable@vger.kernel.org # 5.9+
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      1dfc0686
    • Coly Li's avatar
      bcache: fix typo from SUUP to SUPP in features.h · f7b4943d
      Coly Li authored
      This patch fixes the following typos,
      from BCH_FEATURE_COMPAT_SUUP to BCH_FEATURE_COMPAT_SUPP
      from BCH_FEATURE_INCOMPAT_SUUP to BCH_FEATURE_INCOMPAT_SUPP
      from BCH_FEATURE_RO_COMPAT_SUUP to BCH_FEATURE_RO_COMPAT_SUPP
      
      Fixes: d721a43f ("bcache: increase super block version for cache device and backing device")
      Fixes: ffa47032 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
      Signed-off-by: Coly Li <colyli@suse.de>
      Cc: stable@vger.kernel.org # 5.9+
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      f7b4943d
    • Yi Li's avatar
      bcache: set pdev_set_uuid before scond loop iteration · e8092707
      Yi Li authored
      
      
      There is no need to reassign pdev_set_uuid in the second loop iteration,
      so move it to the place before second loop.
      
      Signed-off-by: Yi Li <yili@winhong.com>
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      e8092707
  7. Jan 08, 2021
    • John Garry's avatar
      blk-mq-debugfs: Add decode for BLK_MQ_F_TAG_HCTX_SHARED · 02f938e9
      John Garry authored
      Showing the hctx flags for when BLK_MQ_F_TAG_HCTX_SHARED is set gives
      something like:
      
      root@debian:/home/john# more /sys/kernel/debug/block/sda/hctx0/flags
      alloc_policy=FIFO SHOULD_MERGE|TAG_QUEUE_SHARED|3
      
      Add the decoding for that flag.
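
The trailing "|3" in the output above is what a table-driven decoder prints for a set bit that has no name in its table. A small userspace sketch of such a decoder follows; `decode_flags()` and the name tables in the usage note are hypothetical and are not the blk-mq-debugfs code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Writes the names of all set bits joined by '|' into buf. names[] is
 * indexed by bit position; a bit with no entry (index >= n or NULL)
 * is printed as its bit number instead.
 */
void decode_flags(unsigned long flags, const char *const *names, size_t n,
		  char *buf, size_t len)
{
	size_t used = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (size_t bit = 0; bit < 8 * sizeof(flags); bit++) {
		int w;

		if (!(flags & (1UL << bit)))
			continue;
		if (bit < n && names[bit])
			w = snprintf(buf + used, len - used, "%s%s",
				     used ? "|" : "", names[bit]);
		else
			w = snprintf(buf + used, len - used, "%s%zu",
				     used ? "|" : "", bit);
		if (w < 0 || (size_t)w >= len - used)
			break;  /* output truncated, stop here */
		used += (size_t)w;
	}
}
```

With a two-entry table ("SHOULD_MERGE", "TAG_QUEUE_SHARED") and flags 0xb, this yields "SHOULD_MERGE|TAG_QUEUE_SHARED|3"; adding an entry for bit 3 replaces the bare number with a name.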
      
      Fixes: 32bc15af ("blk-mq: Facilitate a shared sbitmap per tagset")
      Signed-off-by: John Garry <john.garry@huawei.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      02f938e9
    • Jack Wang's avatar
      block/rnbd-clt: avoid module unload race with close confirmation · 3a21777c
      Jack Wang authored
      We had a kernel panic caused by a race between module unload and the
      last close confirmation.
      
      call trace:
      [1196029.743127]  free_sess+0x15/0x50 [rtrs_client]
      [1196029.743128]  rtrs_clt_close+0x4c/0x70 [rtrs_client]
      [1196029.743129]  ? rnbd_clt_unmap_device+0x1b0/0x1b0 [rnbd_client]
      [1196029.743130]  close_rtrs+0x25/0x50 [rnbd_client]
      [1196029.743131]  rnbd_client_exit+0x93/0xb99 [rnbd_client]
      [1196029.743132]  __x64_sys_delete_module+0x190/0x260
      
      And in the crashdump, the confirmation kworker is also running.
      PID: 6943   TASK: ffff9e2ac8098000  CPU: 4   COMMAND: "kworker/4:2"
       #0 [ffffb206cf337c30] __schedule at ffffffff9f93f891
       #1 [ffffb206cf337cc8] schedule at ffffffff9f93fe98
       #2 [ffffb206cf337cd0] schedule_timeout at ffffffff9f943938
       #3 [ffffb206cf337d50] wait_for_completion at ffffffff9f9410a7
       #4 [ffffb206cf337da0] __flush_work at ffffffff9f08ce0e
       #5 [ffffb206cf337e20] rtrs_clt_close_conns at ffffffffc0d5f668 [rtrs_client]
       #6 [ffffb206cf337e48] rtrs_clt_close at ffffffffc0d5f801 [rtrs_client]
       #7 [ffffb206cf337e68] close_rtrs at ffffffffc0d26255 [rnbd_client]
       #8 [ffffb206cf337e78] free_sess at ffffffffc0d262ad [rnbd_client]
       #9 [ffffb206cf337e88] rnbd_clt_put_dev at ffffffffc0d266a7 [rnbd_client]
      
      The problem is that both code paths try to close the same session, which
      leads to the panic.
      
      To fix it, just skip the sess if the refcount has already dropped to 0.
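
The "skip it if the refcount already dropped to 0" idea is commonly expressed as an increment-unless-zero operation. The following is a userspace sketch with C11 atomics; `struct session`, `session_get_unless_zero()`, `session_put()` and `close_live_sessions()` are hypothetical names, and the real fix operates on the rnbd client session list.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct session {
	atomic_int refcount;  /* 0 means the last user is already tearing it down */
};

/* Take a reference only if the object is still alive. */
bool session_get_unless_zero(struct session *s)
{
	int old;

	assert(s != NULL);
	old = atomic_load(&s->refcount);
	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&s->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the caller dropped the last one. */
bool session_put(struct session *s)
{
	return atomic_fetch_sub(&s->refcount, 1) == 1;
}

/* Module-exit style loop: close only the sessions we managed to pin. */
int close_live_sessions(struct session *sessions, int n)
{
	int closed = 0;

	for (int i = 0; i < n; i++) {
		if (!session_get_unless_zero(&sessions[i]))
			continue;  /* another path already owns the teardown */
		closed++;
		session_put(&sessions[i]);
	}
	return closed;
}
```

A session whose count is already zero is left to the path that dropped the last reference, so the two paths never close the same session.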
      
      Fixes: f7a7a5c2 ("block/rnbd: client: main functionality")
      Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
      Reviewed-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      3a21777c
    • Swapnil Ingle's avatar
      block/rnbd: Adding name to the Contributors List · ef8048dd
      Swapnil Ingle authored
      
      
      Adding name to the Contributors List
      
      Signed-off-by: Swapnil Ingle <ingleswapnil@gmail.com>
      Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
      Acked-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
      Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      ef8048dd
    • Guoqing Jiang's avatar
      block/rnbd-clt: Fix sg table use after free · 80f99093
      Guoqing Jiang authored
      Since a dynamically allocated sglist is used for rnbd_iu, we can't free
      the sg table right after send_usr_msg, because the callback function
      (cqe.done) could still access the sglist.
      
      Otherwise KASAN reports a UAF issue:
      
      [ 4856.600257] BUG: KASAN: use-after-free in dma_direct_unmap_sg+0x53/0x290
      [ 4856.600772] Read of size 4 at addr ffff888206af3a98 by task swapper/1/0
      
      [ 4856.601729] CPU: 1 PID: 0 Comm: swapper/1 Kdump: loaded Tainted: G        W         5.10.0-pserver #5.10.0-1+feature+linux+next+20201214.1025+0910d71
      [ 4856.601748] Hardware name: Supermicro Super Server/X11DDW-L, BIOS 3.3 02/21/2020
      [ 4856.601766] Call Trace:
      [ 4856.601785]  <IRQ>
      [ 4856.601822]  dump_stack+0x99/0xcb
      [ 4856.601856]  ? dma_direct_unmap_sg+0x53/0x290
      [ 4856.601888]  print_address_description.constprop.7+0x1e/0x230
      [ 4856.601913]  ? freeze_kernel_threads+0x73/0x73
      [ 4856.601965]  ? mark_held_locks+0x29/0xa0
      [ 4856.602019]  ? dma_direct_unmap_sg+0x53/0x290
      [ 4856.602039]  ? dma_direct_unmap_sg+0x53/0x290
      [ 4856.602079]  kasan_report.cold.9+0x37/0x7c
      [ 4856.602188]  ? mlx5_ib_post_recv+0x430/0x520 [mlx5_ib]
      [ 4856.602209]  ? dma_direct_unmap_sg+0x53/0x290
      [ 4856.602256]  dma_direct_unmap_sg+0x53/0x290
      [ 4856.602366]  complete_rdma_req+0x188/0x4b0 [rtrs_client]
      [ 4856.602451]  ? rtrs_clt_close+0x80/0x80 [rtrs_client]
      [ 4856.602535]  ? mlx5_ib_poll_cq+0x48b/0x16e0 [mlx5_ib]
      [ 4856.602589]  ? radix_tree_insert+0x3a0/0x3a0
      [ 4856.602610]  ? do_raw_spin_lock+0x119/0x1d0
      [ 4856.602647]  ? rwlock_bug.part.1+0x60/0x60
      [ 4856.602740]  rtrs_clt_rdma_done+0x3f7/0x670 [rtrs_client]
      [ 4856.602804]  ? rtrs_clt_rdma_cm_handler+0xda0/0xda0 [rtrs_client]
      [ 4856.602857]  ? check_flags.part.31+0x6c/0x1f0
      [ 4856.602927]  ? rcu_read_lock_sched_held+0xaf/0xe0
      [ 4856.602963]  ? rcu_read_lock_bh_held+0xc0/0xc0
      [ 4856.603137]  __ib_process_cq+0x10a/0x350 [ib_core]
      [ 4856.603309]  ib_poll_handler+0x41/0x1c0 [ib_core]
      [ 4856.603358]  irq_poll_softirq+0xe6/0x280
      [ 4856.603392]  ? lockdep_hardirqs_on_prepare+0x111/0x210
      [ 4856.603446]  __do_softirq+0x10d/0x646
      [ 4856.603540]  asm_call_irq_on_stack+0x12/0x20
      [ 4856.603563]  </IRQ>
      
      [ 4856.605096] Allocated by task 8914:
      [ 4856.605510]  kasan_save_stack+0x19/0x40
      [ 4856.605532]  __kasan_kmalloc.constprop.7+0xc1/0xd0
      [ 4856.605552]  __kmalloc+0x155/0x320
      [ 4856.605574]  __sg_alloc_table+0x155/0x1c0
      [ 4856.605594]  sg_alloc_table+0x1f/0x50
      [ 4856.605620]  send_msg_sess_info+0x119/0x2e0 [rnbd_client]
      [ 4856.605646]  remap_devs+0x71/0x210 [rnbd_client]
      [ 4856.605676]  init_sess+0xad8/0xe10 [rtrs_client]
      [ 4856.605706]  rtrs_clt_reconnect_work+0xd6/0x170 [rtrs_client]
      [ 4856.605728]  process_one_work+0x521/0xa90
      [ 4856.605748]  worker_thread+0x65/0x5b0
      [ 4856.605769]  kthread+0x1f2/0x210
      [ 4856.605789]  ret_from_fork+0x22/0x30
      
      [ 4856.606159] Freed by task 8914:
      [ 4856.606559]  kasan_save_stack+0x19/0x40
      [ 4856.606580]  kasan_set_track+0x1c/0x30
      [ 4856.606601]  kasan_set_free_info+0x1b/0x30
      [ 4856.606622]  __kasan_slab_free+0x108/0x150
      [ 4856.606642]  slab_free_freelist_hook+0x64/0x190
      [ 4856.606661]  kfree+0xe2/0x650
      [ 4856.606681]  __sg_free_table+0xa4/0x100
      [ 4856.606707]  send_msg_sess_info+0x1d6/0x2e0 [rnbd_client]
      [ 4856.606733]  remap_devs+0x71/0x210 [rnbd_client]
      [ 4856.606763]  init_sess+0xad8/0xe10 [rtrs_client]
      [ 4856.606792]  rtrs_clt_reconnect_work+0xd6/0x170 [rtrs_client]
      [ 4856.606813]  process_one_work+0x521/0xa90
      [ 4856.606833]  worker_thread+0x65/0x5b0
      [ 4856.606853]  kthread+0x1f2/0x210
      [ 4856.606872]  ret_from_fork+0x22/0x30
      
      The solution is to free the iu's sgtable only once the iu is no longer
      used. Also move sg_alloc_table into rnbd_get_iu accordingly.
      
      Fixes: 5a1328d0 ("block/rnbd-clt: Dynamically allocate sglist for rnbd_iu")
      Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
      Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      80f99093
    • Jack Wang's avatar
      block/rnbd-srv: Fix use after free in rnbd_srv_sess_dev_force_close · 1a84e7c6
      Jack Wang authored
      KASAN detected the following BUG:
      [  778.215311] ==================================================================
      [  778.216696] BUG: KASAN: use-after-free in rnbd_srv_sess_dev_force_close+0x38/0x60 [rnbd_server]
      [  778.219037] Read of size 8 at addr ffff88b1d6516c28 by task tee/8842
      
      [  778.220500] CPU: 37 PID: 8842 Comm: tee Kdump: loaded Not tainted 5.10.0-pserver #5.10.0-1+feature+linux+next+20201214.1025+0910d71
      [  778.220529] Hardware name: Supermicro Super Server/X11DDW-L, BIOS 3.3 02/21/2020
      [  778.220555] Call Trace:
      [  778.220609]  dump_stack+0x99/0xcb
      [  778.220667]  ? rnbd_srv_sess_dev_force_close+0x38/0x60 [rnbd_server]
      [  778.220715]  print_address_description.constprop.7+0x1e/0x230
      [  778.220750]  ? freeze_kernel_threads+0x73/0x73
      [  778.220896]  ? rnbd_srv_sess_dev_force_close+0x38/0x60 [rnbd_server]
      [  778.220932]  ? rnbd_srv_sess_dev_force_close+0x38/0x60 [rnbd_server]
      [  778.220994]  kasan_report.cold.9+0x37/0x7c
      [  778.221066]  ? kobject_put+0x80/0x270
      [  778.221102]  ? rnbd_srv_sess_dev_force_close+0x38/0x60 [rnbd_server]
      [  778.221184]  rnbd_srv_sess_dev_force_close+0x38/0x60 [rnbd_server]
      [  778.221240]  rnbd_srv_dev_session_force_close_store+0x6a/0xc0 [rnbd_server]
      [  778.221304]  ? sysfs_file_ops+0x90/0x90
      [  778.221353]  kernfs_fop_write+0x141/0x240
      [  778.221451]  vfs_write+0x142/0x4d0
      [  778.221553]  ksys_write+0xc0/0x160
      [  778.221602]  ? __ia32_sys_read+0x50/0x50
      [  778.221684]  ? lockdep_hardirqs_on_prepare+0x13d/0x210
      [  778.221718]  ? syscall_enter_from_user_mode+0x1c/0x50
      [  778.221821]  do_syscall_64+0x33/0x40
      [  778.221862]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  778.221896] RIP: 0033:0x7f4affdd9504
      [  778.221928] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 48 8d 05 f9 61 0d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
      [  778.221956] RSP: 002b:00007fffebb36b28 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [  778.222011] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f4affdd9504
      [  778.222038] RDX: 0000000000000002 RSI: 00007fffebb36c50 RDI: 0000000000000003
      [  778.222066] RBP: 00007fffebb36c50 R08: 0000556a151aa600 R09: 00007f4affeb1540
      [  778.222094] R10: fffffffffffffc19 R11: 0000000000000246 R12: 0000556a151aa520
      [  778.222121] R13: 0000000000000002 R14: 00007f4affea6760 R15: 0000000000000002
      
      [  778.222764] Allocated by task 3212:
      [  778.223285]  kasan_save_stack+0x19/0x40
      [  778.223316]  __kasan_kmalloc.constprop.7+0xc1/0xd0
      [  778.223347]  kmem_cache_alloc_trace+0x186/0x350
      [  778.223382]  rnbd_srv_rdma_ev+0xf16/0x1690 [rnbd_server]
      [  778.223422]  process_io_req+0x4d1/0x670 [rtrs_server]
      [  778.223573]  __ib_process_cq+0x10a/0x350 [ib_core]
      [  778.223709]  ib_cq_poll_work+0x31/0xb0 [ib_core]
      [  778.223743]  process_one_work+0x521/0xa90
      [  778.223773]  worker_thread+0x65/0x5b0
      [  778.223802]  kthread+0x1f2/0x210
      [  778.223833]  ret_from_fork+0x22/0x30
      
      [  778.224296] Freed by task 8842:
      [  778.224800]  kasan_save_stack+0x19/0x40
      [  778.224829]  kasan_set_track+0x1c/0x30
      [  778.224860]  kasan_set_free_info+0x1b/0x30
      [  778.224889]  __kasan_slab_free+0x108/0x150
      [  778.224919]  slab_free_freelist_hook+0x64/0x190
      [  778.224947]  kfree+0xe2/0x650
      [  778.224982]  rnbd_destroy_sess_dev+0x2fa/0x3b0 [rnbd_server]
      [  778.225011]  kobject_put+0xda/0x270
      [  778.225046]  rnbd_srv_sess_dev_force_close+0x30/0x60 [rnbd_server]
      [  778.225081]  rnbd_srv_dev_session_force_close_store+0x6a/0xc0 [rnbd_server]
      [  778.225111]  kernfs_fop_write+0x141/0x240
      [  778.225140]  vfs_write+0x142/0x4d0
      [  778.225169]  ksys_write+0xc0/0x160
      [  778.225198]  do_syscall_64+0x33/0x40
      [  778.225227]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      [  778.226506] The buggy address belongs to the object at ffff88b1d6516c00
                      which belongs to the cache kmalloc-512 of size 512
      [  778.227464] The buggy address is located 40 bytes inside of
                      512-byte region [ffff88b1d6516c00, ffff88b1d6516e00)
      
      The problem is that the sess_dev release function calls
      rnbd_destroy_sess_dev, which may already have freed the sess_dev, but we
      still set keep_id in rnbd_srv_sess_dev_force_close afterwards, which
      leads to a use after free.
      
      To fix it, set keep_id before the sysfs removal, and cache the
      rnbd_srv_session for lock access.
      
      Fixes: 78699805 ("block/rnbd-srv: close a mapped device from server side.")
      Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
      Reviewed-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      1a84e7c6
    • Jack Wang's avatar
      block/rnbd: Select SG_POOL for RNBD_CLIENT · 74acfa99
      Jack Wang authored
      lkp reported the following build error:
       drivers/block/rnbd/rnbd-clt.c: In function 'rnbd_softirq_done_fn':
      >> drivers/block/rnbd/rnbd-clt.c:387:2: error: implicit declaration of function 'sg_free_table_chained' [-Werror=implicit-function-declaration]
           387 |  sg_free_table_chained(&iu->sgt, RNBD_INLINE_SG_CNT);
               |  ^~~~~~~~~~~~~~~~~~~~~
      
      The reason is that CONFIG_SG_POOL is not enabled in the config. To
      avoid such failures, select SG_POOL in Kconfig for RNBD_CLIENT.
      
      Fixes: 5a1328d0 ("block/rnbd-clt: Dynamically allocate sglist for rnbd_iu")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      74acfa99
    • Christoph Hellwig's avatar
      block: pre-initialize struct block_device in bdev_alloc_inode · 2d2f6f1b
      Christoph Hellwig authored
      bdev_evict_inode and bdev_free_inode are also called for the root inode
      of bdevfs, for which bdev_alloc is never called.  Move the zeroing of
      struct block_device and the initialization of the bd_bdi field into
      bdev_alloc_inode to make sure they are initialized for the root inode
      as well.
      
      Fixes: e6cb5382 ("block: initialize struct block_device in bdev_alloc")
      Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      2d2f6f1b
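The idea in the bdev_alloc_inode commit above, initializing in the allocation step everything the teardown path relies on, can be sketched in user-space C. All names below (`struct blockdev`, `placeholder_bdi`, `blockdev_alloc()`, `blockdev_setup()`, `blockdev_free()`) are hypothetical and chosen for illustration; this is not the kernel code. The assumption is a two-step lifecycle in which some objects (like the root inode) only ever go through the first step.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types; not the kernel's block_device or backing_dev_info. */
struct bdi {
    int users;
};

static struct bdi placeholder_bdi;     /* default every object starts with */

struct blockdev {
    struct bdi *bdi;
    void *stats;                       /* only allocated by blockdev_setup() */
};

/* Allocation step: runs for every object, including a "root" object. */
static struct blockdev *blockdev_alloc(void)
{
    struct blockdev *b = malloc(sizeof(*b));

    if (!b)
        return NULL;
    memset(b, 0, sizeof(*b));          /* zero here, not in the setup step */
    b->bdi = &placeholder_bdi;         /* never leave this pointer garbage */
    return b;
}

/* Setup step: only runs for real devices. Returns 0 on success. */
static int blockdev_setup(struct blockdev *b)
{
    b->stats = calloc(1, 64);
    return b->stats ? 0 : -1;
}

/* Teardown: runs for every object, so it relies on alloc-time state only. */
static void blockdev_free(struct blockdev *b)
{
    assert(b->bdi != NULL);
    free(b->stats);                    /* free(NULL) is a no-op */
    free(b);
}
```

With the zeroing in `blockdev_alloc()`, calling `blockdev_free()` on an object that never saw `blockdev_setup()` is well defined; if the zeroing lived in the setup step instead, the same call would free an uninitialized pointer.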
    • Jens Axboe's avatar
      Merge tag 'nvme-5.11-2021-01-07' of git://git.infradead.org/nvme into block-5.11 · 04b1ecb6
      Jens Axboe authored
      Pull NVMe updates from Christoph:
      
      "nvme updates for 5.11:
      
       - fix a race in the nvme-tcp send code (Sagi Grimberg)
       - fix a list corruption in an nvme-rdma error path (Israel Rukshin)
       - avoid a possible double fetch in nvme-pci (Lalithambika Krishnakumar)
       - add the subsystem NQN quirk for a Samsung drive (Gopal Tiwari)
       - fix two compiler warnings in nvme-fcloop (James Smart)
       - don't call sleeping functions from irq context in nvme-fc (James Smart)
       - remove an unused argument (Max Gurtovoy)
       - remove unused exports (Minwoo Im)"
      
      * tag 'nvme-5.11-2021-01-07' of git://git.infradead.org/nvme:
        nvme: remove the unused status argument from nvme_trace_bio_complete
        nvmet-rdma: Fix list_del corruption on queue establishment failure
        nvme: unexport functions with no external caller
        nvme: avoid possible double fetch in handling CQE
        nvme-tcp: Fix possible race of io_work and direct send
        nvme-pci: mark Samsung PM1725a as IGNORE_DEV_SUBNQN
        nvme-fcloop: Fix sscanf type and list_first_entry_or_null warnings
        nvme-fc: avoid calling _nvme_fc_abort_outstanding_ios from interrupt context
      04b1ecb6
    • Satya Tangirala's avatar
      fs: Fix freeze_bdev()/thaw_bdev() accounting of bd_fsfreeze_sb · 04a6a536
      Satya Tangirala authored
      freeze/thaw_bdev() currently use bdev->bd_fsfreeze_count to infer
      whether or not bdev->bd_fsfreeze_sb is valid (it's valid iff
      bd_fsfreeze_count is non-zero). thaw_bdev() doesn't nullify
      bd_fsfreeze_sb.
      
      But this means a freeze_bdev() call followed by a thaw_bdev() call can
      leave bd_fsfreeze_sb with a non-null value, while bd_fsfreeze_count is
      zero. If freeze_bdev() is called again, and this time
      get_active_super() returns NULL (e.g. because the FS is unmounted),
      we'll end up with bd_fsfreeze_count > 0, but bd_fsfreeze_sb is
      *untouched* - it stays the same (now garbage) value. A subsequent
      thaw_bdev() will decide that the bd_fsfreeze_sb value is legitimate
      (since bd_fsfreeze_count > 0), and attempt to use it.
      
      Fix this by always setting bd_fsfreeze_sb to NULL when
      bd_fsfreeze_count is successfully decremented to 0 in thaw_bdev().
      Alternatively, we could set bd_fsfreeze_sb to whatever
      get_active_super() returns in freeze_bdev() whenever bd_fsfreeze_count
      is successfully incremented to 1 from 0. That can be achieved cleanly
      by moving the line currently setting bd_fsfreeze_sb to immediately
      after the "sync:" label, but it might be a little too subtle and
      easily overlooked in future.
      
      This fixes the currently panicking xfstests generic/085.
      
      Fixes: 040f04bd ("fs: simplify freeze_bdev/thaw_bdev")
      Signed-off-by: Satya Tangirala <satyat@google.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      04a6a536
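The accounting rule from the freeze_bdev()/thaw_bdev() commit above can be modeled in a few lines of user-space C. This is a simplified sketch under stated assumptions, not the kernel implementation: `struct super`, `struct blockdev`, `freeze()` and `thaw()` are hypothetical names, the `sb` argument stands in for whatever a superblock lookup returned (NULL meaning nothing is mounted), and error handling for a failed thaw is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model types; not the kernel structures. */
struct super {
    int frozen;
};

struct blockdev {
    int freeze_count;
    struct super *freeze_sb;   /* meaningful only while freeze_count > 0 */
};

/* sb models the lookup result; NULL means no filesystem is mounted. */
static void freeze(struct blockdev *b, struct super *sb)
{
    assert(b != NULL);

    if (++b->freeze_count > 1)
        return;                /* nested freeze: nothing more to do */
    if (!sb)
        return;                /* nothing mounted: freeze_sb is not written */
    sb->frozen = 1;
    b->freeze_sb = sb;
}

/* Returns 0 on success, -1 if the device was not frozen. */
static int thaw(struct blockdev *b)
{
    assert(b != NULL);

    if (b->freeze_count == 0)
        return -1;
    if (--b->freeze_count > 0)
        return 0;              /* still frozen by an outer caller */
    if (b->freeze_sb)
        b->freeze_sb->frozen = 0;
    b->freeze_sb = NULL;       /* the fix: reset on the 1 -> 0 transition */
    return 0;
}
```

Without the final reset in `thaw()`, a later `freeze(&b, NULL)` would raise the count while leaving the old pointer in place, and the next `thaw()` would dereference a stale superblock, which is the sequence the commit message describes.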
  8. Jan 06, 2021