Skip to content
  1. Jan 09, 2018
    • Coly Li's avatar
      bcache: reduce cache_set devices iteration by devices_max_used · 2831231d
      Coly Li authored
      
      
      Member devices of struct cache_set is used to reference all attached
      bcache devices to this cache set. If it is treated as array of pointers,
      size of devices[] is indicated by member nr_uuids of struct cache_set.
      
      nr_uuids is calculated in drivers/md/super.c:bch_cache_set_alloc(),
      	bucket_bytes(c) / sizeof(struct uuid_entry)
      Bucket size is determined by user space tool "make-bcache", by default it
      is 1024 sectors (defined in bcache-tools/make-bcache.c:main()). So default
      nr_uuids value is 4096 from the above calculation.
      
      Every time when bcache code iterates bcache devices of a cache set, all
      the 4096 pointers are checked even only 1 bcache device is attached to the
      cache set, that's a wast of time and unncessary.
      
      This patch adds a member devices_max_used to struct cache_set. Its value
      is 1 + the maximum used index of devices[] in a cache set. When iterating
      all valid bcache devices of a cache set, use c->devices_max_used in
      for-loop may reduce a lot of useless checking.
      
      Personally, my motivation of this patch is not for performance, I use it
      in bcache debugging, which helps me to narrow down the scape to check
      valid bcached devices of a cache set.
      
      Signed-off-by: default avatarColy Li <colyli@suse.de>
      Reviewed-by: default avatarMichael Lyle <mlyle@lyle.org>
      Reviewed-by: default avatarTang Junhui <tang.junhui@zte.com.cn>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      2831231d
    • Zhai Zhaoxuan's avatar
      bcache: fix unmatched generic_end_io_acct() & generic_start_io_acct() · b40503ea
      Zhai Zhaoxuan authored
      
      
      The function cached_dev_make_request() and flash_dev_make_request() call
      generic_start_io_acct() with (struct bcache_device)->disk when they start a
      closure. Then the function bio_complete() calls generic_end_io_acct() with
      (struct search)->orig_bio->bi_disk when the closure has done.
      Since the `bi_disk` is not the bcache device, the generic_end_io_acct() is
      called with a wrong device queue.
      
      It causes the "inflight" (in struct hd_struct) counter keep increasing
      without decreasing.
      
      This patch fix the problem by calling generic_end_io_acct() with
      (struct bcache_device)->disk.
      
      Signed-off-by: default avatarZhai Zhaoxuan <kxuanobj@gmail.com>
      Reviewed-by: default avatarMichael Lyle <mlyle@lyle.org>
      Reviewed-by: default avatarColy Li <colyli@suse.de>
      Reviewed-by: default avatarTang Junhui <tang.junhui@zte.com.cn>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      b40503ea
    • Kent Overstreet's avatar
      bcache: mark closure_sync() __sched · ce439bf7
      Kent Overstreet authored
      
      
      [edit by mlyle: include sched/debug.h to get __sched]
      
      Signed-off-by: default avatarKent Overstreet <kent.overstreet@gmail.com>
      Signed-off-by: default avatarMichael Lyle <mlyle@lyle.org>
      Reviewed-by: default avatarMichael Lyle <mlyle@lyle.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      ce439bf7
    • Kent Overstreet's avatar
      bcache: Fix, improve efficiency of closure_sync() · e4bf7919
      Kent Overstreet authored
      
      
      Eliminates cases where sync can race and fail to complete / get stuck.
      Removes many status flags and simplifies entering-and-exiting closure
      sleeping behaviors.
      
      [mlyle: fixed conflicts due to changed return behavior in mainline.
      extended commit comment, and squashed down two commits that were mostly
      contradictory to get to this state.  Changed __set_current_state to
      set_current_state per Jens review comment]
      
      Signed-off-by: default avatarKent Overstreet <kent.overstreet@gmail.com>
      Signed-off-by: default avatarMichael Lyle <mlyle@lyle.org>
      Reviewed-by: default avatarMichael Lyle <mlyle@lyle.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      e4bf7919
    • Michael Lyle's avatar
      bcache: allow quick writeback when backing idle · b1092c9a
      Michael Lyle authored
      
      
      If the control system would wait for at least half a second, and there's
      been no reqs hitting the backing disk for awhile: use an alternate mode
      where we have at most one contiguous set of writebacks in flight at a
      time. (But don't otherwise delay).  If front-end IO appears, it will
      still be quick, as it will only have to contend with one real operation
      in flight.  But otherwise, we'll be sending data to the backing disk as
      quickly as it can accept it (with one op at a time).
      
      Signed-off-by: default avatarMichael Lyle <mlyle@lyle.org>
      Reviewed-by: default avatarTang Junhui <tang.junhui@zte.com.cn>
      Acked-by: default avatarColy Li <colyli@suse.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      b1092c9a
    • Michael Lyle's avatar
      bcache: writeback: properly order backing device IO · 6e6ccc67
      Michael Lyle authored
      
      
      Writeback keys are presently iterated and dispatched for writeback in
      order of the logical block address on the backing device.  Multiple may
      be, in parallel, read from the cache device and then written back
      (especially when there are contiguous I/O).
      
      However-- there was no guarantee with the existing code that the writes
      would be issued in LBA order, as the reads from the cache device are
      often re-ordered.  In turn, when writing back quickly, the backing disk
      often has to seek backwards-- this slows writeback and increases
      utilization.
      
      This patch introduces an ordering mechanism that guarantees that the
      original order of issue is maintained for the write portion of the I/O.
      Performance for writeback is significantly improved when there are
      multiple contiguous keys or high writeback rates.
      
      Signed-off-by: default avatarMichael Lyle <mlyle@lyle.org>
      Reviewed-by: default avatarTang Junhui <tang.junhui@zte.com.cn>
      Tested-by: default avatarTang Junhui <tang.junhui@zte.com.cn>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      6e6ccc67
    • Tang Junhui's avatar
      bcache: fix wrong return value in bch_debug_init() · 539d39eb
      Tang Junhui authored
      
      
      in bch_debug_init(), ret is always 0, and the return value is useless,
      change it to return 0 if be success after calling debugfs_create_dir(),
      else return a non-zero value.
      
      Signed-off-by: default avatarTang Junhui <tang.junhui@zte.com.cn>
      Reviewed-by: default avatarMichael Lyle <mlyle@lyle.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      539d39eb
    • Tang Junhui's avatar
      bcache: segregate flash only volume write streams · 4eca1cb2
      Tang Junhui authored
      
      
      In such scenario that there are some flash only volumes
      , and some cached devices, when many tasks request these devices in
      writeback mode, the write IOs may fall to the same bucket as bellow:
      | cached data | flash data | cached data | cached data| flash data|
      then after writeback of these cached devices, the bucket would
      be like bellow bucket:
      | free | flash data | free | free | flash data |
      
      So, there are many free space in this bucket, but since data of flash
      only volumes still exists, so this bucket cannot be reclaimable,
      which would cause waste of bucket space.
      
      In this patch, we segregate flash only volume write streams from
      cached devices, so data from flash only volumes and cached devices
      can store in different buckets.
      
      Compare to v1 patch, this patch do not add a additionally open bucket
      list, and it is try best to segregate flash only volume write streams
      from cached devices, sectors of flash only volumes may still be mixed
      with dirty sectors of cached device, but the number is very small.
      
      [mlyle: fixed commit log formatting, permissions, line endings]
      
      Signed-off-by: default avatarTang Junhui <tang.junhui@zte.com.cn>
      Reviewed-by: default avatarMichael Lyle <mlyle@lyle.org>
      Signed-off-by: default avatarMichael Lyle <mlyle@lyle.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      4eca1cb2
    • Vasyl Gomonovych's avatar
      bcache: Use PTR_ERR_OR_ZERO() · 9d134117
      Vasyl Gomonovych authored
      
      
      Fix ptr_ret.cocci warnings:
      drivers/md/bcache/btree.c:1800:1-3: WARNING: PTR_ERR_OR_ZERO can be used
      
      Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR
      
      Generated by: scripts/coccinelle/api/ptr_ret.cocci
      
      Signed-off-by: default avatarVasyl Gomonovych <gomonovych@gmail.com>
      Reviewed-by: default avatarMichael Lyle <mlyle@lyle.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      9d134117
    • Tang Junhui's avatar
      bcache: stop writeback thread after detaching · 8d29c442
      Tang Junhui authored
      
      
      Currently, when a cached device detaching from cache, writeback thread is
      not stopped, and writeback_rate_update work is not canceled. For example,
      after the following command:
      echo 1 >/sys/block/sdb/bcache/detach
      you can still see the writeback thread. Then you attach the device to the
      cache again, bcache will create another writeback thread, for example,
      after below command:
      echo  ba0fb5cd-658a-4533-9806-6ce166d883b9 > /sys/block/sdb/bcache/attach
      then you will see 2 writeback threads.
      This patch stops writeback thread and cancels writeback_rate_update work
      when cached device detaching from cache.
      
      Compare with patch v1, this v2 patch moves code down into the register
      lock for safety in case of any future changes as Coly and Mike suggested.
      
      [edit by mlyle: commit log spelling/formatting]
      
      Signed-off-by: default avatarTang Junhui <tang.junhui@zte.com.cn>
      Reviewed-by: default avatarMichael Lyle <mlyle@lyle.org>
      Signed-off-by: default avatarMichael Lyle <mlyle@lyle.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      8d29c442
    • Rui Hua's avatar
      bcache: ret IOERR when read meets metadata error · b221fc13
      Rui Hua authored
      
      
      The read request might meet error when searching the btree, but the error
      was not handled in cache_lookup(), and this kind of metadata failure will
      not go into cached_dev_read_error(), finally, the upper layer will receive
      bi_status=0.  In this patch we judge the metadata error by the return
      value of bch_btree_map_keys(), there are two potential paths give rise to
      the error:
      
      1. Because the btree is not totally cached in memery, we maybe get error
         when read btree node from cache device (see bch_btree_node_get()), the
         likely errno is -EIO, -ENOMEM
      
      2. When read miss happens, bch_btree_insert_check_key() will be called to
         insert a "replace_key" to btree(see cached_dev_cache_miss(), just for
         doing preparatory work before insert the missed data to cache device),
         a failure can also happen in this situation, the likely errno is
         -ENOMEM
      
      bch_btree_map_keys() will return MAP_DONE in normal scenario, but we will
      get either -EIO or -ENOMEM in above two cases. if this happened, we should
      NOT recover data from backing device (when cache device is dirty) because
      we don't know whether bkeys the read request covered are all clean.  And
      after that happened, s->iop.status is still its initially value(0) before
      we submit s->bio.bio, we set it to BLK_STS_IOERR, so it can go into
      cached_dev_read_error(), and finally it can be passed to upper layer, or
      recovered by reread from backing device.
      
      [edit by mlyle: patch formatting, word-wrap, comment spelling,
      commit log format]
      
      Signed-off-by: default avatarHua Rui <huarui.dev@gmail.com>
      Reviewed-by: default avatarMichael Lyle <mlyle@lyle.org>
      Signed-off-by: default avatarMichael Lyle <mlyle@lyle.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      b221fc13
    • Jens Axboe's avatar
      Merge branch 'nvme-4.16' of git://git.infradead.org/nvme into for-4.16/block · 550203e6
      Jens Axboe authored
      Pull NVMe fixes from Christoph:
      
      "Below are the pending nvme updates for Linux 4.16. Just fixes and
       cleanups from various contributors this time around."
      550203e6
  2. Jan 08, 2018
  3. Jan 07, 2018
    • Ming Lei's avatar
      blk-mq: fix race between updating nr_hw_queues and switching io sched · fb350e0a
      Ming Lei authored
      
      
      In both elevator_switch_mq() and blk_mq_update_nr_hw_queues(), sched tags
      can be allocated, and q->nr_hw_queue is used, and race is inevitable, for
      example: blk_mq_init_sched() may trigger use-after-free on hctx, which is
      freed in blk_mq_realloc_hw_ctxs() when nr_hw_queues is decreased.
      
      This patch fixes the race be holding q->sysfs_lock.
      
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reported-by: default avatarYi Zhang <yi.zhang@redhat.com>
      Tested-by: default avatarYi Zhang <yi.zhang@redhat.com>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      fb350e0a
    • Ming Lei's avatar
      blk-mq: avoid to map CPU into stale hw queue · 7d4901a9
      Ming Lei authored
      
      
      blk_mq_pci_map_queues() may not map one CPU into any hw queue, but its
      previous map isn't cleared yet, and may point to one stale hw queue
      index.
      
      This patch fixes the following issue by clearing the mapping table before
      setting it up in blk_mq_pci_map_queues().
      
      This patches fixes this following issue reported by Zhang Yi:
      
      [  101.202734] BUG: unable to handle kernel NULL pointer dereference at 0000000094d3013f
      [  101.211487] IP: blk_mq_map_swqueue+0xbc/0x200
      [  101.216346] PGD 0 P4D 0
      [  101.219171] Oops: 0000 [#1] SMP
      [  101.222674] Modules linked in: sunrpc ipmi_ssif vfat fat intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore mxm_wmi intel_rapl_perf iTCO_wdt ipmi_si ipmi_devintf pcspkr iTCO_vendor_support sg dcdbas ipmi_msghandler wmi mei_me lpc_ich shpchp mei acpi_power_meter dm_multipath ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ahci libahci crc32c_intel libata tg3 nvme nvme_core megaraid_sas ptp i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
      [  101.284881] CPU: 0 PID: 504 Comm: kworker/u25:5 Not tainted 4.15.0-rc2 #1
      [  101.292455] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.5.5 08/16/2017
      [  101.301001] Workqueue: nvme-wq nvme_reset_work [nvme]
      [  101.306636] task: 00000000f2c53190 task.stack: 000000002da874f9
      [  101.313241] RIP: 0010:blk_mq_map_swqueue+0xbc/0x200
      [  101.318681] RSP: 0018:ffffc9000234fd70 EFLAGS: 00010282
      [  101.324511] RAX: ffff88047ffc9480 RBX: ffff88047e130850 RCX: 0000000000000000
      [  101.332471] RDX: ffffe8ffffd40580 RSI: ffff88047e509b40 RDI: ffff88046f37a008
      [  101.340432] RBP: 000000000000000b R08: ffff88046f37a008 R09: 0000000011f94280
      [  101.348392] R10: ffff88047ffd4d00 R11: 0000000000000000 R12: ffff88046f37a008
      [  101.356353] R13: ffff88047e130f38 R14: 000000000000000b R15: ffff88046f37a558
      [  101.364314] FS:  0000000000000000(0000) GS:ffff880277c00000(0000) knlGS:0000000000000000
      [  101.373342] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  101.379753] CR2: 0000000000000098 CR3: 000000047f409004 CR4: 00000000001606f0
      [  101.387714] Call Trace:
      [  101.390445]  blk_mq_update_nr_hw_queues+0xbf/0x130
      [  101.395791]  nvme_reset_work+0x6f4/0xc06 [nvme]
      [  101.400848]  ? pick_next_task_fair+0x290/0x5f0
      [  101.405807]  ? __switch_to+0x1f5/0x430
      [  101.409988]  ? put_prev_entity+0x2f/0xd0
      [  101.414365]  process_one_work+0x141/0x340
      [  101.418836]  worker_thread+0x47/0x3e0
      [  101.422921]  kthread+0xf5/0x130
      [  101.426424]  ? rescuer_thread+0x380/0x380
      [  101.430896]  ? kthread_associate_blkcg+0x90/0x90
      [  101.436048]  ret_from_fork+0x1f/0x30
      [  101.440034] Code: 48 83 3c ca 00 0f 84 2b 01 00 00 48 63 cd 48 8b 93 10 01 00 00 8b 0c 88 48 8b 83 20 01 00 00 4a 03 14 f5 60 04 af 81 48 8b 0c c8 <48> 8b 81 98 00 00 00 f0 4c 0f ab 30 8b 81 f8 00 00 00 89 42 44
      [  101.461116] RIP: blk_mq_map_swqueue+0xbc/0x200 RSP: ffffc9000234fd70
      [  101.468205] CR2: 0000000000000098
      [  101.471907] ---[ end trace 5fe710f98228a3ca ]---
      [  101.482489] Kernel panic - not syncing: Fatal exception
      [  101.488505] Kernel Offset: disabled
      [  101.497752] ---[ end Kernel panic - not syncing: Fatal exception
      
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Suggested-by: default avatarChristoph Hellwig <hch@lst.de>
      Reported-by: default avatarYi Zhang <yi.zhang@redhat.com>
      Tested-by: default avatarYi Zhang <yi.zhang@redhat.com>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      7d4901a9
    • Ming Lei's avatar
      blk-mq: quiesce queue during switching io sched and updating nr_requests · 24f5a90f
      Ming Lei authored
      
      
      Dispatch may still be in-progress after queue is frozen, so we have to
      quiesce queue before switching IO scheduler and updating nr_requests.
      
      Also when switching io schedulers, blk_mq_run_hw_queue() may still be
      called somewhere(such as from nvme_reset_work()), and io scheduler's
      per-hctx data may not be setup yet, so cause oops even inside
      blk_mq_hctx_has_pending(), such as it can be run just between:
      
              ret = e->ops.mq.init_sched(q, e);
      AND
              ret = e->ops.mq.init_hctx(hctx, i)
      
      inside blk_mq_init_sched().
      
      This reverts commit 7a148c2f(block: don't call blk_mq_quiesce_queue()
      after queue is frozen) basically, and makes sure blk_mq_hctx_has_pending
      won't be called if queue is quiesced.
      
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Fixes: 7a148c2f(block: don't call blk_mq_quiesce_queue() after queue is frozen)
      Reported-by: default avatarYi Zhang <yi.zhang@redhat.com>
      Tested-by: default avatarYi Zhang <yi.zhang@redhat.com>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      24f5a90f
    • Ming Lei's avatar
      blk-mq: quiesce queue before freeing queue · c2856ae2
      Ming Lei authored
      
      
      After queue is frozen, dispatch still may happen, for example:
      
      1) requests are submitted from several contexts
      2) requests from all these contexts are inserted to queue, but may dispatch
      to LLD in one of these paths, but other paths sill need to move on even all
      these requests are completed(that means blk_mq_freeze_queue_wait() returns
      at that time)
      3) dispatch after queue freezing still moves on and causes use-after-free,
      because request queue is freed
      
      This patch quiesces queue after it is frozen, and makes sure all
      in-progress dispatch are completed.
      
      This patch fixes the following kernel crash when running heavy IOs vs.
      deleting device:
      
      [   36.719251] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      [   36.720318] IP: kyber_has_work+0x14/0x40
      [   36.720847] PGD 254bf5067 P4D 254bf5067 PUD 255e6a067 PMD 0
      [   36.721584] Oops: 0000 [#1] PREEMPT SMP
      [   36.722105] Dumping ftrace buffer:
      [   36.722570]    (ftrace buffer empty)
      [   36.723057] Modules linked in: scsi_debug ebtable_filter ebtables ip6table_filter ip6_tables tcm_loop iscsi_target_mod target_core_file target_core_iblock target_core_pscsi target_core_mod xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c bridge stp llc fuse iptable_filter ip_tables sd_mod sg btrfs xor zstd_decompress zstd_compress xxhash raid6_pq mptsas mptscsih bcache crc32c_intel ahci mptbase libahci serio_raw scsi_transport_sas nvme libata shpchp lpc_ich virtio_scsi nvme_core binfmt_misc dm_mod iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi null_blk configs
      [   36.733438] CPU: 2 PID: 2374 Comm: fio Not tainted 4.15.0-rc2.blk_mq_quiesce+ #714
      [   36.735143] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.9.3-1.fc25 04/01/2014
      [   36.736688] RIP: 0010:kyber_has_work+0x14/0x40
      [   36.737515] RSP: 0018:ffffc9000209bca0 EFLAGS: 00010202
      [   36.738431] RAX: 0000000000000008 RBX: ffff88025578bfc8 RCX: ffff880257bf4ed0
      [   36.739581] RDX: 0000000000000038 RSI: ffffffff81a98c6d RDI: ffff88025578bfc8
      [   36.740730] RBP: ffff880253cebfc8 R08: ffffc9000209bda0 R09: ffff8802554f3480
      [   36.741885] R10: ffffc9000209be60 R11: ffff880263f72538 R12: ffff88025573e9e8
      [   36.743036] R13: ffff88025578bfd0 R14: 0000000000000001 R15: 0000000000000000
      [   36.744189] FS:  00007f9b9bee67c0(0000) GS:ffff88027fc80000(0000) knlGS:0000000000000000
      [   36.746617] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   36.748483] CR2: 0000000000000008 CR3: 0000000254bf4001 CR4: 00000000003606e0
      [   36.750164] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   36.751455] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   36.752796] Call Trace:
      [   36.753992]  blk_mq_do_dispatch_sched+0x7f/0xe0
      [   36.755110]  blk_mq_sched_dispatch_requests+0x119/0x190
      [   36.756179]  __blk_mq_run_hw_queue+0x83/0x90
      [   36.757144]  __blk_mq_delay_run_hw_queue+0xaf/0x110
      [   36.758046]  blk_mq_run_hw_queue+0x24/0x70
      [   36.758845]  blk_mq_flush_plug_list+0x1e7/0x270
      [   36.759676]  blk_flush_plug_list+0xd6/0x240
      [   36.760463]  blk_finish_plug+0x27/0x40
      [   36.761195]  do_io_submit+0x19b/0x780
      [   36.761921]  ? entry_SYSCALL_64_fastpath+0x1a/0x7d
      [   36.762788]  entry_SYSCALL_64_fastpath+0x1a/0x7d
      [   36.763639] RIP: 0033:0x7f9b9699f697
      [   36.764352] RSP: 002b:00007ffc10f991b8 EFLAGS: 00000206 ORIG_RAX: 00000000000000d1
      [   36.765773] RAX: ffffffffffffffda RBX: 00000000008f6f00 RCX: 00007f9b9699f697
      [   36.766965] RDX: 0000000000a5e6c0 RSI: 0000000000000001 RDI: 00007f9b8462a000
      [   36.768377] RBP: 0000000000000000 R08: 0000000000000001 R09: 00000000008f6420
      [   36.769649] R10: 00007f9b846e5000 R11: 0000000000000206 R12: 00007f9b795d6a70
      [   36.770807] R13: 00007f9b795e4140 R14: 00007f9b795e3fe0 R15: 0000000100000000
      [   36.771955] Code: 83 c7 10 e9 3f 68 d1 ff 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 97 b0 00 00 00 48 8d 42 08 48 83 c2 38 <48> 3b 00 74 06 b8 01 00 00 00 c3 48 3b 40 08 75 f4 48 83 c0 10
      [   36.775004] RIP: kyber_has_work+0x14/0x40 RSP: ffffc9000209bca0
      [   36.776012] CR2: 0000000000000008
      [   36.776690] ---[ end trace 4045cbce364ff2a4 ]---
      [   36.777527] Kernel panic - not syncing: Fatal exception
      [   36.778526] Dumping ftrace buffer:
      [   36.779313]    (ftrace buffer empty)
      [   36.780081] Kernel Offset: disabled
      [   36.780877] ---[ end Kernel panic - not syncing: Fatal exception
      
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Cc: stable@vger.kernel.org
      Tested-by: default avatarYi Zhang <yi.zhang@redhat.com>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      c2856ae2
    • Jens Axboe's avatar
      mq-deadline: make it clear that __dd_dispatch_request() works on all hw queues · ca11f209
      Jens Axboe authored
      
      
      Don't pass in the hardware queue to __dd_dispatch_request(), since it
      leads the reader to believe that we are returning a request for that
      specific hardware queue. That's not how mq-deadline works, the state
      for determining which request to serve next is shared across all
      hardware queues for a device.
      
      Reviewed-by: default avatarOmar Sandoval <osandov@fb.com>
      Reviewed-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      ca11f209
    • Bart Van Assche's avatar
      target: Use sgl_alloc_order() and sgl_free() · 14db4917
      Bart Van Assche authored
      
      
      Use the sgl_alloc_order() and sgl_free() functions instead of open
      coding these functions.
      
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@wdc.com>
      Acked-by: default avatarNicholas A. Bellinger <nab@linux-iscsi.org>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      14db4917
    • Bart Van Assche's avatar
      nvmet/rdma: Use sgl_alloc() and sgl_free() · 68c6e9cd
      Bart Van Assche authored
      
      
      Use the sgl_alloc() and sgl_free() functions instead of open coding
      these functions.
      
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@wdc.com>
      Reviewed-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      68c6e9cd
    • Bart Van Assche's avatar
      nvmet/fc: Use sgl_alloc() and sgl_free() · 4442b56f
      Bart Van Assche authored
      
      
      Use the sgl_alloc() and sgl_free() functions instead of open coding
      these functions.
      
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@wdc.com>
      Reviewed-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Reviewed-by: default avatarJames Smart <james.smart@broadcom.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      4442b56f
    • Bart Van Assche's avatar
      crypto: scompress - use sgl_alloc() and sgl_free() · 8cd579d2
      Bart Van Assche authored
      
      
      Use the sgl_alloc() and sgl_free() functions instead of open coding
      these functions.
      
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@wdc.com>
      Acked-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      8cd579d2
    • Bart Van Assche's avatar
      lib/scatterlist: Introduce sgl_alloc() and sgl_free() · e80a0af4
      Bart Van Assche authored
      
      
      Many kernel drivers contain code that allocates and frees both a
      scatterlist and the pages that populate that scatterlist.
      Introduce functions in lib/scatterlist.c that perform these tasks
      instead of duplicating this functionality in multiple drivers.
      Only include these functions in the build if CONFIG_SGL_ALLOC=y
      to avoid that the kernel size increases if this functionality is
      not used.
      
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@wdc.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Reviewed-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      e80a0af4
    • Wang Long's avatar
      writeback: update comment in inode_io_list_move_locked · bbbc3c1c
      Wang Long authored
      
      
      The @head can be wb->b_dirty_time, so update the comment.
      
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarWang Long <wanglong19@meituan.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      bbbc3c1c