  1. Dec 19, 2018
  2. Dec 18, 2018
    • blk-mq: enable IO poll if .nr_queues of type poll > 0 · cd19181b
      Ming Lei authored
      
      
      The queue mapping of type poll only exists when set->map[HCTX_TYPE_POLL].nr_queues
      is bigger than zero, so enhance the constraint by checking .nr_queues of type poll
      before enabling IO poll.
      
      Otherwise IO race & timeout can be observed when running block/007.
      
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      cd19181b
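
The constraint described in this commit can be sketched in plain C as follows. This is a minimal user-space illustration, not the kernel code: the structs are simplified stand-ins, `poll_map_usable()` is a hypothetical helper name, and the additional `nr_maps` bound check is an assumption added so the array index is known to be valid.

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel's queue types and tag-set structs. */
enum hctx_type { HCTX_TYPE_DEFAULT, HCTX_TYPE_READ, HCTX_TYPE_POLL, HCTX_MAX_TYPES };

struct queue_map {
    unsigned int nr_queues;   /* hardware queues backing this map */
};

struct tag_set {
    unsigned int nr_maps;     /* number of maps the driver set up (assumed field) */
    struct queue_map map[HCTX_MAX_TYPES];
};

/* Hypothetical helper: polling is only usable when a poll map exists
 * and that map has at least one hardware queue behind it. */
static bool poll_map_usable(const struct tag_set *set)
{
    return set->nr_maps > HCTX_TYPE_POLL &&
           set->map[HCTX_TYPE_POLL].nr_queues > 0;
}
```

A caller would enable IO poll on the queue only when this check returns true.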
    • blk-mq: change blk_mq_queue_busy() to blk_mq_queue_inflight() · 3c94d83c
      Jens Axboe authored
      
      
      There's a single user of this function, dm, and dm just wants
      to check if IO is inflight, not that it's just allocated.
      
      This fixes a hang with srp/002 in blktests with dm, where it tries
      to suspend but waits for inflight IO to finish first. As it checks
      for just allocated requests, this fails.
      
      Tested-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      3c94d83c
    • blk-mq: skip zero-queue maps in blk_mq_map_swqueue · e5edd5f2
      Ming Lei authored
      
      
      From 7e849dd9 ("nvme-pci: don't share queue maps"), the mapping
      table won't be initialized actually if map->nr_queues is zero, so
      we can't use blk_mq_map_queue_type() to retrieve hctx any more.
      
      Doing so can still produce a broken mapping; fix it by skipping
      zero-queue maps in blk_mq_map_swqueue().
      
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      e5edd5f2
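
The idea of skipping zero-queue maps can be illustrated with a small user-space sketch. The struct layout and the `hw_queue_for()` helper are hypothetical and only model the rule stated above: a map whose `nr_queues` is zero has no initialized table, so it must not be consulted.

```c
#include <stddef.h>

enum { TYPE_DEFAULT, TYPE_READ, TYPE_POLL, MAX_TYPES };

struct qmap {
    unsigned int nr_queues;      /* 0 means the table below was never set up */
    const unsigned int *mq_map;  /* cpu -> hw queue index, valid only if nr_queues > 0 */
};

/* Hypothetical lookup: return the hw queue index for (type, cpu),
 * or -1 when the map for that type has no queues and must be skipped. */
static int hw_queue_for(const struct qmap *maps, int type, unsigned int cpu)
{
    if (maps[type].nr_queues == 0)
        return -1;               /* skip: table contents are undefined */
    return (int)maps[type].mq_map[cpu];
}
```

A mapping loop over all queue types would simply continue to the next type whenever the lookup reports -1.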
    • block: fix blk-iolatency accounting underflow · 13369816
      Dennis Zhou authored
      The blk-iolatency controller measures the time from rq_qos_throttle() to
      rq_qos_done_bio() and attributes this time to the first bio that needs
      to create the request. This means if a bio is plug-mergeable or
      bio-mergeable, it gets to bypass the blk-iolatency controller.
      
      The recent series [1] to tag all bios with blkgs undermined how iolatency
      was determining which bios it was charging and should process in
      rq_qos_done_bio(). Because all bios are being tagged, this caused the
      atomic_t for the struct rq_wait inflight count to underflow and result
      in a stall.
      
      This patch adds a new flag BIO_TRACKED to let controllers know that a
      bio is going through the rq_qos path. blk-iolatency now checks if this
      flag is set to see if it should process the bio in rq_qos_done_bio().
      
      Overloading BLK_QUEUE_ENTERED works, but makes the flag rules confusing.
      BIO_THROTTLED was another candidate, but the flag is set for all bios
      that have gone through blk-throttle code. Overloading a flag comes with
      the burden of making sure that when either implementation changes, a
      change in setting rules for one doesn't cause a bug in the other. So
      here, we unfortunately opt for adding a new flag.
      
      [1] https://lore.kernel.org/lkml/20181205171039.73066-1-dennis@kernel.org/
      
      
      
      Fixes: 5cdf2e3f ("blkcg: associate blkg when associating a device")
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      13369816
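
The accounting problem and the flag-based fix can be modeled in a few lines of C. This is a simplified sketch, not the kernel implementation: `BIO_TRACKED_FLAG`, the two structs, and both function names are stand-ins invented for illustration, and a plain `int` replaces the kernel's `atomic_t`.

```c
#define BIO_TRACKED_FLAG (1u << 0)   /* stand-in for the kernel's BIO_TRACKED */

struct bio_sketch     { unsigned int flags; };
struct rq_wait_sketch { int inflight; };   /* plain int instead of atomic_t */

/* Throttle path: only bios that pass through here are counted and marked. */
static void throttle(struct rq_wait_sketch *w, struct bio_sketch *bio)
{
    w->inflight++;
    bio->flags |= BIO_TRACKED_FLAG;
}

/* Completion path: bios that bypassed throttle (for example merged bios)
 * carry no flag, so they must not decrement the counter. */
static void done_bio(struct rq_wait_sketch *w, struct bio_sketch *bio)
{
    if (!(bio->flags & BIO_TRACKED_FLAG))
        return;
    w->inflight--;
}
```

Without the flag check, completing a merged bio would decrement a count it never incremented, which is the underflow the commit message describes.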
    • blk-mq: fix dispatch from sw queue · c16d6b5a
      Ming Lei authored
      
      
      When a request is added to the rq list of a sw queue (ctx), the rq may
      come from a different type of hctx, especially now that multi-queue
      mapping has been introduced.
      
      So when dispatching requests from a sw queue via blk_mq_flush_busy_ctxs()
      or blk_mq_dequeue_from_ctx(), a request belonging to another queue type
      of hctx can be dispatched to the current hctx if a read queue or poll
      queue is enabled.
      
      This patch fixes this issue by introducing per-queue-type list.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      
      Changed by me to not use separately cacheline aligned lists, just
      place them all in the same cacheline where we had just the one list
      and lock before.
      
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      c16d6b5a
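
A per-queue-type list can be sketched as below. This is an illustrative model under simplifying assumptions: fixed-size arrays stand in for the kernel's linked lists, locking is omitted, and `struct sw_ctx`, `ctx_add()`, and `ctx_dequeue()` are hypothetical names.

```c
enum { TYPE_DEFAULT, TYPE_READ, TYPE_POLL, MAX_TYPES };
#define MAX_PENDING 8

/* One pending list per queue type inside a single software context. */
struct sw_ctx {
    int rq[MAX_TYPES][MAX_PENDING];   /* request ids, oldest first */
    int count[MAX_TYPES];
};

/* Queue a request on the list matching its hctx type; -1 if the list is full. */
static int ctx_add(struct sw_ctx *ctx, int type, int rq_id)
{
    if (ctx->count[type] >= MAX_PENDING)
        return -1;
    ctx->rq[type][ctx->count[type]++] = rq_id;
    return 0;
}

/* Dequeue the oldest request of the given type only; -1 if none is pending.
 * Requests of other types are never returned to this caller. */
static int ctx_dequeue(struct sw_ctx *ctx, int type)
{
    if (ctx->count[type] == 0)
        return -1;
    int id = ctx->rq[type][0];
    for (int i = 1; i < ctx->count[type]; i++)
        ctx->rq[type][i - 1] = ctx->rq[type][i];
    ctx->count[type]--;
    return id;
}
```

Because each hctx only drains the list for its own type, a poll-type hctx can no longer pick up a request that was queued for the default type.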
    • block: mq-deadline: Fix write completion handling · 7211aef8
      Damien Le Moal authored
      
      
      For a zoned block device using mq-deadline, if a write request for a
      zone is received while another write was already dispatched for the same
      zone, dd_dispatch_request() will return NULL and the newly inserted
      write request is kept in the scheduler queue waiting for the ongoing
      zone write to complete. With this behavior, when no other request has
      been dispatched, rq_list in blk_mq_sched_dispatch_requests() is empty
      and blk_mq_sched_mark_restart_hctx() is not called. As a result, the
      blk_mq_sched_restart() call in __blk_mq_free_request() does not run the
      queue when the already dispatched write request completes. The newly
      inserted request stays stuck in the scheduler queue until another
      request is eventually submitted.
      
      This problem does not affect SCSI disks, as the SCSI stack handles queue
      restart on request completion. However, it can be triggered with the
      nullblk driver with zoned mode enabled.
      
      Fix this by always requesting a queue restart in dd_dispatch_request()
      if no request was dispatched while WRITE requests are queued.
      
      Fixes: 5700f691 ("mq-deadline: Introduce zone locking support")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      
      Add missing export of blk_mq_sched_restart()
      
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      7211aef8
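
The stall and its fix can be modeled with a toy state machine. This sketch makes strong simplifying assumptions: a single zone, a counter instead of a request queue, and hypothetical names (`struct dd_sketch`, `dispatch()`, `complete_write()`). It only illustrates the rule that a dispatch attempt which yields nothing while writes are queued must request a restart.

```c
#include <stdbool.h>

struct dd_sketch {
    int  queued_writes;      /* writes waiting in the scheduler */
    bool zone_locked;        /* a write to the (single) zone is in flight */
    bool restart_requested;  /* run the queue again on next completion */
};

/* Returns 1 if a write was dispatched, 0 otherwise. */
static int dispatch(struct dd_sketch *dd)
{
    if (dd->queued_writes > 0 && !dd->zone_locked) {
        dd->queued_writes--;
        dd->zone_locked = true;
        return 1;
    }
    /* Nothing dispatched but writes remain: ask for a restart. */
    if (dd->queued_writes > 0)
        dd->restart_requested = true;
    return 0;
}

/* Completion of the in-flight write: unlock the zone and, if a restart
 * was requested, run dispatch again. Returns what that dispatch returned. */
static int complete_write(struct dd_sketch *dd)
{
    dd->zone_locked = false;
    if (dd->restart_requested) {
        dd->restart_requested = false;
        return dispatch(dd);
    }
    return 0;
}
```

If `dispatch()` did not set `restart_requested`, the second write in this model would stay queued after the first one completes, mirroring the stuck request described above.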
  3. Dec 17, 2018
  4. Dec 16, 2018
  5. Dec 14, 2018
  6. Dec 13, 2018
    • bcache: print number of keys in trace_bcache_journal_write · e78bd0d2
      Guoju Fang authored
      
      
      Journal flushes can sometimes be very frequent, so it is useful to dump
      the number of keys on every journal write.
      
      Signed-off-by: Guoju Fang <fangguoju@gmail.com>
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      e78bd0d2
    • bcache: set writeback_percent in a flexible range · cc38ca7e
      Coly Li authored
      
      
      Because CUTOFF_WRITEBACK is defined as 40, before the change to dynamic
      cutoff writeback values writeback_percent was limited to [0,
      CUTOFF_WRITEBACK]. Any value larger than CUTOFF_WRITEBACK was fixed
      up to 40.
      
      Now the cutoff writeback limit is a dynamic value, bch_cutoff_writeback,
      so the range of writeback_percent can be the more flexible [0,
      bch_cutoff_writeback]. That is, it can be expanded to a larger or
      smaller range than [0, 40], depending on how bch_cutoff_writeback is
      specified.
      
      The default value is still strongly recommended for most users and
      most workloads. But people who want to research bcache writeback
      performance tuning now have the chance to specify a more flexible
      writeback_percent in the range [0, 70].
      
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      cc38ca7e
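
The range rule for writeback_percent can be expressed as a one-line clamp. The function name `clamp_writeback_percent()` is hypothetical; the sketch only shows that the upper bound is now the runtime value of the cutoff rather than the constant 40.

```c
/* Clamp a requested writeback_percent into [0, cutoff_writeback].
 * cutoff_writeback models the runtime value of bch_cutoff_writeback. */
static unsigned int clamp_writeback_percent(unsigned int requested,
                                            unsigned int cutoff_writeback)
{
    return requested > cutoff_writeback ? cutoff_writeback : requested;
}
```

With the default cutoff of 40 a request for 50 is fixed up to 40, while with a cutoff of 70 the same request is accepted unchanged.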
    • bcache: make cutoff_writeback and cutoff_writeback_sync tunable · 9aaf5165
      Coly Li authored
      
      
      Currently the cutoff writeback and cutoff writeback sync thresholds are
      defined by CUTOFF_WRITEBACK (40) and CUTOFF_WRITEBACK_SYNC (70) as
      static values. Most of the time they work fine, but when people want
      to research bcache writeback mode performance tuning, there is no
      way to modify the soft and hard cutoff writeback values.
      
      This patch introduces two module parameters, bch_cutoff_writeback_sync
      and bch_cutoff_writeback, which let people tune the values when
      loading bcache.ko. If they are not specified at module load time, the
      current values CUTOFF_WRITEBACK_SYNC and CUTOFF_WRITEBACK are used as
      defaults and nothing changes.
      
      When people want to tune these two values,
      - cutoff_writeback can be set in range [1, 70]
      - cutoff_writeback_sync can be set in range [1, 90]
      - cutoff_writeback always <= cutoff_writeback_sync
      
      The default values are strongly recommended for most users and most
      workloads. However, people who want to research new writeback cutoff
      tuning for their own workload, at their own risk, can now do so.
      
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      9aaf5165
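
The three constraints listed in this commit can be sketched as a small validation routine. The defaults (40, 70) and the upper bounds (70, 90) come from the message above; everything else is an assumption made for illustration: the `_MAX` macro names, the `struct cutoffs` type, the `check_cutoffs()` helper, treating 0 as "not specified", and clamping out-of-range values rather than rejecting them.

```c
#define CUTOFF_WRITEBACK          40   /* default soft cutoff */
#define CUTOFF_WRITEBACK_SYNC     70   /* default hard cutoff */
#define CUTOFF_WRITEBACK_MAX      70   /* assumed name for the upper bound */
#define CUTOFF_WRITEBACK_SYNC_MAX 90   /* assumed name for the upper bound */

struct cutoffs {
    unsigned int writeback;
    unsigned int writeback_sync;
};

/* Hypothetical policy: 0 means "not specified" and selects the default,
 * values above the maximum are clamped, and writeback is lowered if it
 * would exceed writeback_sync. */
static struct cutoffs check_cutoffs(unsigned int wb, unsigned int wb_sync)
{
    struct cutoffs c;

    c.writeback_sync = wb_sync == 0 ? CUTOFF_WRITEBACK_SYNC : wb_sync;
    if (c.writeback_sync > CUTOFF_WRITEBACK_SYNC_MAX)
        c.writeback_sync = CUTOFF_WRITEBACK_SYNC_MAX;

    c.writeback = wb == 0 ? CUTOFF_WRITEBACK : wb;
    if (c.writeback > CUTOFF_WRITEBACK_MAX)
        c.writeback = CUTOFF_WRITEBACK_MAX;

    /* Enforce cutoff_writeback <= cutoff_writeback_sync. */
    if (c.writeback > c.writeback_sync)
        c.writeback = c.writeback_sync;

    return c;
}
```

Under this policy, loading the module without either parameter yields the existing 40/70 behavior, so nothing changes for users who do not tune the values.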