  1. Sep 28, 2018
  2. Sep 27, 2018
    • bcache: add separate workqueue for journal_write to avoid deadlock · 0f843e65
      Guoju Fang authored
      
      
      After a write to the SSD completes, bcache schedules journal_write work
      on system_wq, a system-wide public workqueue created without the
      WQ_MEM_RECLAIM flag. system_wq is also a bound workqueue, so there may
      be no idle kworker on the current processor. Creating a new kworker may
      first need to reclaim memory by shrinking the caches and slabs used by
      the VFS, which in turn depends on the bcache device. That's a deadlock.
      
      This patch creates a new workqueue for journal_write with the
      WQ_MEM_RECLAIM flag. Its rescuer thread will keep the work moving and
      so avoid the deadlock.
      
      Signed-off-by: Guoju Fang <fangguoju@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • xen/blkfront: When purging persistent grants, keep them in the buffer · f151ba98
      Boris Ostrovsky authored
      Commit a46b5367 ("xen/blkfront: cleanup stale persistent grants")
      added support for purging persistent grants when they are not in use. As
      part of the purge, the grants were removed from the grant buffer. This
      eventually causes the buffer to become empty, triggering the BUG_ON in
      get_free_grant(). This can be observed even on an idle system, within
      20-30 minutes.
      
      We should keep the grants in the buffer when purging, and only free the
      grant ref.
      
      Fixes: a46b5367 ("xen/blkfront: cleanup stale persistent grants")
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: fix deadline elevator drain for zoned block devices · 854f31cc
      Damien Le Moal authored
      
      
      When the deadline scheduler is used with a zoned block device, writes
      to a zone will be dispatched one at a time. This causes the warning
      message:
      
      deadline: forced dispatching is broken (nr_sorted=X), please report this
      
      to be displayed when switching to another elevator with the legacy I/O
      path while write requests to a zone are being retained in the scheduler
      queue.
      
      Prevent this message from being displayed when executing
      elv_drain_elevator() for a zoned block device. __blk_drain_queue() will
      loop until all writes are dispatched and completed, resulting in the
      desired elevator queue drain without extensive modifications to the
      deadline code itself to handle forced-dispatch calls.
      
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Fixes: 8dc8146f ("deadline-iosched: Introduce zone locking support")
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. Sep 26, 2018
  4. Sep 22, 2018
    • block: use nanosecond resolution for iostat · b57e99b4
      Omar Sandoval authored
      Klaus Kusche reported that the I/O busy time in /proc/diskstats was not
      updating properly on 4.18. This is because we started using ktime to
      track elapsed time, and we convert nanoseconds to jiffies when we update
      the partition counter. However, this gets rounded down, so any I/Os that
      take less than a jiffy are not accounted for. Previously in this case,
      the value of jiffies would sometimes increment while we were doing I/O,
      so at least some I/Os were accounted for.
      
      Let's convert the stats to use nanoseconds internally. We still report
      milliseconds as before, now more accurately than ever. The value is
      still truncated to 32 bits for backwards compatibility.
      
      Fixes: 522a7775 ("block: consolidate struct request timestamp fields")
      Cc: stable@vger.kernel.org
      Reported-by: Klaus Kusche <klaus.kusche@computerix.info>
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  5. Sep 20, 2018
    • Merge branch 'nvme-4.19' of git://git.infradead.org/nvme into for-linus · d611aaf3
      Jens Axboe authored
      Pull NVMe fix from Christoph.
      
      * 'nvme-4.19' of git://git.infradead.org/nvme:
        nvme: count all ANA groups for ANA Log page
    • floppy: Do not copy a kernel pointer to user memory in FDGETPRM ioctl · 65eea8ed
      Andy Whitcroft authored
      
      
      The final field of a floppy_struct is the field "name", which is a pointer
      to a string in kernel memory.  The kernel pointer should not be copied to
      user memory.  The FDGETPRM ioctl copies a floppy_struct to user memory,
      including this "name" field.  This pointer cannot be used by the user
      and it will leak a kernel address to user-space, which will reveal the
      location of kernel code and data and undermine KASLR protection.
      
      Model this code after the compat ioctl, which copies the returned data
      to a previously cleared temporary structure on the stack (excluding the
      name pointer) and copies out to userspace from there.  As we already
      have an inparam union with an appropriate member, and that memory is
      already cleared even for read-only calls, make use of that as a
      temporary store.
      
      Based on an initial patch by Brian Belleville.
      
      CVE-2018-7755
      Signed-off-by: Andy Whitcroft <apw@canonical.com>
      
      Broke up long line.
      
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • libata: mask swap internal and hardware tag · 7ce5c8cd
      Jens Axboe authored
      When we're comparing the hardware completion mask passed in from the
      driver with the internal tag pending mask, we need to account for the
      fact that the internal tag is different from the hardware tag. If not,
      then we can end up either prematurely completing the internal tag (since
      it's not set in the hw mask), or simply flag an error:
      
      ata2: illegal qc_active transition (100000000->00000001)
      
      If the internal tag is set, then swap that with the hardware tag in this
      case before comparing with what the hardware reports.
      
      Fixes: 28361c40 ("libata: add extra internal command")
      Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=201151
      Cc: stable@vger.kernel.org
      Reported-by: Paul Sbarra <sbarra.paul@gmail.com>
      Tested-by: Paul Sbarra <sbarra.paul@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  6. Sep 17, 2018
  7. Sep 13, 2018
    • null_blk: fix zoned support for non-rq based operation · b228ba1c
      Jens Axboe authored
      The support added for zones in null_blk seems to assume that only
      rq-based operation is possible. But this depends on the queue_mode
      setting: if it is set to 0, then cmd->bio is what we need to be
      operating on. Right now any attempt to load null_blk with queue_mode=0
      will insta-crash, since cmd->rq is NULL and null_handle_cmd() assumes it
      is always set.
      
      Make the zoned code deal with bios instead, or pass in the
      appropriate sector/nr_sectors instead.
      
      Fixes: ca4b2a01 ("null_blk: add zone support")
      Tested-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  8. Sep 12, 2018
  9. Sep 10, 2018
  10. Sep 07, 2018
  11. Sep 06, 2018
  12. Sep 05, 2018
  13. Sep 01, 2018
    • blkcg: use tryget logic when associating a blkg with a bio · 31118850
      Dennis Zhou (Facebook) authored
      There is a very small chance that a bio gets caught up in a really
      unfortunate race between a task migration, cgroup exiting, and itself
      trying to associate with a blkg. This is due to css offlining being
      performed after the css->refcnt is killed, which triggers removal of
      blkgs whose blkg->refcnt reaches 0.
      
      To avoid this, association with a blkg should use tryget and fall back
      to using the root_blkg.
      
      Fixes: 08e18eab ("block: add bi_blkg to the bio for cgroups")
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
      Cc: Jiufei Xue <jiufei.xue@linux.alibaba.com>
      Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blkcg: delay blkg destruction until after writeback has finished · 59b57717
      Dennis Zhou (Facebook) authored
      Currently, blkcg destruction relies on a sequence of events:
        1. Destruction starts. blkcg_css_offline() is called and blkgs
           release their reference to the blkcg. This immediately destroys
           the cgwbs (writeback).
        2. With blkgs giving up their reference, the blkcg ref count should
           become zero and eventually call blkcg_css_free() which finally
           frees the blkcg.
      
      Jiufei Xue reported that there is a race between blkcg_bio_issue_check()
      and cgroup_rmdir(). To remedy this, blkg destruction becomes contingent
      on the completion of all writeback associated with the blkcg. A count of
      the number of cgwbs is maintained and once that goes to zero, blkg
      destruction can follow. This should prevent premature blkg destruction
      related to writeback.
      
      The new process for blkcg cleanup is as follows:
        1. Destruction starts. blkcg_css_offline() is called which offlines
           writeback. Blkg destruction is delayed on the cgwb_refcnt count to
           avoid punting potentially large amounts of outstanding writeback
           to root while maintaining any ongoing policies. Here, the base
           cgwb_refcnt is put back.
        2. When the cgwb_refcnt becomes zero, blkcg_destroy_blkgs() is called
           and handles destruction of blkgs. This is where the css reference
           held by each blkg is released.
        3. Once the blkcg ref count goes to zero, blkcg_css_free() is called.
           This finally frees the blkcg.
      
      It seems in the past blk-throttle didn't do the most understandable
      things with taking data from a blkg while associating with current. So,
      the simplification and unification of what blk-throttle is doing caused
      this.
      
      Fixes: 08e18eab ("block: add bi_blkg to the bio for cgroups")
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
      Cc: Jiufei Xue <jiufei.xue@linux.alibaba.com>
      Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Revert "blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir()" · 6b065462
      Dennis Zhou (Facebook) authored
      This reverts commit 4c699480.
      
      Destroying blkgs is tricky because of the nature of the relationship. A
      blkg should go away when either a blkcg or a request_queue goes away.
      However, blkgs pin the blkcg to ensure they remain valid. To break this
      cycle, when a blkcg is offlined, blkgs put back their css ref. This
      eventually lets css_free() get called which frees the blkcg.
      
      The above commit (4c699480) breaks this order of events by trying to
      destroy blkgs in css_free(). As the blkgs still hold references to the
      blkcg, css_free() is never called.
      
      The race between blkcg_bio_issue_check() and cgroup_rmdir() will be
      addressed in the following patch by delaying destruction of a blkg until
      all writeback associated with the blkcg has been finished.
      
      Fixes: 4c699480 ("blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir()")
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
      Cc: Jiufei Xue <jiufei.xue@linux.alibaba.com>
      Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  14. Aug 30, 2018
  15. Aug 29, 2018
  16. Aug 28, 2018
  17. Aug 26, 2018
    • Merge tag 'for-linus-20180825' of git://git.kernel.dk/linux-block · b8dcdab3
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "A few small fixes for this merge window:
      
         - Locking imbalance fix for bcache (Shan Hai)
      
         - A few small fixes for wbt. One is a cleanup/prep, one is a fix for
           an existing issue, and the last two are fixes for changes that went
           into this merge window (me)"
      
      * tag 'for-linus-20180825' of git://git.kernel.dk/linux-block:
        blk-wbt: don't maintain inflight counts if disabled
        blk-wbt: fix has-sleeper queueing check
        blk-wbt: use wq_has_sleeper() for wq active check
        blk-wbt: move disable check into get_limit()
        bcache: release dc->writeback_lock properly in bch_writeback_thread()